Title 針對臉書粉絲專頁貼文之政治傾向預測
Predicting Political Affiliation for Posts on Facebook Fan Pages
Author 張哲嘉
Chang, Che Chia
Contributors 徐國偉
Hsu, Kuo Wei
張哲嘉
Chang, Che Chia
Keywords political affiliation
classification
Facebook
text mining
Date 2016
Upload time 1-Jun-2016 13:53:37 (UTC+8)
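Abstract (translated from the Chinese) Social media has risen rapidly in recent years, with Facebook at the forefront. Taiwan has more than 15 million Facebook users, ranging from public figures to the general public. These emerging information-exchange platforms contain a great deal of meaningful information, and every post implicitly carries its author's emotions and stance. Using social media to predict elections and users' political affiliation has become a current trend: in Taiwan, political parties and politicians have set up fan pages and use the Internet and social media to campaign and to predict polls. Building on this observation, this study aims to predict the political affiliation of fan-page posts. It collects posts from the fan pages of Taiwan's two major parties, the KMT and the DPP, and builds two kinds of prediction model: one that uses distinct words as features, and one that uses text and interaction features. Using data-mining techniques, classifiers are built from the pan-blue and pan-green party features that posts exhibit. Several feature combinations are designed and examined in detail, comparing their predictive performance and the factors that influence it, and testing whether class imbalance in the data affects the classification results. The results show that party-typical words among the text features, combined with log-transformed interaction feature values and a KNN classifier, perform best, reaching an accuracy of 0.908 and an F1-score of 0.827.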
Abstract (original English) Social media has become increasingly popular in recent years, especially Facebook. In Taiwan, there are more than 15 million Facebook users, ranging from celebrities to the general public. Receiving information from Facebook every day has become part of most people's lifestyle. These new information-exchange platforms contain many meaningful messages, including users' emotions and affiliations. Moreover, using social media data to predict election results and political affiliation is becoming a trend in Taiwan. For example, politicians try to win elections and predict polls by means of the Internet and social media, and every political party has its own fan pages. In this thesis, we aim to predict the political inclination of posts on fan pages, specifically those of the KMT and the DPP, the two largest political parties in Taiwan. We select appropriate literal (text) and interactive features, and we use the posts of the two parties to build classification models that predict political inclination. Finally, we compare the performance of different classifiers. The results show that the literal and interactive features work best with the KNN classifier, whose accuracy and F1-score are 0.908 and 0.827, respectively.
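The best-performing setup reported in the abstract (party-typical word features, log-transformed interaction features, KNN) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which according to its table of contents was carried out in RapidMiner. The feature layout, the choice of likes/comments/shares as interaction counts, the value of k, and all toy numbers below are assumptions made purely for illustration.

```python
import math
from collections import Counter

def make_features(blue_words, green_words, interactions):
    """Build one feature vector for a post.

    blue_words / green_words: counts of party-typical words (hypothetical).
    interactions: raw interaction counts, e.g. likes, comments, shares
    (assumed here; the abstract does not list the interaction features).
    Interaction counts are compressed with log(1 + x).
    """
    return [float(blue_words), float(green_words)] + [math.log1p(c) for c in interactions]

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy_and_f1(y_true, y_pred, positive):
    """Return (accuracy, F1) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Toy training posts (made-up numbers, not data from the thesis).
train_X = [
    make_features(3, 0, [1200, 80, 40]),
    make_features(2, 0, [300, 20, 5]),
    make_features(4, 1, [5000, 400, 150]),
    make_features(0, 3, [900, 60, 30]),
    make_features(1, 4, [2500, 150, 90]),
    make_features(0, 2, [150, 10, 2]),
]
train_y = ["KMT", "KMT", "KMT", "DPP", "DPP", "DPP"]

# Two unseen toy posts and their predicted labels.
test_X = [make_features(3, 0, [800, 50, 20]), make_features(0, 3, [2000, 100, 50])]
predictions = [knn_predict(train_X, train_y, x) for x in test_X]
```

Accuracy and F1 are computed separately because, when one party contributes far more posts than the other, accuracy can remain high while F1 on the smaller class drops. This may be one reason the abstract reports both figures.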
Description Master's
National Chengchi University
Department of Computer Science
103753002
Source http://thesis.lib.nccu.edu.tw/record/#G0103753002
Type thesis
dc.contributor.advisor 徐國偉 (zh_TW)
dc.contributor.advisor Hsu, Kuo Wei (en_US)
dc.contributor.author (Authors) 張哲嘉 (zh_TW)
dc.contributor.author (Authors) Chang, Che Chia (en_US)
dc.creator (Author) 張哲嘉 (zh_TW)
dc.creator (Author) Chang, Che Chia (en_US)
dc.date (Date) 2016 (en_US)
dc.date.accessioned 1-Jun-2016 13:53:37 (UTC+8)
dc.date.available 1-Jun-2016 13:53:37 (UTC+8)
dc.date.issued (Upload time) 1-Jun-2016 13:53:37 (UTC+8)
dc.identifier (Other Identifiers) G0103753002 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/97113
dc.description (Description) Master's
dc.description (Description) National Chengchi University
dc.description (Description) Department of Computer Science
dc.description (Description) 103753002
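dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Subjects
1.3 Research Contributions
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Related Work on Social Media
2.1.1 Related Work on Facebook and the Graph API
2.2 Related Work on the CKIP Chinese Word Segmenter
2.3 Related Work on Social Media and Political Affiliation Prediction
Chapter 3 Methodology
3.1 System Architecture
3.2 Representative Fan Pages of the Pan-Blue and Pan-Green Parties
3.3 Data Preprocessing
3.3.1 Chinese Word Segmentation
3.3.2 Stop-Word Removal
3.3.3 Word-Length Restriction
3.4 Political Affiliation of the Data
3.5 Prediction Method Using Distinct Words as Features
3.5.1 TF Weighting
3.5.2 TF-IDF Weighting
3.5.3 BTO Weighting
3.6 Method Using Text and Interaction Features
3.6.1 Text Feature Extraction
3.6.2 Interaction Feature Extraction
3.6.2.1 Interaction Feature Transformation
3.6.2.2 Interaction Feature Extraction
Chapter 4 Experimental Methods and Validation
4.1 Experimental Data
4.1.1 Storage Format
4.1.2 Amount of Experimental Data
4.2 Experimental Environment and Evaluation Metrics
4.2.1 Experimental Environment Configuration
4.2.2 Evaluation Metrics
4.3 Classifiers and Cross-Validation
4.4 Experiments with the Distinct-Word Feature Method
4.4.1 Input Data for the Distinct-Word Feature Experiments
4.4.2 Results of the Distinct-Word Feature Experiments
4.5 Experiments with the Text and Interaction Feature Method
4.5.1 Experiment on the Number of Party-Typical Words
4.5.2 Value Distribution of Interaction Features
4.5.3 Post-Count Balancing Experiment
4.5.4 Feature Combination Experiments
4.5.5 Experiment on Models Trained on Key Party Figures
4.6 Comparison of Experimental Results
4.7 Case Discussion: 楊秋興
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix: Brief Introduction to RapidMiner Operation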
dc.format.extent 3446664 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0103753002 (en_US)
dc.subject (Keywords) political affiliation (en_US)
dc.subject (Keywords) classification (en_US)
dc.subject (Keywords) Facebook (en_US)
dc.subject (Keywords) text mining (en_US)
dc.title (Title) 針對臉書粉絲專頁貼文之政治傾向預測 (zh_TW)
dc.title (Title) Predicting Political Affiliation for Posts on Facebook Fan Pages (en_US)
dc.type (Type) thesis (en_US)