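Title: 針對臉書粉絲專頁貼文之政治傾向預測
Predicting Political Affiliation for Posts on Facebook Fan Pages
Author: 張哲嘉
Chang, Che Chia
Contributors: 徐國偉
Hsu, Kuo Wei (advisor)
張哲嘉
Chang, Che Chia (author)
Keywords: political affiliation
classification
Facebook
text mining
Date: 2016
Upload time: 1 June 2016 13:53:37 (UTC+8)
Abstract: Social media has risen in recent years, led by Facebook. Taiwan has more than 15 million Facebook users, ranging from public figures to the general public. These new information-exchange platforms carry a great deal of meaningful information: each post implicitly conveys its author's sentiment and stance. Using social media to predict elections and users' political affiliation has become a trend, and in Taiwan parties and politicians have set up fan pages and use the Internet and social media to campaign and to predict polls. Building on this observation, this study predicts the political affiliation of fan page posts. We collect posts from the fan pages of Taiwan's two major parties, the Kuomintang (KMT) and the Democratic Progressive Party (DPP), and build two kinds of predictive model: one that uses distinct words as features, and one that combines textual and interaction features. Using data mining techniques, we build classifiers from the blue (KMT) and green (DPP) party features expressed in the posts, design and examine several feature combinations in detail, and compare their predictive performance, the factors that influence it, and whether imbalance in the data affects the classification results. The results show that party-typical words as the textual feature, combined with log-transformed interaction feature values and a KNN classifier, perform best, reaching an accuracy of 0.908 and an F1-score of 0.827.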
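The abstract names log-transformed interaction feature values and a KNN classifier as the best-performing combination. The following is a minimal sketch of that idea only, not the thesis implementation (the table of contents points to RapidMiner for the actual experiments). It assumes the interaction features are counts such as likes, comments, and shares, and that the textual features are counts of party-typical words; this record does not specify either. The function names, the feature layout, and every number in the toy data are hypothetical.

```python
import math
from collections import Counter


def log_scale(count):
    """Compress a heavy-tailed count; log1p maps 0 to 0 (assumed transform)."""
    return math.log1p(count)


def make_features(typical_word_hits, likes, comments, shares):
    """Build one feature vector for a post.

    typical_word_hits is a hypothetical pair (kmt_hits, dpp_hits): how many
    party-typical words of each party occur in the post.
    """
    kmt_hits, dpp_hits = typical_word_hits
    return [float(kmt_hits), float(dpp_hits),
            log_scale(likes), log_scale(comments), log_scale(shares)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training posts nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up posts (not data from the thesis).
train = [
    (make_features((3, 0), 1200, 80, 40), "KMT"),
    (make_features((2, 0), 300, 10, 5), "KMT"),
    (make_features((4, 1), 5000, 400, 150), "KMT"),
    (make_features((0, 3), 900, 60, 30), "DPP"),
    (make_features((0, 2), 20000, 1500, 700), "DPP"),
    (make_features((1, 4), 150, 12, 3), "DPP"),
]
query = make_features((0, 3), 700, 50, 20)
print(knn_predict(train, query, k=3))  # DPP
```

Without the log transform, a raw like count in the thousands would dominate the Euclidean distance and the word-hit features would barely matter; compressing the counts puts both feature groups on comparable scales, which is one plausible reason the transform helps a distance-based classifier.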
Social media has become increasingly popular in recent years, Facebook in particular. Taiwan has more than 15 million Facebook users, ranging from celebrities to the general public, and for most people receiving information from Facebook every day has become part of daily life. These new information-exchange platforms contain many meaningful messages, including users' emotions and affiliations. Moreover, using social media data to predict election results and political affiliation is becoming a trend in Taiwan: politicians use the Internet and social media to campaign and to predict polls, and every political party has its own fan pages. In this thesis, we predict the political inclination of posts on fan pages, focusing on the KMT and the DPP, the two largest political parties in Taiwan. We select appropriate literal (textual) and interactive features, use the posts of the two parties to construct classification models, and compare the performance of different classifiers. The results show that the literal and interactive features work best with a KNN classifier, whose accuracy and F1-score are 0.908 and 0.827, respectively.
Description: Master's
National Chengchi University
Department of Computer Science
103753002
Source: http://thesis.lib.nccu.edu.tw/record/#G0103753002
Type: thesis
Identifier: G0103753002
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/97113
Format: application/pdf, 3446664 bytes
Table of contents:
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Subjects
1.3 Research Contributions
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Related Work on Social Media
2.1.1 Related Work on Facebook and the Graph API
2.2 Related Work on the CKIP Chinese Word Segmenter
2.3 Related Work on Social Media and Political Affiliation Prediction
Chapter 3 Methodology
3.1 System Architecture
3.2 Representative Fan Pages of the Blue and Green Parties
3.3 Data Preprocessing
3.3.1 Chinese Word Segmentation
3.3.2 Stop Word Removal
3.3.3 Word Length Restriction
3.4 Political Affiliation of the Data
3.5 Prediction Method Using Distinct Words as Features
3.5.1 TF Weighting
3.5.2 TF-IDF Weighting
3.5.3 BTO Weighting
3.6 Method Using Textual and Interaction Features
3.6.1 Textual Feature Extraction
3.6.2 Interaction Feature Extraction
3.6.2.1 Interaction Feature Transformation
3.6.2.2 Interaction Feature Extraction
Chapter 4 Experimental Methods and Validation
4.1 Experimental Data
4.1.1 Storage Format
4.1.2 Amount of Experimental Data
4.2 Experimental Environment and Evaluation Metrics
4.2.1 Experimental Environment Configuration
4.2.2 Evaluation Metrics
4.3 Classifiers and Cross-Validation
4.4 Experiments with the Distinct-Word Feature Method
4.4.1 Input Data for the Distinct-Word Feature Experiments
4.4.2 Results of the Distinct-Word Feature Experiments
4.5 Experiments with the Textual and Interaction Feature Method
4.5.1 Experiment on the Number of Party-Typical Words
4.5.2 Value Distribution of Interaction Features
4.5.3 Post Count Balancing Experiment
4.5.4 Feature Combination Experiments
4.5.5 Experiment Training Models on Leading Party Figures
4.6 Comparison of Experimental Results
4.7 Case Study: 楊秋興
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix: Introduction to Operating RapidMiner
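The table of contents for this thesis lists TF and TF-IDF weighting (sections 3.5.1 and 3.5.2) as ways to weight distinct words. As a rough illustration, the sketch below uses the common textbook definitions, which are an assumption here: the record does not give the thesis's exact formulas, and the BTO weighting of section 3.5.3 is not attempted because it is not defined in this record. The function names are hypothetical, and the tokens are made-up English placeholders standing in for the output of a Chinese word segmenter.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Term frequency: occurrences of a term divided by the post's length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def idf_weights(docs):
    """Inverse document frequency: log(N / df), df = posts containing the term."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_weights(tokens, idf):
    """TF-IDF: term frequency scaled by how rare the term is across posts."""
    return {term: w * idf.get(term, 0.0)
            for term, w in tf_weights(tokens).items()}


# Toy, already-segmented posts (placeholders, not thesis data).
docs = [
    ["reform", "tax", "reform"],
    ["tax", "energy"],
    ["energy", "energy", "vote"],
]
idf = idf_weights(docs)
for tokens in docs:
    print(tfidf_weights(tokens, idf))
```

Under these definitions a word that appears in every post gets an IDF of zero and so contributes nothing, while a word concentrated in a few posts keeps a high weight, which is the property that makes TF-IDF a natural candidate for separating party-specific vocabulary.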