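Title: 新聞輿情與民意偵測追蹤之研究-大資料之研究取向
Title (English): A Study of News Sentiment & Public Opinion Detection and Tracking-A Big Data Research Approach
Author: 鄒函升
Contributors: 楊建民, 鄒函升
Keywords: Text Mining; Opinion Mining; Events Detection and Tracking; Public Opinion; Big Data
Date: 2013
Uploaded: 29-Jul-2014 16:04:09 (UTC+8)
Abstract (zh_TW, translated):
As people's habits change, the Internet is gradually replacing traditional media as a source of new information. Online news is more immediate and far more voluminous than traditional news, but that speed and volume make it harder for readers to organize and absorb. News is also public discourse that has been verified and packaged by the media: it reports objectively how events arise and unfold, and it reflects public sentiment and opinion. Finding the desired information efficiently and accurately in a large body of data is therefore an important problem, and more important still is how to discover and solve problems, and even predict the future, from such big data.
This study uses news event detection and tracking to help users find information more efficiently, and applies opinion mining to the same corpus to analyze the sentiment of news events and gauge whether the social mood is optimistic or pessimistic. A crawler automatically collected 14,729 political news articles published by the Central News Agency between June 10, 2013 and May 6, 2014. Single-pass clustering with a time component was used for event detection and kNN classification for event tracking. The resulting clusters were clustered a second time with k-means to improve event quality, and opinion mining was then applied for sentiment analysis.
The resulting news event clusters were compared with public opinion poll data. Comparing negative news events with data from the TVBS Poll Center (TVBS民意調查中心) shows some correlation in both event sentiment and periods of peak attention. Negative news events were also found to last about four weeks, which suggests that planning measures taken when an event breaks could keep the social mood from staying low. For overall news sentiment, the overall trend was compared with the Premier dissatisfaction trend published by Taiwan Indicators Survey Research (台灣指標民調公司), and the correlation exceeded 70 percent. The results indicate that the approach reflects public sentiment effectively. The study proposes a real-time, resource-saving way to observe the sentiment of news events and the social climate. Future work may add other news media or more diverse opinion sources (social networking sites, blogs) to reflect public sentiment more directly and faithfully, which may become a new way to gain insight into public opinion.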
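The detection step described in the abstract (single-pass clustering with a time component) could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the cosine similarity on raw term counts, the `threshold` and `window_days` values, and the names `Event` and `single_pass` are all assumptions introduced here, and the input is assumed to be already tokenized and sorted by date.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Event:
    centroid: Counter                 # summed term counts of member articles
    last_seen: date                   # date of the most recent member
    members: list = field(default_factory=list)


def single_pass(articles, threshold=0.3, window_days=14):
    """Assign each article to the most similar recent event, or open a new one.

    articles: iterable of (article_id, date, tokens), sorted by date.
    threshold and window_days are illustrative values (assumptions).
    """
    events = []
    for article_id, day, tokens in articles:
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for ev in events:
            # Time component: ignore events with no article inside the window.
            if (day - ev.last_seen).days > window_days:
                continue
            sim = cosine(vec, ev.centroid)
            if sim > best_sim:
                best, best_sim = ev, sim
        if best is not None and best_sim >= threshold:
            best.centroid.update(vec)
            best.last_seen = day
            best.members.append(article_id)
        else:
            events.append(Event(centroid=vec, last_seen=day, members=[article_id]))
    return events
```

Each article is compared only with events that are still active within the time window, so a story that resurfaces months later opens a new event instead of joining a stale one. The kNN tracking and the k-means second-stage clustering mentioned in the abstract are not shown.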
Abstract (en_US):
Acquiring knowledge and following current events through the Internet is gradually replacing traditional media. However, the sheer amount of news makes it harder for people to organize and absorb. News also reflects social conditions as verified and packaged by the media, and it carries public sentiment and public opinion. Finding information effectively and accurately in a large amount of data is therefore an important issue. More important still is finding and solving problems, and even predicting the future. In this study, in addition to using detection and tracking techniques to find information more effectively, we apply opinion mining to analyze news sentiment and understand whether social conditions are optimistic or pessimistic. We wrote a program to collect political news automatically from the Central News Agency, then applied an event detection and tracking algorithm for classification and opinion mining for sentiment analysis. We used public opinion polls to validate our results and found a certain relevance between news sentiment and the polls. We also found that negative news events last about four weeks at peak periods. Overall news sentiment trends show a correlation of more than seventy percent with the dissatisfaction index of the Premier. The results reflect public opinion effectively. We propose a real-time, resource-saving way to observe news events and society. In future work, we plan to add more media sources to reflect real public opinion more directly, which may become a new way to gain insight into public opinion.
Description: Master's thesis
National Chengchi University (國立政治大學)
Graduate Institute of Information Management (資訊管理研究所)
101356032
102
Source: http://thesis.lib.nccu.edu.tw/record/#G0101356032
Type: thesis
Advisor (dc.contributor.advisor): 楊建民
Other identifier: G0101356032
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/67865
Format: application/pdf, 2694311 bytes
Language (dc.language.iso): en_US
Table of contents:
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Data Science
2.1.1 Overview of Data Science
2.1.2 Applications of Data Science
2.2 News Event Detection and Tracking
2.2.1 Event Detection
2.2.2 Event Tracking
2.3 Opinion Mining and Related Applications
2.3.1 Opinion Mining
2.3.2 Opinion Word Acquisition
2.3.3 Opinion Mining and Public Sentiment Analysis
2.4 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Research Design
3.2.1 Data Source
3.2.2 Data Preprocessing Module
3.2.3 News Detection and Tracking Module
3.2.4 Evaluation of Clustering Results
3.2.5 Parameter Selection for Event Detection and Tracking
3.2.6 Second-Stage Clustering
3.2.7 Opinion Dictionary
3.2.8 Opinion Extraction
3.2.9 Document Polarity Calculation
3.2.10 Cluster Results and Sentiment Analysis
Chapter 4 Results
4.1 Relationship Between Event Sentiment and Public Opinion
4.2 Relationship Between Sentiment Trends
Chapter 5 Conclusions and Future Work
5.1 Conclusions and Recommendations
5.2 Future Research Directions and Recommendations
References
Appendix 1: Academia Sinica Balanced Corpus Part-of-Speech Tag Set
Appendix 2: Negation Lexicon
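The abstract describes scoring news sentiment with opinion mining and comparing the overall trend with a poll series, and the table of contents lists an opinion dictionary, a document polarity calculation, and a negation lexicon. A minimal sketch of those two steps is given below. It is an illustration under stated assumptions, not the thesis's method: the three toy word sets, the rule that a negation word flips only the token immediately after it, and the function names `document_polarity` and `pearson` are all hypothetical, and the record does not say which correlation measure the thesis used.

```python
import math

# Hypothetical toy lexicons for illustration only. The thesis uses its own
# opinion dictionary and negation lexicon, which this record does not reproduce.
POSITIVE = {"支持", "肯定", "成功"}
NEGATIVE = {"抗議", "失敗", "批評"}
NEGATIONS = {"不", "沒有", "未"}


def document_polarity(tokens):
    """Sum word polarities over a tokenized document.

    A negation word flips the polarity of the token that immediately
    follows it; two consecutive negations cancel out.
    """
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = not negate
            continue
        value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        score += -value if negate else value
        negate = False  # negation only reaches the next token
    return score


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)
```

In use, document scores would be averaged per period (for example per week) to form a sentiment series, and `pearson` would then be applied to that series and the poll series for the same periods.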