Title 應用情感分析於媒體新聞傾向之研究-以中央社為例
Applying sentiment analysis to the tendency of media news: a case study of Central News Agency
Author 吳信維
Wu, Xin-Wei
Contributors 姜國輝
Chiang, Kuo-Huie
吳信維
Wu, Xin-Wei
Keywords 情感分析
LDA主題模型
n-gram
a-priori
Sentiment analysis
LDA
N-gram
A-priori
Date 2017
Uploaded 11-Jul-2017 11:29:39 (UTC+8)
Abstract   本研究目的在於結合關聯規則新詞發掘演算法來擴增詞庫,並藉此提高斷詞斷句的精確度;同時透過非監督式情感分析方法,從中央通訊社抓取國民黨以及民進黨的相關新聞文本,建立主題模型與情緒傾向的標注。再藉由監督式學習方法建立分類模型並驗證其成果。
  本研究藉由n-gram with a-priori algorithm來進行斷詞斷句的詞庫擴增。共有32007組詞被發掘,其中具有真正意義的詞共28838筆,成功率可達88%。
  本研究比較兩種分群方法建立主題模型,分別為TFIDF-Kmeans以及LDA。在TFIDF-Kmeans分群結果中,因為文本數量遠大於議題詞數量,造成TFIDF矩陣過於稀疏,導致分群效果不佳。在LDA的分群結果中,由於LDA模型具有多文章共享多主題的特性,主題分類的精準度更高達八成以上。故本研究認為在分析具有多主題特性之文本時,採用LDA模型來進行議題詞分群會有較佳的表現。
  本研究透過結合不同的資料時間區間,呈現出中央通訊社的新聞文本在我國近五次總統大選前後三個月間的新聞情緒傾向,同時探討各主題模型中各類別於大選前後三個月之情緒傾向變化。可以觀察到大致上文本的情感指數高峰值會出現於投票日當天,而近三次總統大選的結果顯示,相關的政黨新聞情感值會於選舉過後趨於平緩。而從新聞文本的正負向情感統計以及整體情緒傾向分析可以看出,不論執政黨為何,中央通訊社的新聞對於國民黨以及民進黨皆呈現了正向且平穩的內容,大抵不會特別偏向單一政黨。
  The purpose of this research is to combine association rules with a new-word discovery algorithm to expand the lexicon and thereby improve the accuracy of word segmentation. Using KMT- and DPP-related news crawled from the Central News Agency, it builds a topic model and labels sentiment orientation with an unsupervised sentiment analysis method. Finally, supervised learning methods are used to build classification models and verify the results.
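As a rough illustration of the final verification step, the sketch below (not the thesis's actual code) trains a supervised classifier on documents whose sentiment labels come from an earlier unsupervised pass; the sample texts, pseudo-labels, and the choice of a linear SVM over TF-IDF features are placeholder assumptions.

    # Minimal sketch: check unsupervised sentiment labels with a supervised classifier.
    # The pre-segmented documents and pseudo-labels below are placeholders, not the thesis data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    docs = ["候選人 政見 發表 獲得 支持", "政黨 遭到 批評 爭議 不斷",
            "選舉 結果 揭曉 支持者 歡呼", "弊案 調查 持續 輿論 譁然"]
    pseudo_labels = ["positive", "negative", "positive", "negative"]  # produced by the unsupervised step

    vectorizer = TfidfVectorizer()            # documents are assumed already space-segmented
    X = vectorizer.fit_transform(docs)
    clf = LinearSVC().fit(X, pseudo_labels)   # SVM is one of the classifiers surveyed in the thesis

    # In practice, predictions would be compared against a held-out labeled set;
    # here we only show prediction on two unseen placeholder sentences.
    new_docs = ["政黨 支持 度 上升", "爭議 法案 引發 批評"]
    print(list(zip(new_docs, clf.predict(vectorizer.transform(new_docs)))))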
  This research uses an n-gram with a-priori algorithm to expand the lexicon used for word and sentence segmentation. A total of 32,007 candidate terms were discovered, of which 28,838 were genuinely meaningful words, a success rate of 88%.
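A minimal sketch of how such Apriori-style n-gram growth can work in principle is shown below; it is an illustrative reconstruction rather than the thesis's implementation, and the toy corpus, min_support threshold, and maximum length are assumed values. Candidate character n-grams are extended one character at a time, and a candidate is counted only if its shorter prefix was already frequent, mirroring the Apriori pruning rule.

    # Illustrative sketch: discover candidate new words as frequent character n-grams,
    # with Apriori-style pruning (a gram is considered only if its prefix was frequent).
    from collections import Counter

    def apriori_ngrams(sentences, min_support=2, max_len=4):
        counts = Counter(ch for s in sentences for ch in s)          # level 1: single characters
        frequent = {g for g, c in counts.items() if c >= min_support}
        discovered = set()
        for n in range(2, max_len + 1):                              # grow to longer grams
            counts = Counter()
            for s in sentences:
                for i in range(len(s) - n + 1):
                    gram = s[i:i + n]
                    if gram[:-1] in frequent:                        # Apriori pruning step
                        counts[gram] += 1
            frequent = {g for g, c in counts.items() if c >= min_support}
            discovered |= frequent
        return discovered

    corpus = ["總統大選開跑", "總統大選辯論", "立委選舉開跑"]        # placeholder sentences
    known_lexicon = {"總統", "選舉"}                                  # placeholder existing lexicon
    print(sorted(apriori_ngrams(corpus) - known_lexicon))            # keep only unseen candidates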
  This research compares two clustering methods for building the topic model: TFIDF-Kmeans and LDA. With TFIDF-Kmeans, the TF-IDF matrix is too sparse because the number of documents far exceeds the number of issue terms, which leads to poor clustering. With LDA, whose model lets many documents share many topics, the accuracy of topic classification exceeds 80%. Therefore, this research suggests that LDA is the better choice for clustering issue terms in texts with multiple topics.
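The contrast between the two approaches could be prototyped roughly as follows with scikit-learn; the documents, topic count, and vectorizer settings are placeholders, not the models trained in the thesis. K-means assigns each document to exactly one cluster, while LDA gives every document a distribution over topics shared across the corpus.

    # Sketch: hard clustering (TF-IDF + K-means) vs. soft topic mixtures (LDA).
    from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["總統 大選 政見 發表", "政黨 提名 立委 選舉",
            "經濟 政策 稅改 討論", "兩岸 關係 外交 議題"]   # pre-segmented placeholder documents
    n_topics = 2

    # TF-IDF + K-means: each document ends up in exactly one cluster.
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
    cluster_ids = km.fit_predict(TfidfVectorizer().fit_transform(docs))

    # LDA: each document gets a probability distribution over shared topics.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topic = lda.fit_transform(CountVectorizer().fit_transform(docs))

    print("K-means cluster per document:", cluster_ids)
    print("LDA topic mixture per document:\n", doc_topic.round(2))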
  By combining different data time intervals, this research presents the sentiment tendencies of Central News Agency news in the three months before and after each of the last five presidential elections in Taiwan, and it examines how the sentiment of each topic category changes over the same periods. The sentiment index of the texts generally peaks around polling day, and the last three presidential elections show that the sentiment of party-related news flattens out after the election. The positive/negative sentiment statistics and the overall sentiment-tendency analysis show that, regardless of which party is in power, Central News Agency coverage of both the KMT and the DPP is positive and stable, without a particular bias toward either party.
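A self-contained sketch of the kind of lexicon-based sentiment index that could produce such a time series is given below; the word lists, dates, and the (pos - neg) / (pos + neg) scoring rule are illustrative assumptions, not the NTUSD lexicon or the thesis's exact formula.

    # Sketch: daily lexicon-based sentiment index around a polling day (all data are placeholders).
    from collections import defaultdict

    positive_words = {"支持", "獲勝", "成長", "肯定"}   # placeholder for a positive-word lexicon
    negative_words = {"批評", "弊案", "爭議", "下滑"}   # placeholder for a negative-word lexicon

    def sentiment_index(tokens):
        """Score one document as (pos - neg) / (pos + neg); 0.0 if no sentiment words appear."""
        pos = sum(t in positive_words for t in tokens)
        neg = sum(t in negative_words for t in tokens)
        return (pos - neg) / (pos + neg) if pos + neg else 0.0

    # Placeholder (date, pre-segmented article) pairs around a hypothetical polling day.
    articles = [
        ("2016-01-10", "候選人 政見 獲得 支持 肯定"),
        ("2016-01-16", "選情 緊繃 弊案 爭議 不斷"),
        ("2016-01-16", "投票 踴躍 支持 者 湧入"),
        ("2016-02-20", "選後 政局 趨於 平穩"),
    ]

    daily = defaultdict(list)
    for date, text in articles:
        daily[date].append(sentiment_index(text.split()))

    for date in sorted(daily):                         # average index per day
        print(date, round(sum(daily[date]) / len(daily[date]), 3))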
References [1] 中央通訊社 (2004)。全球新聞神經大透視。台北:中央通訊社。
[2] 王正豪, & 李啟菁. (2010). 中文部落格文章之意見分析. 碩士論文, 國立台北科技大學資訊工程研究所.
[3] 李日斌. (2014). 探討臺灣網民對鄰國的情感. 中山大學資訊管理學系研究所學位論文, 1-66.
[4] 杜嘉忠、徐健、劉穎,(2014)。網絡商品評論的特徵-情感詞本體構建與情感分析方法研究,現代圖書情報技術,30(5),74-82。
[5] 林育龍. (2013). 對使用者評論之情感分析研究-以Google Play市集為例. 國立政治大學資訊管理所碩士論文
[6] 洪崇洋. (2012). LDA 和使用紀錄為基礎的線上電子書主題趨勢發掘方法. 國立中山大學資訊管理所碩士論文
[7] 張日威. (2014). 應用LDA進行Plurk主題分類及使用者情緒分析. 國立雲林科技大學資訊管理所碩士論文
[8] 許桓瑜. (2012). 長句斷詞法和遺傳演算法對新聞分類的影響. 淡江大學資訊工程學系碩士班學位論文,
[9] 陳昭元. (2016). 應用情感分析於輿情之研究-以台灣 2016. 國立政治大學資訊管理學系碩士班學位論文,
[10] 黃居仁 (2007-2009),謝舒凱 (2009-2010)。《跨語言知識表徵基礎架構─面向多語化與 全球化的語言學研究》。國科會專題補助計畫
[11] 黃居仁,謝舒凱,洪嘉馡,陳韻竹,蘇依莉,陳永祥,黃勝偉。中文詞彙網路: 跨語言知識處理基礎架構的設計理念與實踐. 中國語文,24卷第二期
[12] 黃運高, 王妍, 邱武松, 向林泓, & 趙學良. (2014). 基於 K-means 和 TF-IDF 的中文藥名聚類分析. 計算機應用, 1.
[13] 劉吉軒, & 吳建良. (2007). 以情緒為中心之情境資訊觀察與評估. Paper presented at the NCS全國計算機會議.
[14] 龔建彰. (2014). 基於新聞字詞漲跌極性之股價趨勢分類預測. 交通大學資訊管理研究所學位論文, 1-36.
[15] Agrawal, R., & Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB (Vol. 1215, pp. 487-499).
[16] Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May). SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In LREC (Vol. 10, pp. 2200-2204).
[17] Basu, T., & Murthy, C. A. (2012, December). Effective text classification by a supervised feature selection approach. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on (pp. 918-925). IEEE.
[18] Basu, T., & Murthy, C. A. (2012, December). Effective text classification by a supervised feature selection approach. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on (pp. 918-925). IEEE.
[19] Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.
[20] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine Learning research, 3(Jan), 993-1022.
[21] Breakthrough Analysis. Retrieved 2015-02-23, from https://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percent-rule/
[22] Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational linguistics, 18(4), 467-479.
[23] Changqiu, S., Xiaolong, W., & Jun, X. (2009). Study on Feature Selection in Finance Text Categorization. In Conference Proceedings-IEEE International Conference on Systems, Man and Cybernetics, Art (No. 5346030, pp. 5077-5082).
[24] Chen, K. J., & Liu, S. H. (1992, August). Word identification for Mandarin Chinese sentences. In Proceedings of the 14th conference on Computational linguistics-Volume 1 (pp. 101-107). Association for Computational Linguistics.
[25] Chen, K. J., & Ma, W. Y. (2002, August). Unknown word extraction for Chinese documents. In Proceedings of the 19th international conference on Computational linguistics-Volume 1 (pp. 1-7). Association for Computational Linguistics.
[26] Chen, K. J., & Ma, W. Y. (2002, August). Unknown word extraction for Chinese documents. In Proceedings of the 19th international conference on Computational linguistics-Volume 1 (pp. 1-7). Association for Computational Linguistics.
[27] Chu-Ren Huang and Shu-Kai Hsieh. (2010). Infrastructure for Cross-lingual Knowledge Representation ─ Towards Multilingualism in Linguistic Studies. Taiwan NSC-granted Research Project (NSC 96-2411-H-003-061-MY3)
[28] Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. John Wiley & Sons.
[29] Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural networks, 10(5), 1048-1054.
[30] DTREG, Retrieved February 14 2017, from https://www.dtreg.com/solution/view/20
[31] Erkan G, Özgür A, Radev D R (2007) Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing. Proceedings of EMNLP-CoNLL 228–237.
[32] Farhadloo, M., & Rolland, E. (2013, December). Multi-class sentiment analysis with clustering and score representation. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on (pp. 904-912). IEEE.
[33] Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.
[34] King, G., Pan, J., & Roberts, M. E. (2013). How censorship in China allows government criticism but silences collective expression. American Political Science Review, 107(2), 1-18.
[35] Gong, Z., & Yu, T. (2010, November). Chinese web text classification system model based on Naive Bayes. In E-Product E-Service and E-Entertainment (ICEEE), 2010 International Conference on (pp. 1-4). IEEE.
[36] Griffiths, T., & Steyvers, M. (2007). Probabilistic topic models. Handbook of latent semantic analysis, 427(7), 424-440.
[37] Hao, L., & Hao, L. (2008, December). Automatic identification of stop words in chinese text classification. In Computer Science and Software Engineering, 2008 International Conference on (Vol. 1, pp. 718-722). IEEE.
[38] Hotho, A., Nürnberger, A., & Paaß, G. (2005, May). A brief survey of text mining. In Ldv Forum (Vol. 20, No. 1, pp. 19-62).
[39] Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM.
[40] Joachims, T. (1998, April). Text categorization with support vector machines: Learning with many relevant features. In European conference on machine learning (pp. 137-142). Springer Berlin Heidelberg.
[41] Hassid, J. (2012). Safety valve or pressure cooker? Blogs in Chinese political life. Journal of Communication, 62(2), 212-230.
[42] Kim, S. M., & Hovy, E. H. (2007, June). Crystal: Analyzing Predictive Opinions on the Web. In EMNLP-CoNLL (pp. 1056-1064).
[43] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1), 1-167.
[44] Liu, H., Sun, J., Liu, L., & Zhang, H. (2009). Feature selection with dynamic mutual information. Pattern Recognition, 42(7), 1330-1339.
[45] LOPE Lab, Retrieved February 14 2017, from http://lope.linguistics.ntu.edu.tw/cwn/
[46] Lowe W. (2015) ‘Yoshikoder: Cross-platform multilingual content analysis’. Java software version 0.6.5, URL http://www.yoshikoder.org
[47] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1, No. 1, p. 496). Cambridge: Cambridge university press.
[48] Mouthami, K., Devi, K. N., & Bhaskaran, V. M. (2013, February). Sentiment analysis and classification based on textual reviews. In Information communication and embedded systems (ICICES), 2013 international conference on (pp. 271-276). IEEE.
[49] Newman, D., Asuncion, A. U., Smyth, P., & Welling, M. (2007, December). Distributed Inference for Latent Dirichlet Allocation. In NIPS (Vol. 20, pp. 1081-1088).
[50] Oelke, D., Hao, M., Rohrdantz, C., Keim, D. A., Dayal, U., Haug, L. E., & Janetzko, H. (2009, October). Visual opinion analysis of customer feedback data. In Visual Analytics Science and Technology, 2009. VAST 2009. IEEE Symposium on (pp. 187-194). IEEE.
[51] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1-135.
[52] Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.
[53] Qamar, A. M., Gaussier, E., Chevallet, J. P., & Lim, J. H. (2008, December). Similarity learning for nearest neighbor classification. In Data Mining, 2008. ICDM`08. Eighth IEEE International Conference on (pp. 983-988). IEEE.
[54] Soliman, T. H. A., Elmasry, M. A., Hedar, A. R., & Doss, M. M. (2012, October). Utilizing support vector machines in mining online customer reviews. In Computer Theory and Applications (ICCTA), 2012 22nd International Conference on (pp. 192-197). IEEE.
[55] Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational linguistics, 37(2), 267-307.
[56] Tata, S., & Patel, J. M. (2007). Estimating the selectivity of tf-idf based cosine similarity predicates. ACM Sigmod Record, 36(2), 7-12.
[57] Turney, P. D. (2002, July). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417-424). Association for Computational Linguistics.
[58] Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346.
[59] Valakunde, N. D., & Patwardhan, M. S. (2013, November). Multi-aspect and multi-class based document sentiment analysis of educational data catering accreditation process. In Cloud & Ubiquitous Computing & Emerging Technologies (CUBE), 2013 International Conference on (pp. 188-192). IEEE.
[60] Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE transactions on neural networks, 10(5), 988-999.
[61] Wang, Y., & Huang, S. T. (2005, August). Chinese word segmentation based on A-priori and adjacent characters. In Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on (Vol. 6, pp. 3808-3813). IEEE.
[62] Yang, Y., & Pedersen, J. O. (1997, July). A comparative study on feature selection in text categorization. In Icml (Vol. 97, pp. 412-420).
Description Master's thesis (碩士)
國立政治大學
資訊管理學系
104356023
Source http://thesis.lib.nccu.edu.tw/record/#G0104356023
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/110798
Table of Contents
Chapter 1  Introduction 1
Section 1  Research Background 1
Section 2  Research Motivation 2
Section 3  Research Objectives 3
Chapter 2  Literature Review 4
Section 1  Introduction to the Central News Agency 4
Section 2  Applications of Text Mining to Politics 4
Section 3  Chinese Word Segmentation Techniques 5
1. Lexicon-Based Segmentation Methods 5
2. n-gram 5
Section 4  Lexicon Expansion 6
Section 5  Sentiment Analysis 7
1. Types of Sentiment Analysis 7
2. Sentiment Analysis Methods 8
3. Applications of Sentiment Analysis 11
4. NTU Sentiment Dictionary (NTUSD) 12
Section 6  Topic Models 12
1. TFIDF-Kmeans 12
2. Latent Dirichlet Allocation (LDA) 14
Section 7  Feature Selection 17
1. Document Frequency Threshold 17
2. Information Gain 17
3. Mutual Information 18
4. Chi-Square Statistic 19
Section 8  Text Classification 20
1. Naïve Bayes Classifier 21
2. k-Nearest Neighbors (kNN) 22
3. Support Vector Machine (SVM) 22
Chapter 3  Research Method 25
Section 1  Data Collection 26
Section 2  Document Preprocessing 26
1. Chinese Word Segmentation (Tokenization) 26
2. Part-of-Speech Tagging 29
3. Negation Processing 29
4. POS Filtering 29
5. Stop Word Filtering 31
6. Term Frequency Calculation 31
Section 3  Topic Labeling of Documents 31
1. Identifying Popular Issue Terms 31
2. Building the Topic Model 32
3. Determining Document Topics 33
Section 4  Sentiment Orientation Labeling 33
1. Building the Sentiment Term Set 34
2. Sentiment Index Calculation 34
3. Sentiment Orientation Labeling 35
Section 5  Visualization 35
Section 6  Building the Vector Space Model 35
Section 7  Feature Term Extraction 36
Section 8  Classification Model Building and Evaluation 37
1. Supervised Classification Algorithms 37
2. Classification Performance Measures 37
Chapter 4  Experimental Results and Discussion 39
Section 1  Political Document Collection Results 39
Section 2  New Word Expansion Results 39
Section 3  Topic Labeling of Documents 42
1. Filtering Popular Issue Terms 42
2. Building the Topic Model 44
3. Classification Performance Evaluation 48
4. Discussion of Classification Results 49
Section 4  Sentiment Orientation Labeling Results 49
1. Building the Sentiment Term Set 49
2. Sentiment Index Calculation and Orientation Labeling 50
3. Discussion of Sentiment Orientation Results 50
Section 5  Visualization Results 51
1. Overall Sentiment Orientation Analysis 51
2. Sentiment Changes by Topic Model Category 53
Chapter 5  Conclusions and Suggestions 59
Section 1  Research Results 59
1. Lexicon Expansion 59
2. Topic Models 59
3. Sentiment Orientation Determination 60
4. Visualization Analysis 60
Section 2  Suggestions 60
Chapter 6  References 62
Chapter 7  Appendix 70
1. KMT-Related News Content 70
2. DPP-Related News Content 70
3. Mixed News Content of Both Parties 72
Format application/pdf (3,214,381 bytes)