Title Two Chinese Sentiment Analysis Approaches: Radical-based and Deep Learning Neural Network
(兩種中文情感運算分析策略: 以部首為基礎及深層類神經學習)
Author Chao, August F.Y. (趙逢毅)
Contributors Yang, Heng Li (楊亨利), advisor
Chao, August F.Y. (趙逢毅)
Keywords Chinese Sentiment Analysis
Radical Information
Deep Learning
Feature Selection
Feature Extraction
Date 2015
Uploaded 1-Mar-2016 10:27:35 (UTC+8)
Abstract Opinions are central to human behavior because they are a key influence on how we act. Whether in personal or organizational decision making, we constantly analyze opinions of various kinds, reading between the lines to interpret what their authors express and to filter out information that can support decisions. Early opinion studies treated the task as a text classification problem and tried to sort opinions into positive and negative groups. After 2000, researchers began to decompose whole opinions into sentences by analyzing subjective expressions and the adjectives they contain, and to explain the relationship between wording and sentiment orientation from a linguistic perspective. Opinion analysis therefore has to incorporate natural language processing techniques in order to understand opinion content accurately. Today, sentiment analysis research is booming, driven by emerging machine learning and natural language processing approaches and by the needs of electronic commerce and online virtual communities.
Chinese differs from most other languages: text is written without spaces between words, each character is a morpheme, and words (composed of several characters) are the smallest independent units of meaning. These features make it difficult to adopt sentiment analysis principles developed for English. Nevertheless, researchers have leveraged these properties of Chinese to propose sentiment analysis approaches dedicated to Chinese opinions. In this study, we discuss two practical scenarios for conducting Chinese sentiment analysis: (a) sentiment analysis resources and experts' linguistic knowledge are available; and (b) word feature vectors (word2vec) for the domain are available and deep learning is used. In scenario (a), we propose a Chinese radical-based sentiment analysis approach and test its applicability through experiments. We also propose a feature extraction method that distills observed word/radical groups from domain review texts; it yields 50 radical groups (seeds) that give good sentiment classification results when applied to reviews from similar domains. In scenario (b), we compare four feature selection approaches for deep learning, aiming to maintain accuracy while ensuring that understandable features can still be generated in the neural network. We also tested the feature selection approaches with an SVM classifier and obtained similar results. Finally, we discuss the essential constraints and required information in both scenarios. The results can serve as a foundation for further Chinese sentiment analysis studies.
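As a rough illustration of the radical-based idea in scenario (a), the sketch below maps each segmented word to the tuple of radicals of its characters and counts those tuples as features. It is not the thesis's implementation: the tiny `CHAR_TO_RADICAL` table, the function names, and the choice to represent a "radical group" as one radical per character are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy table mapping characters to their dictionary radicals.
# A real system would load a full character dictionary instead.
CHAR_TO_RADICAL = {
    "好": "女",
    "吃": "口",
    "湯": "水",
    "淡": "水",
    "太": "大",
    "難": "隹",
}


def radical_group(word, table=CHAR_TO_RADICAL, unknown="?"):
    """Return the radicals of a word's characters as a tuple (assumed representation)."""
    return tuple(table.get(ch, unknown) for ch in word)


def radical_features(tokens):
    """Count radical groups over an already segmented review."""
    return Counter(radical_group(tok) for tok in tokens)


# A segmented toy review: "the soup is too bland, tastes bad".
review = ["湯", "太", "淡", "難吃"]
features = radical_features(review)
print(features)
```

Because different words can share a radical group, the resulting feature space is much smaller than a word-level vocabulary; whether that helps classification is the empirical question the thesis examines.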
參考文獻 Aizawa, A. (2003). An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1), 45-65.
Arun Meena, T. V. Prabhakar (2007). Sentence Level Sentiment Analysis in the Presence of Conjuncts Using Linguistic Analysis. Lecture Notes in Computer Science, 4425, 573-580.
Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. The Journal of Machine Learning Research, 3, 1137-1155.
Blunsom, P., Grefenstette, E., & Kalchbrenner, N. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 655-665.
Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior research methods, 39(3), 510-526.
Bradley, M.M., & Lang, P.J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.
Cambria, E., Havasi, C., & Hussain, A. (2012, May). SenticNet 2: A Semantic and Affective Resource for Opinion Mining and Sentiment Analysis. International Conference of the Florida Artificial Intelligence Research Society, 202-207.
Che, W., Li, Z., & Liu, T. (2010, August). Ltp: A chinese language technology platform. Proceedings of the 23rd International Conference on Computational Linguistics, 13-16.
Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005, October 6–8). Identifying sources of opinions with conditional random fields and extraction patterns. Proceedings of the conference on empirical methods in natural language processing (EMNLP 2005), Vancouver, BC, Canada, 355–362.
Chowdhury, G. G. (2003). Natural language processing. Annual review of information science and technology, 37(1), 51-89.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational linguistics, 16(1), 22-29.
Cohen, R., Goldberg, Y., & Elhadad, M. (2012, July). Domain adaptation of a dependency parser with a class-class selectional preference model. In Proceedings of ACL 2012 Student Research Workshop, 43-48.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12, 2493-2537.
Das, S., & Chen, M. (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. Management Science, 53(9), 1375-1388.
Dave, K., Lawrence, S., & Pennock, D. M. (2003, May 20–24). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th international WWW conference, Budapest, Hungary, 519–528.
Ekkekakis, P. (2013). The measurement of affect, mood, and emotion: A guide for health-behavioral research. Cambridge University Press.
Esuli, A., & Sebastiani, F. (2006, May). Sentiwordnet: A publicly available lexical resource for opinion mining. Proceedings of LREC, 6, 417-422.
Faruqui, M., & Dyer, C. (2014, June). Community evaluation and exchange of word vectors at wordvectors. org. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics.
Fellbaum, C. (1998). WordNet. Blackwell Publishing Ltd.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.
Giraudo, H., & Voga, R.M. (2007). Lexema-based model vs. Morpheme- based model from psycholinguistic perspectives. Comunicación presentada en el congreso Morphology in Tolouse. Tolouse, 108-114.
Hatzivassiloglou, Vasileios and Kathleen R. McKeown, (1997) Predicting the semantic orientation of adjectives. Proceedings of Annual Meeting of the Association for Computational Linguistics.
Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 168-177.
Huang, J. R., Hsieh, S. K., Hong, J. F., Chen, Y. Z., Su, I. L., Chen, Y. X., & Huang, S. W. (2010). Chinese Wordnet: design, implementation, and application of an infrastructure for cross-lingual knowledge processing. Journal of Chinese Information Processing, 24(2), 14-23.
Janyce Wiebe M. (1994). Tracking point of view in narrative. Computational Linguistics, 20(2), 233-287.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Springer Berlin Heidelberg.
Jones, S.K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1), 11-21.
Kim, Y. (2014). Convolutional neural networks for sentence classification. Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 1746-1751, arXiv preprint arXiv:1408.5882.
Kim, J., Li, J. J., & Lee, J. H. (2009, August). Discovering the discriminative views: measuring term weights for sentiment analysis. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1, Association for Computational Linguistics, 253-261.
Ko, Y. (2012, August). A study of term weighting schemes using class information for text classification. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 1029-1030.
Kontopoulos, E., Berberidis, C., Dergiades, T., & Bassiliades, N. (2013). Ontology-based sentiment analysis of twitter posts. Expert systems with applications, 40(10), 4065-4074.
Ku, L.W. & Chen, H.H. (2007). Mining Opinions from the Web: Beyond Relevance Retrieval. Journal of American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, 58(12), 1838-1850.
Ku, L. W., Huang, T. H., & Chen, H. H. (2009, August). Using morphological and syntactic structures for Chinese opinion analysis. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 3, 1260-1269.
Ku, L. W., Huang, T. H., & Chen, H. H. (2009). Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 3, 1260-1269.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Levy, O., & Goldberg, Y. (2014). Dependencybased word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2, 302-308.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Liu, B., Hu, M., & Cheng, J. (2005, May). Opinion observer: analyzing and comparing opinions on the web. Proceedings of the 14th international conference on World Wide Web, 342-351.
Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of natural language processing, 2nd edition.
Lu, B., Song, Y., Zhang, X., & Tsou, B. K. (2010). Learning Chinese polarity lexicons by integration of graph models and morphological features. Information retrieval technology, 466-477.
Loper, E., & Bird, S. (2002, July). NLTK: The natural language toolkit. Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics, 1, 63-70.
Ma, Wei-Yun and Keh-Jiann Chen, 2003, Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff, Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, 168-171.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, pp. 142-150.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013a). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111-3119.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, D.( 2013b). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Montejo-Ráez, A., Martínez-Cámara, E., Martín-Valdivia, M. T., & Ureña-López, L. A. (2014). Ranked wordnet graph for sentiment polarity classification in twitter. Computer Speech & Language, 28(1), 93-107.
Nakagawa, T., Inui, K., & Kurohashi, S. (2010, June). Dependency tree-based sentiment classification using CRFs with hidden variables. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 786-794.
Paltoglou, G., & Thelwall, M. (2010, July). A study of information retrieval weighting schemes for sentiment analysis. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 1386-1395.
Pang, B., & Lee, L. (2005, June). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 115-124.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2), 1-135.
Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL-02 conference on Empirical methods in natural language processing, 10, 79-86.
Pang, B., & Lee, L. (2004, July 21–26). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the 42nd annual meeting of the Association for Computational Linguistics (ACL) Barcelona, Spain, 271–278.
Pang, B., & Lee, L. (2005, June). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115-124). Association for Computational Linguistics.
Peng, F., Feng, F., & McCallum, A. (2004, August). Chinese segmentation and new word detection using conditional random fields. Proceedings of the 20th international conference on Computational Linguistics,562.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et. al. & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
Quan, C., & Ren, F. (2014). Unsupervised product feature extraction for feature-oriented opinion determination. Information Sciences, 272, 16-28.
Riloff, E., & Wiebe, J. (2003, July). Learning extraction patterns for subjective expressions. Proceedings of the 2003 conference on Empirical methods in natural language processing, 105-112.
ŘEHŮŘEK, Radim and Petr SOJKA. (2010) Software Framework for Topic Modelling with Large Corpora. In Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks. Valletta, 46-50.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
Santos, C.N. dos, & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland, 69-78.
Sears, A., & Jacko, J. A. (Eds.). (2007). The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications. CRC press.
Stone, P. J., & Hunt, E. B. (1963, May). A computer approach to content analysis: studies using the general inquirer system. Proceedings of the May 21-23, 1963, spring joint computer conference, 241-256.
Strapparava, C., & Valitutti, A. (2004, May). WordNet Affect: an Affective Extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation, 4, 1083-1086.
Su, Q., Xu, X., Guo, H., Guo, Z., Wu, X., Zhang, X., Swen, B., 2008. Hidden sentiment association in Chinese web opinion mining. In: Proceedings of the 17th international conference on World Wide Web, pp. 959-968.
Sun, Y. T., Chen, C. L., Liu, C. C., Liu, C. L., & Soo, V. W. (2010). Sentiment Classification of Short Chinese Sentences. Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010), 184-198. (in Chinese)
Tsai, A.C.R., Wu, C.E., Tsai, R.T.H., & Hsu, J.Y.J. (2013). Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Intelligent Systems, (2), 22-30.
Tan, J.S.F., Lu, E.H.C., & Tseng, V.S. (2013). Preference-oriented mining techniques for location-based store search. Knowledge and Information Systems, 34(1), 147–169.
Tan, S., & Zhang, J. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622-2629.
Turney, Peter D. (2002) Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL-2002), 417-424.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer. New York.
Van Rijsbergen, C. J. (1979). Information Retrieval. 2ed. London: Butterworth.
Wiebe, Janyce, Rebecca F. Bruce, and Thomas P. O`Hara, (1999) Development and use of a gold-standard data set for subjectivity classifications. Proceedings of the Association for Computational Linguistics (ACL-1999)..
Wan, X. (2009, August). Co-training for cross-lingual sentiment classification. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 1, 235-243.
Wang, X., Zhao, Y., & Fu, G. (2011). A Morpheme-based Method to Chinese Sentence-Level Sentiment Classification. International Journal of Asian Language Processing, 21(3), 95-106.
Wu, Y., & Wen, M. (2010, August). Disambiguating dynamic sentiment ambiguous adjectives. Proceedings of the 23rd International Conference on Computational Linguistics, 1191-1199.
Wu, H. C., Luk, R. W. P., Wong, K. F., & Kwok, K. L. (2008). Interpreting tf-idf term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS), 26(3), 13:1-13:37.
Wilson, T., Wiebe, J., & Hoffmann, P. (2005, October). Recognizing contextual polarity in phrase-level sentiment analysis. Proceedings of the conference on human language technology and empirical methods in natural language processing, 347-354.
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., ... & Patwardhan, S. (2005, October). OpinionFinder: A system for subjectivity analysis. Proceedings of hlt/emnlp on interactive demonstrations, 34-35.
Wu, Z., & Tseng, G. (1993). Chinese text segmentation for text retrieval: Achievements and problems. Journal of the American Society for Information Science, 44(9), 532-542.
Xu, G., Huang, C.R., Wang, H., 2013. Extracting Chinese product features: representing a sequence by a set of skip-bigrams. In: Chinese Lexical Semantics, 72-83.
Yang, H.L., Chao F.Y.C., (2014), Sentiment analysis for Chinese reviews of movies in multi-genre based on morpheme-based features and collocations, Information Systems Frontiers, http://dx.doi.org/10.1007/s10796-014-9498-1.
Yu, H. C., Huang, T. H. K., & Chen, H. H. (2012). Domain Dependent Word Polarity Analysis for Sentiment Classification. Computational Linguistics and Chinese Language Processing ROCLING XXIV, 17(4), 33-48.
Zhang, W., Xu, H., & Wan, W. (2012). Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis. Expert Systems with Applications, 39(11), 10283-10291.
Zhang, C., Zeng, D., Li, J., Wang, F.Y., & Zuo, W. (2009). Sentiment analysis of Chinese documents: From sentence to document level. Journal of the American Society for Information Science and Technology, 60(12), 2474-2487.
Zhang, H., Yu, Z., Xu, M., Shi, Y., 2011. Feature-level sentiment analysis for Chinese product reviews. International Conference on Computer Research and Development ICCRD, 2, pp. 135-140.
Zhou, S., & Mondragón, R. J. (2004). The rich-club phenomenon in the Internet topology. Communications Letters, IEEE, 8(3), 180-182.
吳孟淞與王新民(2011) 運用詞關聯轉化語言模型於新聞文件分類之研究,第十七屆資訊管理暨實務研討會,高雄。
黃居仁(2005) 漢字知識表達的幾個層面:字、詞與詞義關係概論,漢字與全球化國際學術研討會,臺北。
高照明(2010) 中文詞彙語意資料的整合及擷取:詞彙語意學的觀點,高照明編著,計算語言學論文集, 68-97。
周亞民,黃居仁 (2005)漢字意符知識結構的建立,第六届汉语词汇语义学研讨会论文集。
洪嘉馡,黃居仁,許銘維(2013) 以中文十億詞語料庫為基礎之兩岸詞彙對比研究,中文計算語言學期刊,8(2),19-34.
周亞民,吳玲玲,黃居仁(2005) 漢字知識本體-以字為本的知識架構與其應用示例,國立台灣大學資訊管理學系,未發表博士論文。
趙逢毅,鍾曉芳 (2011) 基於辭典詞彙釋義之多階層釋義關聯程度計量─ 以 [目] 字部為例,中文計算語言學期刊,16(3-4),21-39。
趙元任(2002) 中國語文法,香港中文大學,丁邦新譯,2002增訂版。
何永清(2005) 現代漢語語法新探,臺北:臺灣商務印書館。
梅家駒,竺一鳴,高蘊琦,殷鴻翔(1984) 同義詞詞林,香港:商務印書館。
林語堂(1955) 整理漢字草案,中國世紀, 14-15。
唐蘭(1935) 古文字學導論。
詞庫小組(1993) 技術報告93-05:中文詞類分析(三版),中央研究院。
Description Doctoral dissertation
National Chengchi University
Department of Management Information Systems
97356506
Source http://thesis.lib.nccu.edu.tw/record/#G0973565061
Type thesis
Identifier G0973565061
Handle URI http://nccur.lib.nccu.edu.tw/handle/140.119/81464
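Table of contents
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Questions and Objectives
(1) Examining how using "radical" information changes sentiment analysis
(2) Extracting domain-dependent tables of observed words (radical groups)
(3) Word filtering principles suited to deep neural network learning
Chapter 2 Literature Review
Section 1 Sentiment Analysis Methods and Natural Language Processing
Section 2 Selection of Feature Units
Section 3 Differences between Feature Selection Principles and Classifiers
Section 4 Known Sentiment Polarity Corpus Resources
Section 5 Text Vectorization Strategies
Section 6 Machine Learning Methods
Section 7 Chinese Sentiment Analysis
Chapter 3 Radical-based Sentiment Analysis
Section 1 Restaurant Review Dataset and Text Processing Principles
Section 2 Radical-group Representation Units for Chinese Words
Section 3 Defining the Scope of Collocations
Section 4 Experimental Design and Comparisons
Section 5 Comparison of Unigram Features
Section 6 Comparison of Bigram Features
Section 7 Generating the Reference Character (Radical) Table
Section 8 Reusing the FRRank Radical Table
Chapter 4 Sentiment Analysis with Neural Network Learning
Section 1 Preparing the Text Collection and Defining Word Feature Vectors
Section 2 Synonym Replacement and Selection of Observed Words
Section 3 Preparing Training Texts
Section 4 Neural Network Learning and Prediction
Chapter 5 Conclusions, Limitations, and Future Directions
Section 1 Conclusions
Section 2 Research Limitations and Future Directions
References
Appendix 1: Sentence Boundary Markers
Appendix 2: Characters Appearing Only in wTFIDF and bTFIDF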
dc.relation.reference (參考文獻) Aizawa, A. (2003). An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1), 45-65.
Arun Meena, T. V. Prabhakar (2007). Sentence Level Sentiment Analysis in the Presence of Conjuncts Using Linguistic Analysis. Lecture Notes in Computer Science, 4425, 573-580.
Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. The Journal of Machine Learning Research, 3, 1137-1155.
Blunsom, P., Grefenstette, E., & Kalchbrenner, N. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 655-665.
Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior research methods, 39(3), 510-526.
Bradley, M.M., & Lang, P.J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.
Cambria, E., Havasi, C., & Hussain, A. (2012, May). SenticNet 2: A Semantic and Affective Resource for Opinion Mining and Sentiment Analysis. International Conference of the Florida Artificial Intelligence Research Society, 202-207.
Che, W., Li, Z., & Liu, T. (2010, August). Ltp: A chinese language technology platform. Proceedings of the 23rd International Conference on Computational Linguistics, 13-16.
Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005, October 6–8). Identifying sources of opinions with conditional random fields and extraction patterns. Proceedings of the conference on empirical methods in natural language processing (EMNLP 2005), Vancouver, BC, Canada, 355–362.
Chowdhury, G. G. (2003). Natural language processing. Annual review of information science and technology, 37(1), 51-89.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational linguistics, 16(1), 22-29.
Cohen, R., Goldberg, Y., & Elhadad, M. (2012, July). Domain adaptation of a dependency parser with a class-class selectional preference model. In Proceedings of ACL 2012 Student Research Workshop, 43-48.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12, 2493-2537.
Das, S., & Chen, M. (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. Management Science, 53(9), 1375-1388.
Dave, K., Lawrence, S., & Pennock, D. M. (2003, May 20–24). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th international WWW conference, Budapest, Hungary, 519–528.
Ekkekakis, P. (2013). The measurement of affect, mood, and emotion: A guide for health-behavioral research. Cambridge University Press.
Esuli, A., & Sebastiani, F. (2006, May). Sentiwordnet: A publicly available lexical resource for opinion mining. Proceedings of LREC, 6, 417-422.
Faruqui, M., & Dyer, C. (2014, June). Community evaluation and exchange of word vectors at wordvectors. org. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics.
Fellbaum, C. (1998). WordNet. Blackwell Publishing Ltd.
Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.
Giraudo, H., & Voga, R.M. (2007). Lexema-based model vs. Morpheme- based model from psycholinguistic perspectives. Comunicación presentada en el congreso Morphology in Tolouse. Tolouse, 108-114.
Hatzivassiloglou, Vasileios and Kathleen R. McKeown, (1997) Predicting the semantic orientation of adjectives. Proceedings of Annual Meeting of the Association for Computational Linguistics.
Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 168-177.
Huang, J. R., Hsieh, S. K., Hong, J. F., Chen, Y. Z., Su, I. L., Chen, Y. X., & Huang, S. W. (2010). Chinese Wordnet: design, implementation, and application of an infrastructure for cross-lingual knowledge processing. Journal of Chinese Information Processing, 24(2), 14-23.
Janyce Wiebe M. (1994). Tracking point of view in narrative. Computational Linguistics, 20(2), 233-287.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Springer Berlin Heidelberg.
Jones, S.K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1), 11-21.
Kim, Y. (2014). Convolutional neural networks for sentence classification. Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 1746-1751, arXiv preprint arXiv:1408.5882.
Kim, J., Li, J. J., & Lee, J. H. (2009, August). Discovering the discriminative views: measuring term weights for sentiment analysis. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1, Association for Computational Linguistics, 253-261.
Ko, Y. (2012, August). A study of term weighting schemes using class information for text classification. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 1029-1030.
Kontopoulos, E., Berberidis, C., Dergiades, T., & Bassiliades, N. (2013). Ontology-based sentiment analysis of twitter posts. Expert systems with applications, 40(10), 4065-4074.
Ku, L.W., & Chen, H.H. (2007). Mining Opinions from the Web: Beyond Relevance Retrieval. Journal of the American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, 58(12), 1838-1850.
Ku, L. W., Huang, T. H., & Chen, H. H. (2009, August). Using morphological and syntactic structures for Chinese opinion analysis. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 3, 1260-1269.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2, 302-308.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Liu, B., Hu, M., & Cheng, J. (2005, May). Opinion observer: analyzing and comparing opinions on the web. Proceedings of the 14th international conference on World Wide Web, 342-351.
Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of natural language processing, 2nd edition.
Lu, B., Song, Y., Zhang, X., & Tsou, B. K. (2010). Learning Chinese polarity lexicons by integration of graph models and morphological features. Information retrieval technology, 466-477.
Loper, E., & Bird, S. (2002, July). NLTK: The natural language toolkit. Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics, 1, 63-70.
Ma, W. Y., & Chen, K. J. (2003). Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff. Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, 168-171.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, 142-150.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013a). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111-3119.
Mikolov, T., Chen, K., Corrado, G. S., & Dean, J. (2013b). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Montejo-Ráez, A., Martínez-Cámara, E., Martín-Valdivia, M. T., & Ureña-López, L. A. (2014). Ranked wordnet graph for sentiment polarity classification in twitter. Computer Speech & Language, 28(1), 93-107.
Nakagawa, T., Inui, K., & Kurohashi, S. (2010, June). Dependency tree-based sentiment classification using CRFs with hidden variables. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 786-794.
Paltoglou, G., & Thelwall, M. (2010, July). A study of information retrieval weighting schemes for sentiment analysis. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 1386-1395.
Pang, B., & Lee, L. (2005, June). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 115-124.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2), 1-135.
Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL-02 conference on Empirical methods in natural language processing, 10, 79-86.
Pang, B., & Lee, L. (2004, July 21–26). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the 42nd annual meeting of the Association for Computational Linguistics (ACL) Barcelona, Spain, 271–278.
Peng, F., Feng, F., & McCallum, A. (2004, August). Chinese segmentation and new word detection using conditional random fields. Proceedings of the 20th international conference on Computational Linguistics, 562.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.
Quan, C., & Ren, F. (2014). Unsupervised product feature extraction for feature-oriented opinion determination. Information Sciences, 272, 16-28.
Riloff, E., & Wiebe, J. (2003, July). Learning extraction patterns for subjective expressions. Proceedings of the 2003 conference on Empirical methods in natural language processing, 105-112.
Řehůřek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large Corpora. In Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks. Valletta, 46-50.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
Santos, C.N. dos, & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland, 69-78.
Sears, A., & Jacko, J. A. (Eds.). (2007). The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications. CRC press.
Stone, P. J., & Hunt, E. B. (1963, May). A computer approach to content analysis: studies using the general inquirer system. Proceedings of the May 21-23, 1963, spring joint computer conference, 241-256.
Strapparava, C., & Valitutti, A. (2004, May). WordNet Affect: an Affective Extension of WordNet. Proceedings of the 4th International Conference on Language Resources and Evaluation, 4, 1083-1086.
Su, Q., Xu, X., Guo, H., Guo, Z., Wu, X., Zhang, X., & Swen, B. (2008). Hidden sentiment association in Chinese web opinion mining. Proceedings of the 17th international conference on World Wide Web, 959-968.
Sun, Y. T., Chen, C. L., Liu, C. C., Liu, C. L., & Soo, V. W. (2010). Sentiment Classification of Short Chinese Sentences. Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010), 184-198. (in Chinese)
Tsai, A.C.R., Wu, C.E., Tsai, R.T.H., & Hsu, J.Y.J. (2013). Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Intelligent Systems, (2), 22-30.
Tan, J.S.F., Lu, E.H.C., & Tseng, V.S. (2013). Preference-oriented mining techniques for location-based store search. Knowledge and Information Systems, 34(1), 147–169.
Tan, S., & Zhang, J. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622-2629.
Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2002), 417-424.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer. New York.
Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). London: Butterworth.
Wiebe, J., Bruce, R. F., & O'Hara, T. P. (1999). Development and use of a gold-standard data set for subjectivity classifications. Proceedings of the Association for Computational Linguistics (ACL-1999).
Wan, X. (2009, August). Co-training for cross-lingual sentiment classification. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 1, 235-243.
Wang, X., Zhao, Y., & Fu, G. (2011). A Morpheme-based Method to Chinese Sentence-Level Sentiment Classification. International Journal of Asian Language Processing, 21(3), 95-106.
Wu, Y., & Wen, M. (2010, August). Disambiguating dynamic sentiment ambiguous adjectives. Proceedings of the 23rd International Conference on Computational Linguistics, 1191-1199.
Wu, H. C., Luk, R. W. P., Wong, K. F., & Kwok, K. L. (2008). Interpreting tf-idf term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS), 26(3), 13:1-13:37.
Wilson, T., Wiebe, J., & Hoffmann, P. (2005, October). Recognizing contextual polarity in phrase-level sentiment analysis. Proceedings of the conference on human language technology and empirical methods in natural language processing, 347-354.
Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., ... & Patwardhan, S. (2005, October). OpinionFinder: A system for subjectivity analysis. Proceedings of HLT/EMNLP on interactive demonstrations, 34-35.
Wu, Z., & Tseng, G. (1993). Chinese text segmentation for text retrieval: Achievements and problems. Journal of the American Society for Information Science, 44(9), 532-542.
Xu, G., Huang, C.R., & Wang, H. (2013). Extracting Chinese product features: representing a sequence by a set of skip-bigrams. In Chinese Lexical Semantics, 72-83.
Yang, H.L., & Chao, F.Y.C. (2014). Sentiment analysis for Chinese reviews of movies in multi-genre based on morpheme-based features and collocations. Information Systems Frontiers, http://dx.doi.org/10.1007/s10796-014-9498-1.
Yu, H. C., Huang, T. H. K., & Chen, H. H. (2012). Domain Dependent Word Polarity Analysis for Sentiment Classification. Computational Linguistics and Chinese Language Processing ROCLING XXIV, 17(4), 33-48.
Zhang, W., Xu, H., & Wan, W. (2012). Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis. Expert Systems with Applications, 39(11), 10283-10291.
Zhang, C., Zeng, D., Li, J., Wang, F.Y., & Zuo, W. (2009). Sentiment analysis of Chinese documents: From sentence to document level. Journal of the American Society for Information Science and Technology, 60(12), 2474-2487.
Zhang, H., Yu, Z., Xu, M., & Shi, Y. (2011). Feature-level sentiment analysis for Chinese product reviews. International Conference on Computer Research and Development (ICCRD), 2, 135-140.
Zhou, S., & Mondragón, R. J. (2004). The rich-club phenomenon in the Internet topology. IEEE Communications Letters, 8(3), 180-182.
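Wu, M. S., & Wang, H. M. (2011). A study on applying word-association transformed language models to news document classification. The 17th Conference on Information Management and Practice, Kaohsiung. (in Chinese)
Huang, C. R. (2005). Several levels of Chinese character knowledge representation: An overview of characters, words, and word-sense relations. International Conference on Chinese Characters and Globalization, Taipei. (in Chinese)
Gao, Z. M. (2010). Integration and extraction of Chinese lexical semantic data: A lexical semantics perspective. In Z. M. Gao (Ed.), Collected Papers on Computational Linguistics, 68-97. (in Chinese)
Chou, Y. M., & Huang, C. R. (2005). Building the knowledge structure of Chinese character semantic components. Proceedings of the 6th Chinese Lexical Semantics Workshop. (in Chinese)
Hong, J. F., Huang, C. R., & Hsu, M. W. (2013). A cross-strait lexical contrast study based on the Chinese Gigaword Corpus. International Journal of Computational Linguistics and Chinese Language Processing, 8(2), 19-34. (in Chinese)
Chou, Y. M., Wu, L. L., & Huang, C. R. (2005). An ontology of Chinese characters: A character-based knowledge framework and example applications. Unpublished doctoral dissertation, Department of Information Management, National Taiwan University. (in Chinese)
Chao, F. Y., & Chung, S. F. (2011). Measuring multi-level definition relatedness based on dictionary word definitions: The [目] radical as an example. International Journal of Computational Linguistics and Chinese Language Processing, 16(3-4), 21-39. (in Chinese)
Chao, Y. R. (2002). A grammar of Chinese (P. H. Ting, Trans.; 2002 revised and enlarged ed.). The Chinese University of Hong Kong. (in Chinese)
Ho, Y. C. (2005). New explorations in modern Chinese grammar. Taipei: The Commercial Press, Taiwan. (in Chinese)
Mei, J. J., Zhu, Y. M., Gao, Y. Q., & Yin, H. X. (1984). Tongyici Cilin [A thesaurus of Chinese synonyms]. Hong Kong: The Commercial Press. (in Chinese)
Lin, Y. (1955). A draft proposal for reorganizing Chinese characters. Zhongguo Shiji (中國世紀), 14-15. (in Chinese)
Tang, L. (1935). An introduction to the study of ancient Chinese characters. (in Chinese)
CKIP Group (1993). Technical Report 93-05: Analysis of Chinese parts of speech (3rd ed.). Academia Sinica. (in Chinese)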