Title 網路評價搜尋結果的正負意見分類系統
A sentiment classification system on search results of web opinions
Author 黃泓彰
Huang, Hung Chang
Contributors 楊亨利
Yang, Heng Li
黃泓彰
Huang, Hung Chang
Keywords Opinion mining
Sentiment analysis
Sentiment classification
Web opinion
Date 2013
Upload time 25-Aug-2014 15:16:17 (UTC+8)
Abstract This research builds a system with two main functions: web opinion search and sentiment classification. For web opinion search, the system uses Google to retrieve search results about a portable smart device (smartphone, tablet, or notebook computer). For sentiment classification, it classifies those results by their opinion of the product under four schemes: (1) positive, negative, or neutral; (2) positive vs. negative; (3) positive vs. non-positive; (4) negative vs. non-negative. To build the system, we first collected articles and product names related to portable smart devices from two popular web forums, Mobile01 (mobile01.com) and PTT (ptt.cc), and then manually labeled the sentiment of each article and of the sentences in some of the articles. We designed a two-level supervised sentiment classification experiment. At the sentence level, we trained classifiers that label sentences as positive, negative, or neutral, and the best sentence classifier was then used at the document level. At the document level, the sentence-level sentiment labels are aggregated, and classifiers are trained for each of the four schemes. The best model from each scheme was used in the system. The best-performing model is the positive vs. negative classifier, with an average F-measure of 0.87; next is the negative vs. non-negative classifier, with an F-measure of 0.83 on the negative class; then the positive vs. non-positive classifier, with an F-measure of 0.81 on the positive class; and last is the three-class classifier, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is no worse than that of earlier studies on English text. Finally, we ran a comparison experiment using a document-level-only classification method, and the results showed that two-level sentiment classification (sentence level first, then document level) outperforms document-level-only sentiment classification.
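The two-level idea in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy word lists, the function names, the use of label proportions as document features, and the choice to exclude neutral documents from the positive vs. negative scheme are hypothetical. The thesis trained supervised classifiers at both levels, which this sketch replaces with a trivial lexicon lookup at the sentence level.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption); the thesis used a trained
# supervised sentence classifier, not a word lookup.
POSITIVE_WORDS = {"great", "fast", "love"}
NEGATIVE_WORDS = {"slow", "broken", "hate"}
LABELS = ("positive", "negative", "neutral")


def classify_sentence(sentence):
    """Stand-in sentence-level classifier returning one of LABELS."""
    words = set(sentence.lower().split())
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def document_features(sentences):
    """Aggregate sentence labels into document-level features.

    Here the features are simply the proportion of each label (assumption).
    """
    counts = Counter(classify_sentence(s) for s in sentences)
    total = max(len(sentences), 1)
    return {label: counts[label] / total for label in LABELS}


def relabel(label, scheme):
    """Map a three-class document label onto one of the four schemes."""
    if scheme == "pos/neg/neu":
        return label
    if scheme == "pos/neg":
        # Assumption: neutral documents are left out of this scheme.
        return None if label == "neutral" else label
    if scheme == "pos/non-pos":
        return "positive" if label == "positive" else "non-positive"
    if scheme == "neg/non-neg":
        return "negative" if label == "negative" else "non-negative"
    raise ValueError(f"unknown scheme: {scheme}")


def f_measure(gold, predicted, target):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = ["I love it", "it is slow", "arrived today", "great screen"]
    print(document_features(doc))
    print(relabel("neutral", "pos/non-pos"))
```

In a fuller version, the feature dictionary from `document_features` would be passed to a trained document-level classifier for each scheme, and `f_measure` would be computed on held-out labeled documents.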
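References 張育蓉 (2012). 使用情緒分析於圖書館使用者滿意度評估之研究 [A study of using sentiment analysis to evaluate library user satisfaction]. Unpublished master's thesis, Graduate Institute of Library and Information Science, National Chung Hsing University, Taichung, Taiwan.
張慧美 (2006). 網路語言之語言風格研究 [A study of the linguistic style of internet language]. 彰化師大國文學誌, 13, 331-359.
梅家駒, 竺一鳴, 高蘊琦, & 殷鴻翔 (1982). 同義詞詞林 [A thesaurus of Chinese synonyms]. Shanghai: Shanghai Lexicographical Publishing House.
黃泓彰, & 楊亨利 (2013). 運用機器學習與語言模型的網路用語轉譯系統 [An internet slang translation system using machine learning and language models]. The 19th Conference on Information Management and Practice (IMP2013), Taichung, Taiwan.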
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 1-27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Church, K. W., & Hanks, P. (1989). Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL. New Brunswick, New Jersey.
Das, S., & Chen, M. (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), Bangkok, Thailand.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th International World Wide Web Conference (WWW), Budapest, Hungary.
Demartini, G., & Siersdorfer, S. (2010). Dear search engine: what's your opinion about...?: Sentiment analysis for semantic enrichment of web search results. Proceedings of the 3rd Semantic Search Workshop, Raleigh, North Carolina.
Ding, X., Liu, B., & Yu, P. S. (2008). A holistic lexicon-based approach to opinion mining. Proceedings of the Conference on Web Search and Web Data Mining (WSDM), Stanford, California.
Eirinaki, M., Pisal, S., & Singh, J. (2012). Feature-based opinion mining and ranking. Journal of Computer and System Sciences, 78(4), 1175-1184.
Ganapathibhotla, M., & Liu, B. (2008). Mining opinions in comparative sentences. Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, United Kingdom.
Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, Barcelona, Spain.
Ku, L. W., & Chen, H. H. (2007). Mining opinions from the web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, 58(12), 1838-1850. Software available at http://nlg18.csie.ntu.edu.tw:8080/opinion/index.html
Lin, C. J., & Chao, P. H. (2010). Tourism-related opinion detection and tourist-attraction target identification. International Journal of Computational Linguistics and Chinese Language Processing, 15(1), 37-60.
Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of Natural Language Processing (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Liu, B. (2012). Sentiment analysis and opinion mining. Morgan & Claypool Publishers.
Liu, J., & Seneff, S. (2009). Review sentiment scoring via a parse-and-paraphrase paradigm. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, Singapore.
Lu, B. (2010). Identifying opinion holders and targets with dependency parser in Chinese news texts. Proceedings of the Human Language Technologies: the 2010 Annual Conference of the North American Chapter of the ACL (NAACL HLT), Student Research Workshop, Los Angeles, California.
Ma, T., & Wan, X. (2010). Opinion target extraction in Chinese news comments. Proceedings of the International Conference on Computational Linguistics (COLING) Poster Volume, Beijing, China.
Moghaddam, S., & Ester, M. (2012). Aspect-based opinion mining from online reviews. Tutorial at Special Interest Group on Information Retrieval (SIGIR), Portland, Oregon. Retrieved March 10, 2014, from https://www.cs.sfu.ca/~ester/papers/SIGIR2012.Tutorial.Final.pdf
Na, J. C., Sui, H., Khoo, C., Chan, S., & Zhou, Y. (2004). Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews. Proceedings of the 8th International Society for Knowledge Organization Conference (ISKO), London, United Kingdom.
Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. Proceedings of the International Conference on Knowledge Capture (K-CAP), Sanibel Island, Florida.
Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), Valletta, Malta.
Pang, B., & Lee, L. (2008a). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Pang, B., & Lee, L. (2008b). Using very simple statistics for review search: An exploration. Proceedings of the International Conference on Computational Linguistics (COLING) Poster Paper, Manchester, United Kingdom.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, Pennsylvania.
Popescu, A. M., & Etzioni, O. (2005). Extracting product features and opinions from reviews. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Vancouver, Canada.
Su, Q., Xu, X., Guo, H., Guo, Z., Wu, X., Zhang, X., & Swen, B. (2008). Hidden sentiment association in Chinese web opinion mining. Proceedings of the 17th International Conference on World Wide Web, Beijing, China.
Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, Pennsylvania.
Turney, P. D., & Littman, M. L. (2002). Unsupervised learning of semantic orientation from a hundred-billion-word corpus (Technical Report ERB-1094). Ottawa, Canada: National Research Council Canada.
Wiebe, J., Bruce, R. F., & O’Hara, T. P. (1999). Development and use of a gold-standard data set for subjectivity classifications. Proceedings of the Association for Computational Linguistics (ACL), College Park, Maryland.
Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan.
Description Master's
National Chengchi University
Graduate Institute of Information Management
101356016
102
Source http://thesis.lib.nccu.edu.tw/record/#G0101356016
Type thesis
dc.contributor.advisor 楊亨利 [zh_TW]
dc.contributor.advisor Yang, Heng Li [en_US]
dc.contributor.author (Authors) 黃泓彰 [zh_TW]
dc.contributor.author (Authors) Huang, Hung Chang [en_US]
dc.creator (Author) 黃泓彰 [zh_TW]
dc.creator (Author) Huang, Hung Chang [en_US]
dc.date (Date) 2013 [en_US]
dc.date.accessioned 25-Aug-2014 15:16:17 (UTC+8)
dc.date.available 25-Aug-2014 15:16:17 (UTC+8)
dc.date.issued (Upload time) 25-Aug-2014 15:16:17 (UTC+8)
dc.identifier (Other Identifiers) G0101356016 [en_US]
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/69195
dc.description (Description) 碩士 (Master's) [zh_TW]
dc.description (Description) 國立政治大學 (National Chengchi University) [zh_TW]
dc.description (Description) 資訊管理研究所 (Graduate Institute of Information Management) [zh_TW]
dc.description (Description) 101356016 [zh_TW]
dc.description (Description) 102 [zh_TW]
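dc.description.tableofcontents (translated from zh_TW)
I. Introduction 1
  1. Research background and motivation 1
  2. Research objectives and questions 3
  3. Research method and framework 4
  4. Research limitations 4
II. Literature review 6
  1. Opinion mining and sentiment analysis 6
  2. Related work on opinion mining and search systems 15
III. Experimental data collection and processing 17
  1. Collecting portable smart device names 17
  2. Corpus collection 19
  3. Web text processing 22
  4. Manual annotation 30
  5. Collecting and building the sentiment lexicon 32
IV. Sentiment classification experiments 39
  1. Sentence-level feature extraction 39
  2. Sentence-level classification experiments 44
  3. Document-level feature extraction 49
  4. Document-level sentiment classification 52
V. Prototype system implementation 71
  1. System architecture and build environment 71
  2. System processing flow and interface 72
  3. System processing efficiency 74
VI. Conclusion 76
  1. Research results 76
  2. Research contributions 77
  3. Future research directions 77
References 79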
dc.format.extent 1905029 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0101356016 [en_US]
dc.subject (Keywords) 意見探勘 [zh_TW]
dc.subject (Keywords) 情感分析 [zh_TW]
dc.subject (Keywords) 情感分類 [zh_TW]
dc.subject (Keywords) 網路評價 [zh_TW]
dc.subject (Keywords) Opinion mining [en_US]
dc.subject (Keywords) Sentiment analysis [en_US]
dc.subject (Keywords) Sentiment classification [en_US]
dc.subject (Keywords) Web opinion [en_US]
dc.title (Title) 網路評價搜尋結果的正負意見分類系統 [zh_TW]
dc.title (Title) A sentiment classification system on search results of web opinions [en_US]
dc.type (Type) thesis [en]