A sentiment classification system on search results of web opinions (網路評價搜尋結果的正負意見分類系統)

Title	A sentiment classification system on search results of web opinions (Chinese title: 網路評價搜尋結果的正負意見分類系統)
Author	Huang, Hung Chang (黃泓彰)
Contributors	Yang, Heng Li (楊亨利), advisor; Huang, Hung Chang (黃泓彰)
Keywords	Opinion mining; Sentiment analysis; Sentiment classification; Web opinion
Date	2013
Uploaded	25-Aug-2014 15:16:17 (UTC+8)
Abstract	This research builds a system with two main functions: web opinion search and sentiment classification. For search, the system uses Google to retrieve search results containing web opinions of portable smart devices (smartphones, tablets, and notebooks). For classification, it sorts these results by the opinion they express about the product under four schemes: (1) positive, negative, or neutral; (2) positive vs. negative; (3) positive vs. non-positive; and (4) negative vs. non-negative.
To build the system, we first collected web articles and product names related to portable smart devices from two popular web forums, Mobile01 (mobile01.com) and PTT (ptt.cc). We then manually labeled the sentiment (positive, negative, or neutral) of every article and of the sentences in a subset of the articles.
We designed a two-level supervised sentiment classification experiment. At the sentence level, we trained classifiers that label sentences as positive, negative, or neutral, and the best sentence classifier was carried forward. At the document level, the sentence labels within each document were aggregated, and supervised classifiers were trained for the four classification problems above. The best model for each problem was used in the system. The positive vs. negative classifier performed best, with an average F-measure of 0.87. It was followed by the negative vs. non-negative classifier (F-measure of 0.83 on the negative class) and the positive vs. non-positive classifier (F-measure of 0.81 on the positive class). The three-way positive/negative/neutral classifier performed worst, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is not worse than that reported in previous English-language studies. Finally, we ran a comparison experiment with a document-level-only method that skips sentence-level classification; the two-level approach (sentence level first, then document level) outperformed it.
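The two-level pipeline summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the thesis's implementation: it assumes a sentence classifier has already labeled each sentence `pos`, `neg`, or `neu`; it uses only the proportions of those labels as document features (the thesis's actual document-level features are not described in this record); and it swaps in a simple nearest-centroid classifier for brevity, although the thesis's reference list includes LIBSVM. The task names, the helper functions, the exclusion of neutral documents from the positive vs. negative task, and the reading of "average F-measure" as an unweighted mean over classes are all assumptions.

```python
from collections import Counter
from math import dist

SENTENCE_LABELS = ("pos", "neg", "neu")  # order of the document feature vector


def document_features(sentence_labels):
    """Turn one document's predicted sentence labels into [pos, neg, neu] proportions."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels) or 1  # avoid division by zero for empty documents
    return [counts[label] / total for label in SENTENCE_LABELS]


def relabel(doc_label, task):
    """Map a 3-class gold document label onto one of the four (hypothetical) task names.

    Returns None when the document does not take part in the task
    (assumption: neutral documents are dropped from "pos/neg").
    """
    if task == "pos/neg/neu":
        return doc_label
    if task == "pos/neg":
        return doc_label if doc_label in ("pos", "neg") else None
    if task == "pos/non-pos":
        return "pos" if doc_label == "pos" else "non-pos"
    if task == "neg/non-neg":
        return "neg" if doc_label == "neg" else "non-neg"
    raise ValueError(f"unknown task: {task}")


def train_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    sums = {}
    counts = Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def f_measure(gold, pred, cls):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f(gold, pred):
    """Unweighted mean of per-class F1 over the gold classes (assumed meaning of "average")."""
    classes = sorted(set(gold))
    return sum(f_measure(gold, pred, c) for c in classes) / len(classes)


# Invented toy data: sentence labels as a sentence classifier might output them,
# paired with each document's manual 3-class label.
docs = [
    (["pos", "pos", "neu"], "pos"),
    (["pos", "neu", "pos", "pos"], "pos"),
    (["neg", "neg", "neu"], "neg"),
    (["neg", "pos", "neg"], "neg"),
    (["neu", "neu", "pos"], "neu"),
]
task = "pos/neg"
pairs = [(document_features(s), relabel(y, task)) for s, y in docs]
pairs = [(x, y) for x, y in pairs if y is not None]  # drop documents outside the task
features, gold = zip(*pairs)
model = train_centroids(features, gold)
pred = [predict(model, x) for x in features]
print(pred, average_f(list(gold), pred))  # scored on training data, for illustration only
```

For the binary "non-" tasks, the abstract reports F-measure on a single target class, which corresponds to calling `f_measure(gold, pred, "neg")` or `f_measure(gold, pred, "pos")` rather than `average_f`. A real evaluation would score held-out documents instead of the training set.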
References
張育蓉 (2012). 使用情緒分析於圖書館使用者滿意度評估之研究 [A study of using sentiment analysis to evaluate library user satisfaction]. Unpublished master's thesis, Graduate Institute of Library and Information Science, National Chung Hsing University, Taichung, Taiwan.
張慧美 (2006). 網路語言之語言風格研究 [A study of the linguistic style of Internet language]. 彰化師大國文學誌 [Chinese literature journal of National Changhua University of Education], 13, 331-359.
梅家駒, 竺一鳴, 高蘊琦, & 殷鴻翔 (1982). 同義詞詞林 [Tongyici Cilin: A thesaurus of Chinese synonyms]. Shanghai: 上海辭書出版社 [Shanghai Lexicographical Publishing House].
黃泓彰 (Huang, H. C.), & 楊亨利 (Yang, H. L.) (2013). 運用機器學習與語言模型的網路用語轉譯系統 [A system for translating Internet language using machine learning and language models]. Proceedings of the 19th Conference on Information Management and Practice (IMP2013), Taichung, Taiwan.
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 1-27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Church, K. W., & Hanks, P. (1989). Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL, New Brunswick, New Jersey.
Das, S., & Chen, M. (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), Bangkok, Thailand.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th International World Wide Web Conference (WWW), Budapest, Hungary.
Demartini, G., & Siersdorfer, S. (2010). Dear search engine: What's your opinion about...?: Sentiment analysis for semantic enrichment of web search results. Proceedings of the 3rd Semantic Search Workshop, Raleigh, North Carolina.
Ding, X., Liu, B., & Yu, P. S. (2008). A holistic lexicon-based approach to opinion mining. Proceedings of the Conference on Web Search and Web Data Mining (WSDM), Stanford, California.
Eirinaki, M., Pisal, S., & Singh, J. (2012). Feature-based opinion mining and ranking. Journal of Computer and System Sciences, 78(4), 1175-1184.
Ganapathibhotla, M., & Liu, B. (2008). Mining opinions in comparative sentences. Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, United Kingdom.
Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, Barcelona, Spain.
Ku, L. W., & Chen, H. H. (2007). Mining opinions from the web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, Special Issue on Mining Web Resources for Enhancing Information Retrieval, 58(12), 1838-1850. Software available at http://nlg18.csie.ntu.edu.tw:8080/opinion/index.html
Lin, C. J., & Chao, P. H. (2010). Tourism-related opinion detection and tourist-attraction target identification. International Journal of Computational Linguistics and Chinese Language Processing, 15(1), 37-60.
Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of Natural Language Processing (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Liu, B. (2012). Sentiment analysis and opinion mining. Morgan & Claypool Publishers.
Liu, J., & Seneff, S. (2009). Review sentiment scoring via a parse-and-paraphrase paradigm. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, Singapore.
Lu, B. (2010). Identifying opinion holders and targets with dependency parser in Chinese news texts. Proceedings of the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL (NAACL HLT), Student Research Workshop, Los Angeles, California.
Ma, T., & Wan, X. (2010). Opinion target extraction in Chinese news comments. Proceedings of the International Conference on Computational Linguistics (COLING) Poster Volume, Beijing, China.
Moghaddam, S., & Ester, M. (2012). Aspect-based opinion mining from online reviews. Tutorial at Special Interest Group on Information Retrieval (SIGIR), Portland, Oregon. Retrieved March 10, 2014, from https://www.cs.sfu.ca/~ester/papers/SIGIR2012.Tutorial.Final.pdf
Na, J. C., Sui, H., Khoo, C., Chan, S., & Zhou, Y. (2004). Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews. Proceedings of the 8th International Society for Knowledge Organization Conference (ISKO), London, United Kingdom.
Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. Proceedings of the International Conference on Knowledge Capture (K-CAP), Sanibel Island, Florida.
Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), Valletta, Malta.
Pang, B., & Lee, L. (2008a). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Pang, B., & Lee, L. (2008b). Using very simple statistics for review search: An exploration. Proceedings of the International Conference on Computational Linguistics (COLING) Poster Paper, Manchester, United Kingdom.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, Pennsylvania.
Popescu, A. M., & Etzioni, O. (2005). Extracting product features and opinions from reviews. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Vancouver, Canada.
Su, Q., Xu, X., Guo, H., Guo, Z., Wu, X., Zhang, X., & Swen, B. (2008). Hidden sentiment association in Chinese web opinion mining. Proceedings of the 17th International Conference on World Wide Web, Beijing, China.
Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, Pennsylvania.
Turney, P. D., & Littman, M. L. (2002). Unsupervised learning of semantic orientation from a hundred-billion-word corpus (Technical Report ERB-1094). Ottawa, Canada: National Research Council Canada.
Wiebe, J., Bruce, R. F., & O'Hara, T. P. (1999). Development and use of a gold-standard data set for subjectivity classifications. Proceedings of the Association for Computational Linguistics (ACL), College Park, Maryland.
Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan.
Description	Master's thesis, Graduate Institute of Management Information Systems, National Chengchi University; 101356016; 102
Source	http://thesis.lib.nccu.edu.tw/record/#G0101356016
Type	thesis

dc.contributor.advisor	楊亨利	zh_TW
dc.contributor.advisor	Yang, Heng Li	en_US
dc.contributor.author (Authors)	黃泓彰	zh_TW
dc.contributor.author (Authors)	Huang, Hung Chang	en_US
dc.creator (Author)	黃泓彰	zh_TW
dc.creator (Author)	Huang, Hung Chang	en_US
dc.date (Date)	2013	en_US
dc.date.accessioned	25-Aug-2014 15:16:17 (UTC+8)	-
dc.date.available	25-Aug-2014 15:16:17 (UTC+8)	-
dc.date.issued (Upload time)	25-Aug-2014 15:16:17 (UTC+8)	-
dc.identifier (Other Identifiers)	G0101356016	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/69195	-
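dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	資訊管理研究所 (Graduate Institute of Management Information Systems)	zh_TW
dc.description (Description)	101356016	zh_TW
dc.description (Description)	102	zh_TW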
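dc.description.tableofcontents	I. Introduction 1
  1. Research background and motivation 1
  2. Research objectives and questions 3
  3. Research method and framework 4
  4. Research limitations 4
II. Literature review 6
  1. Opinion mining and sentiment analysis 6
  2. Related work on opinion mining and search systems 15
III. Experimental data collection and processing 17
  1. Collecting portable smart device names 17
  2. Corpus collection 19
  3. Web text processing 22
  4. Manual annotation 30
  5. Collecting and building the sentiment lexicon 32
IV. Sentiment classification experiments 39
  1. Sentence-level feature extraction 39
  2. Sentence-level classification experiments 44
  3. Document-level feature extraction 49
  4. Document-level sentiment classification 52
V. Prototype system implementation 71
  1. System architecture and development environment 71
  2. System processing flow and interface 72
  3. System processing efficiency 74
VI. Conclusion 76
  1. Research results 76
  2. Research contributions 77
  3. Future research directions 77
References 79	zh_TW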
dc.format.extent	1905029 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0101356016	en_US
dc.subject (Keywords)	意見探勘 (opinion mining)	zh_TW
dc.subject (Keywords)	情感分析 (sentiment analysis)	zh_TW
dc.subject (Keywords)	情感分類 (sentiment classification)	zh_TW
dc.subject (Keywords)	網路評價 (web opinion)	zh_TW
dc.subject (Keywords)	Opinion mining	en_US
dc.subject (Keywords)	Sentiment analysis	en_US
dc.subject (Keywords)	Sentiment classification	en_US
dc.subject (Keywords)	Web opinion	en_US
dc.title (Title)	網路評價搜尋結果的正負意見分類系統	zh_TW
dc.title (Title)	A sentiment classification system on search results of web opinions	en_US
dc.type (Type)	thesis	en
