應用文字探勘文件分類分群技術於股價走勢預測之研究─以台灣股票市場為例 | Publication

Title	應用文字探勘文件分類分群技術於股價走勢預測之研究─以台灣股票市場為例 (A Study of Stock Price Prediction with Text Mining, Classification and Clustering Techniques in Taiwan Stock Market)
Author	薛弘業 (Hsueh, Hung Yeh)
Contributors	楊建民 (Yang, Jiann Min), advisor; 薛弘業 (Hsueh, Hung Yeh), author
Keywords	Stock news; Text mining; kNN; 2-way kNN; Reversal points of stock price
Date	2012
Upload time	2-Sep-2013 16:01:44 (UTC+8)
Abstract	This study investigates how company-specific stock news affects the Taiwan stock market. Historical trading data and stock news were collected for three listed companies, HTC (宏達電), TSMC (台積電), and Hon Hai (鴻海), from June 2012 to May 2013 (the record's original English abstract gives July 2012 as the start). Text mining is used to extract the features of each news article; using historical data, technical analysis indicators, and the kNN and 2-way kNN algorithms, the articles are first classified and then clustered to build a forecast model. The model is used to analyze the direction and degree of the news effect on stock prices, and the relation between clusters with larger price changes and both price movements and price reversal points. The results indicate that adding the price change range and technical indicators improves classification accuracy, that clustering within the "up" and "down" classes identifies the relation between each cluster and the price movement, and that analyzing the clusters with larger price changes raises investment accuracy to about 80 percent. The prediction of stock price reversal points offers a specific time to enter the market, and the study reports that a 7-trading-day investment following the model yields returns of 2.82 to 22.03 percent at very low risk.
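Description	Master's; 國立政治大學 (National Chengchi University); 資訊管理研究所 (Graduate Institute of Management Information Systems); 100356031; 101
Source	http://thesis.lib.nccu.edu.tw/record/#G0100356031
Type	thesis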

dc.contributor.advisor	楊建民	zh_TW
dc.contributor.advisor	Yang, Jiann Min	en_US
dc.contributor.author (Authors)	薛弘業	zh_TW
dc.contributor.author (Authors)	Hsueh, Hung Yeh	en_US
dc.creator (Author)	薛弘業	zh_TW
dc.creator (Author)	Hsueh, Hung Yeh	en_US
dc.date (Date)	2012	en_US
dc.date.accessioned	2-Sep-2013 16:01:44 (UTC+8)	-
dc.date.available	2-Sep-2013 16:01:44 (UTC+8)	-
dc.date.issued (Upload time)	2-Sep-2013 16:01:44 (UTC+8)	-
dc.identifier (Other Identifiers)	G0100356031	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/59299	-
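dc.description (Description)	Master's	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	資訊管理研究所 (Graduate Institute of Management Information Systems)	zh_TW
dc.description (Description)	100356031	zh_TW
dc.description (Description)	101	zh_TW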
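dc.description.tableofcontents	Chapter 1, Introduction: Section 1, Research Motivation; Section 2, Research Objectives. Chapter 2, Literature Review: Section 1, Efficient Markets; Section 2, Taiwan Stock Market (2.2.1 Current State of the Taiwan Stock Market; 2.2.2 Shallow-Dish Market; 2.2.3 Studies on the Efficiency of the Taiwan Stock Market); Section 3, Relationship Between News and Stock Prices; Section 4, Technical Indicators; Section 5, Text Mining (2.5.1 Word Segmentation; 2.5.2 Feature Selection; 2.5.3 Vector Space Model and Similarity Computation; 2.5.4 Classification and Clustering Techniques; 2.5.5 Evaluation of Classification and Clustering Results); Section 6, Summary. Chapter 3, Research Method and Design: Section 1, Research Framework; Section 2, Research Design (3.2.1 Data Sources; 3.2.2 Data Preprocessing Module; 3.2.3 Price Movement Prediction Module; 3.2.4 Evaluation of Classification Results; 3.2.5 Evaluation of Clustering Results; 3.2.6 Pilot Experiment; 3.2.7 Predicting Individual Stock Trends). Chapter 4, Results: Section 1, Experiment 1 (News Classification); Section 2, Experiment 2 (News Clustering) (4.2.1 Clustering Within the Up and Down Classes for Each Company; 4.2.2 Comparison of Classification and Clustering Results); Section 3, Experiment 3 (Stock Price Reversal Point Prediction). Chapter 5, Conclusions and Suggestions: Section 1, Conclusions and Contributions; Section 2, Future Directions and Suggestions. References. Appendix: Test Parameters for the Classification k Value.	zh_TW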
dc.format.extent	1043546 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0100356031	en_US
dc.subject (Keywords)	Stock news	en_US
dc.subject (Keywords)	Text Mining	en_US
dc.subject (Keywords)	kNN	en_US
dc.subject (Keywords)	2-way kNN	en_US
dc.subject (Keywords)	Reversal Points of Stock Price	en_US
dc.title (Title)	應用文字探勘文件分類分群技術於股價走勢預測之研究─以台灣股票市場為例	zh_TW
dc.title (Title)	A Study of Stock Price Prediction with Text Mining, Classification and Clustering Techniques in Taiwan Stock Market	en_US
dc.type (Type)	thesis	en
