Title 以文件分類技術預測股價趨勢
Predicting Trends of Stock Prices with Text Classification Techniques
Author 陳俊達
Chen, Jiun-da
Contributors 王台平; 劉昭麟
Wang, Tai-Ping; Liu, Chao-Lin
陳俊達
Chen, Jiun-da
Keywords stock price prediction
text mining
naïve Bayesian models
k-nearest neighbors models
hybrid models
Date 2006
Upload time 17-Sep-2009 14:02:52 (UTC+8)
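Abstract Rises and falls in stock prices result from the investment decisions of many different investors in the securities market. The factors that move stock prices are numerous and complex, and news is one of them: news events are not only the main medium through which investors learn about the operations of listed companies, but also one of the main factors that lead investors to set or change their stock investment strategies. This study proposes using news documents as the foundation of a stock price trend prediction system, applying text mining and classification techniques to build a system that predicts whether an individual stock's closing price will rise or fall on the same day.
This study proposes three classification models, namely a naïve Bayes model, a k-nearest neighbors model, and a hybrid model, and designs three sets of experiments to examine the system's predictive performance: a comparison of classifier performance, a comparison of the depth of the news sample data, and a comparison of the breadth of the news sample data. The experimental results show that the proposed classification models effectively mitigate a problem seen in related work, where overall accuracy is high but predictive performance differs greatly across categories. The three models also achieve good average predictive performance on the "up" and "down" categories, which are the key categories that determine whether investors profit, and can serve as a useful reference for investment decisions.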
Stocks' closing price levels can provide hints about investors' aggregate demand and aggregate supply in the stock trading markets. If a stock's closing price is higher than its previous closing price, aggregate demand was stronger than aggregate supply on that trading day; otherwise, aggregate demand was weaker than aggregate supply. It would be profitable if we could predict an individual stock's closing price level. For example, when a stock's current price is lower than its previous closing price, we can adopt the proper strategy (buy or sell) to gain profit if we can correctly predict the stock's closing price level in advance.
In this thesis, we propose and evaluate three models for predicting individual stocks' closing prices in the Taiwan stock market: a naïve Bayes model, a k-nearest neighbors model, and a hybrid model. Experimental results show that the proposed methods perform better than the NewsCATS system for the "UP" and "DOWN" categories.
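The labelling rule and the naïve Bayes classifier mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function and class names, the add-one smoothing, the "FLAT" label for an unchanged close, and the toy English headlines are all assumptions made for illustration, and the record does not describe the features or parameters actually used.

```python
import math
from collections import Counter, defaultdict


def label_trend(prev_close: float, close: float) -> str:
    """Label a trading day by comparing the close with the previous close."""
    if close > prev_close:
        return "UP"
    if close < prev_close:
        return "DOWN"
    return "FLAT"  # assumed label; the abstract names only UP and DOWN


class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up data: tokenized headlines paired with (previous close, close).
train_docs = [
    ["profit", "rises", "record"],
    ["orders", "surge", "profit"],
    ["loss", "widens", "recall"],
    ["sales", "drop", "loss"],
]
train_labels = [
    label_trend(100.0, 103.0),
    label_trend(50.0, 51.5),
    label_trend(80.0, 77.0),
    label_trend(20.0, 19.0),
]
model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["profit", "surge"])
```

For Chinese news text, the token lists would come from a word segmenter rather than whitespace splitting; that step is left out of the sketch.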
Description Master's thesis
National Chengchi University
Department of Computer Science
94753014
95
Source http://thesis.lib.nccu.edu.tw/record/#G0094753014
Type thesis
dc.contributor.advisor 王台平; 劉昭麟 (zh_TW)
dc.contributor.advisor Wang, Tai-Ping; Liu, Chao-Lin (en_US)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/32680
dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Methods and Results
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Fundamental Analysis
2.2 Technical Analysis
2.3 Efficient Markets
2.4 Data Sources for Investment Analysis
2.5 Predicting Stock Price Indices from News
2.6 Predicting Individual Stock Prices from News
2.7 Chinese Document Preprocessing
2.8 Overview of Classifier Methods
Chapter 3 Research Methods and System Architecture
3.1 Research Objectives and Steps
3.2 System Architecture
3.3 Classifiers
3.3.1 Naïve Bayes Model
3.3.2 k-Nearest Neighbors Model
3.3.3 Hybrid Model
Chapter 4 Experiments and Analysis
4.1 Data Sources
4.2 Experimental Design
4.3 Evaluation Methods
4.4 Parameter Settings
4.5 Simulating the NewsCATS System
4.6 Experimental Results and Analysis
4.6.1 Experiment A: Comparison of Classifier Performance
4.6.2 Experiment B: Comparison of News Sample Data Depth
4.6.3 Experiment C: Comparison of News Sample Data Breadth
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix
dc.language.iso en_US