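Title 整合文件探勘與類神經網路預測模型之研究 -以財經事件線索預測台灣股市為例 (A Study Integrating Document Mining and a Neural Network Prediction Model: Forecasting the Taiwan Stock Market from Financial Event Cues)
Author 歐智民
Contributors 楊建民
歐智民
Keywords Event detection and tracking
kNN clustering
Back-propagation neural network prediction model
Date 2010
Uploaded 4-Sep-2013 17:00:34 (UTC+8)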
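Abstract  With globalization and advances in information technology, the media now spread information far faster than before, so stock-market-related news events are produced in greater volume and at higher frequency than in the past, and these events in turn affect the stock market. Most investors today already have traditional investment knowledge: they follow macroeconomic trends and indicators and analyze price charts to predict closing prices. Beyond this, picking out the news events that can support investment decisions from a large volume of news is a skill that must be cultivated, and it is the area investors are least familiar with; this thesis sets out to explore it.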
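  This study uses the 2009 financial news of the Liberty Times online edition (自由時報電子報), 5,767 articles in total, as its data source. The articles are clustered with a kNN technique based on document distance, and a time-window concept is adopted to keep the clusters timely. The resulting clusters are then classified with a category lexicon into positive, neutral, and negative news events, which are fed into a back-propagation neural network (BPN) prediction model together with quantitative stock-market data: trading volume, closing price, and 3-day closing price. Trading data for semiconductor stocks, obtained from the Taiwan Economic Journal (TEJ) database, were split into a training set of 168 trading days and a test set of 83 trading days. The prediction model was built through the network's iterative learning process and compared with the original prediction model, which does not use news events.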
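The abstract names the ingredients of the clustering step (kNN, document distance, a time window) but not the exact procedure. Below is a minimal Python sketch under stated assumptions: articles are already segmented into terms (for example with a segmenter such as CKIP), each article is a raw term-frequency vector, cosine similarity serves as the document distance, and `k`, `threshold`, and `window_days` are hypothetical parameters. It is one plausible single-pass design for event detection and tracking, not the thesis's own implementation.

```python
import math
from collections import Counter
from datetime import date

# Hypothetical parameters: k, threshold and window_days are illustrative,
# not values reported in the thesis.

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_events(docs, k=3, threshold=0.3, window_days=7):
    """Single-pass, time-windowed kNN clustering (assumed design).

    docs: list of (date, terms) tuples sorted by date, where terms are
    already-segmented words. Returns one cluster (event) id per document.
    """
    vectors = [Counter(terms) for _, terms in docs]
    labels = []
    next_id = 0
    for i, (day, _) in enumerate(docs):
        # Only earlier documents inside the time window can be neighbours,
        # so an old topic that resurfaces later starts a new event.
        candidates = [
            (cosine(vectors[i], vectors[j]), labels[j])
            for j in range(i)
            if (day - docs[j][0]).days <= window_days
        ]
        nearest = sorted(candidates, reverse=True)[:k]
        # Discard neighbours too dissimilar to belong to the same event.
        close = [label for sim, label in nearest if sim >= threshold]
        if close:
            # Majority vote; ties go to the label with the most similar neighbour.
            labels.append(Counter(close).most_common(1)[0][0])
        else:
            labels.append(next_id)  # no close neighbour: a new event
            next_id += 1
    return labels


docs = [
    (date(2009, 1, 5), ["TSMC", "revenue", "growth"]),
    (date(2009, 1, 6), ["TSMC", "revenue", "record"]),
    (date(2009, 1, 6), ["central-bank", "rate-cut"]),
    (date(2009, 3, 2), ["TSMC", "revenue", "growth"]),
]
print(cluster_events(docs))  # [0, 0, 1, 2]
```

In this toy run the fourth article repeats the first one's terms but falls outside the seven-day window, so it opens a new event; widening `window_days` merges it back into event 0. That is the timeliness trade-off a time window controls.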
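  The results show, first, that a category lexicon can be built from closing-price returns and by filtering terms on their frequency of occurrence, so that clustering and classification let investors reduce the volume of news text they must read. Second, adding the classified news events to the BPN prediction model significantly improves both the directional accuracy and the accuracy of next-day closing-price predictions, according to statistical significance tests at the 95% and 99% confidence levels (5% and 1% significance levels). In other words, including news events in the prediction model helps predict the next day's closing price. Finally, the thesis outlines several directions for future research.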
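The abstract says only that the category lexicon is built from closing-price returns and term-frequency filtering. One possible reading, sketched below in Python, labels each day's articles by the sign of that day's return (with a hypothetical ±1% neutral `band`), keeps terms that occur in at least `min_freq` articles and lean strictly toward one class, and then classifies an event by majority vote of its lexicon terms. The function names, thresholds, and tie rules are all assumptions for illustration, not the thesis's procedure.

```python
from collections import Counter, defaultdict

# Hypothetical choices: same-day returns, a +/-1% neutral band, min_freq=2
# and the strict-majority rule are illustrative assumptions only.

def daily_returns(closes):
    """Simple returns r_t = (P_t - P_{t-1}) / P_{t-1} from consecutive closes."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def return_class(r, band=0.01):
    """Map a return to 'positive', 'negative' or 'neutral'."""
    if r > band:
        return "positive"
    if r < -band:
        return "negative"
    return "neutral"


def build_lexicon(labelled_docs, min_freq=2):
    """Build {term: class} from (class, terms) pairs.

    A term is kept only if it occurs in at least min_freq documents and
    one class strictly outnumbers every other class for that term.
    """
    counts = defaultdict(Counter)
    for label, terms in labelled_docs:
        for term in set(terms):  # document frequency, not raw term counts
            counts[term][label] += 1
    lexicon = {}
    for term, by_class in counts.items():
        ranked = by_class.most_common(2)
        total = sum(by_class.values())
        if total >= min_freq and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
            lexicon[term] = ranked[0][0]
    return lexicon


def classify_event(terms, lexicon):
    """Label an event by majority vote of its lexicon terms; ties or no hits -> neutral."""
    votes = Counter(lexicon[t] for t in terms if t in lexicon)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return "neutral"
    return ranked[0][0]


closes = [100.0, 103.0, 101.0, 101.5]
labels = [return_class(r) for r in daily_returns(closes)]
news_by_day = [  # articles published on each of the three return days
    [["revenue", "record", "TSMC"], ["orders", "record"]],
    [["layoffs", "losses"], ["losses", "TSMC"]],
    [["meeting", "TSMC"]],
]
labelled = [(lab, terms) for lab, day in zip(labels, news_by_day) for terms in day]
lexicon = build_lexicon(labelled)
print(labels)   # ['positive', 'negative', 'neutral']
print(lexicon)  # {'record': 'positive', 'losses': 'negative'}
print(classify_event(["record", "TSMC", "orders"], lexicon))  # positive
```

Note how "TSMC" appears on up, down, and flat days alike and is therefore filtered out: frequency alone is not enough, the term must also lean toward one class.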
References
English-language sources
1. Aas, K. & Eikvil, L. (1999). Text Categorization: A Survey. Technical Report No. 941, Norwegian Computing Center.
2. Abrahart, R. J., See, L. & Kneale, P. E. (1998). New Tools for Neurohydrologists: Using Network Pruning and Model Breeding Algorithms to Discover Optimum Inputs and Architectures. In Proceedings of the 3rd International Conference on GeoComputation, University of Bristol.
3. Ahmad, K., Oliveira, P., Manomaisupat, P., Casey, M. & Taskay, T. (2002). Description of Events: An Analysis of Keywords and Indexical Names. In Third International Conference on Language Resources and Evaluation (LREC 2002): Workshop on Event Modelling for Multilingual Document Linking, p29-35.
4. Allan, J., Papka, R. & Lavrenko, V. (1998). On-line New Event Detection and Tracking. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p37-45.
5. Annexstein, F. (2002). Indexing and Representation: The Vector Space Model. Retrieved December 25, 2003, from http://www.ececs.uc.edu/~annexste/Courses/cs690/Indexing%20and%20Representation.ppt
6. Armano, G., Marchesi, M. & Murru, A. (2005). A Hybrid Genetic-Neural Architecture for Stock Indexes Forecasting. Information Sciences, 170(1), p3-33.
7. Chen, A. S., Leung, M. T. & Daouk, H. (2003). Application of Neural Networks to an Emerging Financial Market: Forecasting and Trading the Taiwan Stock Index. Computers & Operations Research, 30(6).
8. Dawson, C. W. & Wilby, R. L. (2001). Hydrological Modelling Using Artificial Neural Networks. Progress in Physical Geography, 25(1), p80-108.
9. Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). The KDD Process for Extracting Useful Knowledge from Volumes of Data. Communications of the ACM, 39, p27-34.
10. Ham, F. M. & Kostanic, I. (2001). Principles of Neurocomputing for Science and Engineering. McGraw-Hill, New York, NY.
11. Hsu, C. W., Chang, C. C. & Lin, C. J. (2010). A Practical Guide to Support Vector Classification. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
12. Hush, D. R. & Horne, B. G. (1993). Progress in Supervised Neural Networks. IEEE Signal Processing Magazine, January 1993, p8-39.
13. Jing, L. P., Huang, H. K. & Shi, H. B. (2002). Improved Feature Selection Approach TFIDF in Text Mining. In Proceedings of the 1st International Conference on Machine Learning and Cybernetics, Beijing.
14. Joachims, T. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proceedings of the European Conference on Machine Learning, Springer.
15. Han, J. & Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco, CA.
16. Kim, K. J. & Han, I. (2000). Genetic Algorithms Approach to Feature Discretization in Artificial Neural Networks for the Prediction of Stock Price Index.
17. Kurt, H. (2001). On-line New Event Detection and Tracking in a Multi-Resource Environment. MS Thesis, The Institute of Engineering and Science, Bilkent University.
18. Kwok, T. Y. & Yeung, D. Y. (1997). Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems. IEEE Transactions on Neural Networks, 3: 630-645.
19. Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D. & Allan, J. (2000). Language Models for Financial News Recommendation. In Proceedings of CIKM 2000, p389-396, ACM Press, New York, NY.
20. Liu, H. & Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic, Norwell, MA, USA.
21. MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1, p281-297.
22. Nguyen, D. & Widrow, B. (1990). Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights. In Proceedings of the International Joint Conference on Neural Networks, 3, San Diego, CA.
23. Nygren, K. (2004). Stock Prediction: A Neural Network Approach. Master's Thesis, Royal Institute of Technology (KTH).
24. Popescu, A. (2001). Implementation of Term Weighting in a Simple IR System. Personal course project, University of Helsinki.
25. Salton, G. (1989). Automatic Text Processing. Addison-Wesley, Reading, MA.
26. Salton, G. & McGill, M. J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
27. Salton, G., Wong, A. & Yang, C. S. (1975). A Vector Space Model for Automatic Indexing.
28. Sebastiani, F. (2002). Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1), p1-47.
29. Yang, Y., Ault, T. & Pierce, T. (2000). Improving Text Categorization Methods for Event Tracking. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
30. Yang, Y. & Pedersen, J. O. (1997). A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, p412-420, Nashville, TN, USA.
31. Yang, Y., Pierce, T. & Carbonell, J. (1998). A Study on Retrospective and On-Line Event Detection. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p28-36.
32. Wu, Y. C. (2008). Predicting the Trend of Taiwan Weighted Stock Index with Text Mining Techniques. NCU IM.
33. Mittermayer, M.-A. (2004). Forecasting Intraday Stock Price Trends with Text Mining Techniques. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS), Big Island, p64.

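Chinese-language sources
[1] 林章德 (2000). A Study of the Effect of Major Investment Announcements by Listed Companies on Stock Prices. Master's thesis, Graduate Institute of Management, Tunghai University.
[2] 林聖哲 (2001). Building Different Artificial Intelligence Valuation Models for Call Warrants. Master's thesis, Graduate Institute of Business Administration, Shih Chien University.
[3] 李春淋 (2010). A Study of the Effect of Individual-Stock News on Stock Prices: Evidence from the Taiwan Stock Market. Master's thesis, Department of Applied Statistics, Fu Jen Catholic University.
[4] 吳真蕙 (2000). The Effect of Front-Page News in Financial Newspapers on Stock Prices and Trading Volume. Master's thesis, Department of Accounting, Chung Yuan Christian University.
[5] 周宗南, 劉瑞鑫 (2005). Predicting Taiwan Stock Index Returns with Evolutionary Neural Networks. 財經論文叢刊, No. 3, p77-94.
[6] 胡舜禹 (2009). Image Segmentation Combining PSO and K-Means Clustering Algorithms. In-service master's program, Institute of Communications Engineering, National Sun Yat-sen University.
[7] 袁立安 (2007). A Hybrid Automatic Document Summarization Method. Master's thesis, Graduate Institute of Information Management, National Sun Yat-sen University.
[8] 陳稼興, 楊孟龍 (2000). Applying Neural Networks to Stock Market Swing Prediction and Stock Selection.
[9] 章秉純, 許清琦 (2001). Combining Unsupervised Feature Selection Strategy for Automatic Text Categorization. In Proceedings of the 6th Conference on Artificial Intelligence and Applications, November 9, 2001.
[10] 張斐章, 張麗秋 (2005). Neural Networks (類神經網路). Taipei: 東華書局.
[11] 黃孝文 (2010). Applying Text Mining to the Semantic Annotation and Analysis of Web Documents in a Cloud Computing Service Environment. Master's thesis, Department of Management Information Systems, National Chengchi University.
[12] 黃馨瑩, 楊建民, 李耀中 (2009). An Exploration of How Financial News Mining Relates to Stock Price Trends: The Case of the Cross-Strait Display Panel Industry.
[13] 楊踐為, 李家豪, 類惠貞 (2007). Constructing a Forecasting Trading Model for the Taiwan Securities Market Using Time-Series Analysis. 中華管理評論國際學報 (Chinese Management Review), 10(3).
[14] 鍾任明, 李維平, 吳澤民 (2007). Applying Text Mining to the Prediction of Intraday Stock Price Trends. 中華管理評論國際學報 (Chinese Management Review), 10(1).
[15] 戴尚學 (2003). Applying Event Detection and Tracking Techniques to Chinese Multi-Document Summarization. Master's thesis, Graduate Institute of Information Management, National Yunlin University of Science and Technology.
[16] 顧皓光 (1996). Automatic Classification of Web Documents. Master's thesis, Graduate Institute of Information Management, National Taiwan University.
[17] 羅華強 (2001). Neural Networks (類神經網路). Taipei: 清蔚科技.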
Web sources
[1] A Tutorial on Clustering Algorithms (2011). Retrieved February 3, 2011, from http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
[2] Liberty Times Net (自由時報電子報). Retrieved February 1, 2011, from http://www.libertytimes.com.tw/index.htm
[3] Academia Sinica CKIP (中研院CKIP). Retrieved January 17, 2011, from http://ckipsvr.iis.sinica.edu.tw
[4] Yahoo API (2011). Retrieved January 22, 2011, from http://tw.developer.yahoo.com/cas
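Description Master's thesis
National Chengchi University
Department of Management Information Systems
98356033
99 (academic year, ROC calendar)
Source http://thesis.lib.nccu.edu.tw/record/#G0098356033
Type thesis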
dc.contributor.advisor 楊建民
dc.identifier (Other Identifiers) G0098356033
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/60223
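dc.description.tableofcontents Abstract (Chinese) I
Abstract (English) II
List of Tables VI
List of Formulas VII
Chapter 1 Introduction 1
 Section 1 Research Background and Motivation 1
 Section 2 Research Objectives 3
Chapter 2 Literature Review 4
 Section 1 Studies on the Relationship Between News and Stock Prices 4
 Section 2 Mining Techniques 5
  2.1 Data Mining 5
  2.2 Text Mining 8
 Section 3 Event Detection and Tracking 15
  3.1 Event Detection 15
  3.2 Event Tracking 16
 Section 4 Neural Networks 16
  4.1 Overview 16
  4.2 Network Architecture 17
  4.3 Applications in the Stock Market 18
Chapter 3 Research Design 18
 Section 1 Clustering and Classification of News Documents 21
  1.1 Word Segmentation Tool 21
  1.2 News Document Clustering: Event Detection and Tracking 22
  1.3 News Event Classification 26
 Section 2 Back-Propagation Neural Network Prediction Model 26
 Section 3 Research Sample and Statistical Tests 31
  3.1 Research Sample 31
  3.2 Statistical Tests 32
Chapter 4 Results 33
 Section 1 Construction of the Category Lexicon 33
 Section 2 Parameter Settings of the BPN Prediction Model 37
 Section 3 Significance Tests of the Prediction Models 40
  3.1 Directional Accuracy of Predictions 40
  3.2 Prediction Accuracy 44
Chapter 5 Conclusions 48
 Section 1 Conclusions and Recommendations 48
 Section 2 Directions for Future Research 49
References 50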
dc.format.extent 924556 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US