Title 應用網路新聞文字探勘於預測台灣股價趨勢之研究
A study of forecasting Taiwan stock price trends by applying news text mining technique
Author 陳人華
Chen, Ren Hua
Contributors 廖四郎
陳人華
Chen, Ren Hua
Keywords text mining
SVM
news
stock market
Date 2016
Upload time 1-Sep-2016 23:47:06 (UTC+8)
Abstract Stock market news is an important source of information for individual investors. In Taiwan's centralized exchange market, individual investors still account for more than half of trading, although their share has declined in recent years. Prior research has repeatedly shown that news coverage affects stock returns. If the information in news can be extracted and used to build a trading strategy, it could give investors additional help, whether the strategy is used alone or combined with others.
This study uses the support vector machine (SVM) algorithm to classify news automatically and to predict the stock price trend after a news item is published. Applying the improved TF-IDF method proposed by Chang et al. (張玉芳等人, 2006) is intended to make the selection of feature terms more accurate. The study collects several thousand news articles from each of two sources, cnYES (鉅亨網) and Taiwan Economic Journal (TEJ), so that the results are more representative and stable. However, the empirical results show that the precision of the prediction model remains insufficient, so the study ultimately could not demonstrate, through the model, a relationship between news content and stock prices.
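The feature-term selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain TF-IDF weighting in the sense of Salton and Buckley (1988), not the improved variant of Chang et al. (2006), whose formula is not given in this record. The function names `tfidf` and `top_terms` and the pre-segmented sample headlines are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def top_terms(docs, k):
    """Rank terms by their highest TF-IDF weight in any document; keep k."""
    best = {}
    for doc_weights in tfidf(docs):
        for term, value in doc_weights.items():
            best[term] = max(best.get(term, 0.0), value)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical, already-segmented headlines (illustrative data only).
docs = [
    ["revenue", "record", "high", "shares"],
    ["revenue", "decline", "shares"],
    ["lawsuit", "shares", "decline"],
]
weights = tfidf(docs)
selected = top_terms(docs, 3)
print(selected)
```

A term that occurs in every document, such as "shares" here, gets weight zero and is never selected. In a pipeline like the one the abstract describes, the selected terms would then serve as input features for a classifier such as an SVM.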
References 1. Barber, B. M., & Odean, T. (2008). All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors. Review of Financial Studies, 21(2), 785-818.
2. Chen, K. J., & Liu, S. H. (1992, August). Word identification for Mandarin Chinese sentences. In Proceedings of the 14th conference on Computational linguistics-Volume 1 (pp. 101-107). Association for Computational Linguistics.
3. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
4. Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3(Mar), 1289-1305.
5. Gidofalvi, G., & Elkan, C. (2001). Using news articles to predict stock price movements. Department of Computer Science and Engineering, University of California, San Diego.
6. Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification.
7. Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D., & Allan, J. (2000, November). Language models for financial news recommendation. In Proceedings of the ninth international conference on Information and knowledge management (pp. 389-396). ACM.
8. Merton, R. C. (1987). A simple model of capital market equilibrium with incomplete information. The Journal of Finance, 42(3), 483-510.
9. Mittermayer, M. A. (2004). Forecasting intraday stock price trends with text mining techniques. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on (pp. 10-pp). IEEE.
10. Nie, J. Y., Brisebois, M., & Ren, X. (1996). On Chinese text retrieval. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 225-233). ACM.
11. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
12. Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval.
13. Sproat, R. (1990). A statistical method for finding word boundaries in Chinese text.
14. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
15. Witten, I. H. (2005). Text mining. Practical handbook of Internet computing, 14-1.
16. Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1-2), 69-90.
17. 池祥萱, 林煜恩, 陳韋如, & 周賓凰. (2009). Does CEO Media Coverage Affect Firm Performance? 交大管理學報, 1, 139-173.
18. 張玉芳, 彭時名, & 呂佳. (2006). 基於文本分類 TFIDF 方法的改進與應用. 電腦工程, 32(19), 76-78.
19. 鍾任明, 李維平, & 吳澤民. (2005). 運用文字探勘於日內股價
Description Master's
National Chengchi University
Graduate Institute of Finance (金融研究所)
103352019
Source http://thesis.lib.nccu.edu.tw/record/#G0103352019
Type thesis
dc.contributor.advisor 廖四郎 (zh_TW)
dc.date.accessioned 1-Sep-2016 23:47:06 (UTC+8)
dc.date.available 1-Sep-2016 23:47:06 (UTC+8)
dc.identifier (Other Identifiers) G0103352019 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/101083
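dc.description.tableofcontents Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Objectives and Structure
Chapter 2 Literature Review
Section 1 News Media and Stock Prices
Section 2 Text Mining
1. Chinese Word Segmentation
2. Feature Term Selection
Chapter 3 Research Process and Methods
Section 1 Research Process
Section 2 Research Methods
1. Data Sources
2. Chinese Word Segmentation
3. Feature Term Selection
4. Rise/Fall Class Labels for News
5. Classification Model: Support Vector Machine
6. Evaluating Classification Performance
Chapter 4 Experimental Design and Empirical Results
Section 1 Experimental Design
Section 2 Empirical Results
Experiment 1: Effect of the Number of Days and the Threshold
Experiment 2: Comparison of Different News Sources
Experiment 3: Constructing a Trading Strategy
Chapter 5 Conclusion
References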
dc.format.extent 1301587 bytes
dc.format.mimetype application/pdf