Title 以文字探勘為基礎之財務風險分析方法研究
Exploring Financial Risk via Text Mining Approaches
Author 劉澤
Contributors 蔡銘峰
劉澤
Keywords Text Mining
Financial Risk
Date 2015
Upload time 1-Oct-2015 14:17:50 (UTC+8)
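Abstract In recent years, many studies have applied machine learning to predicting stock price trends and risk in finance. By analyzing stock prices, the text of financial reports, financial news, or more immediate Twitter posts, different applications can assess investment risk and predict stock price trends to some degree. In this thesis, we focus on the textual information in financial statements and apply it to the problem of financial risk assessment. We use the text of financial reports to predict the risk level of listed companies, and we choose stock price volatility as the measure of financial risk. For text processing, we first use a finance-specific sentiment lexicon to improve the original text model; research on sentiment analysis indicates that sentiment words reflect the opinions expressed in a document, or its views on an event, more efficiently, and can therefore reduce the noise in textual information and improve the accuracy of predictions based on financial report text. Second, we attempt to bring numerical information, such as stock prices and returns on investment, into the machine learning model in the form of weights: when training the model, we weight the textual information of each company's financial report according to the numerical information in that report, and combine the textual information through support vector machines with different weight settings. Our experimental results show that the financial sentiment lexicon effectively represents the textual information in financial reports and that financial sentiment words are highly correlated with company risk. In the experiments that combine financial sentiment words with stock prices and returns through weighting, the numerical information significantly improves the accuracy of risk prediction.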
In recent years, a number of studies have used machine learning techniques to predict stock trends and investment risk. Some applications analyze the textual information in financial reports, financial news, or even tweets on social networks to provide useful information to stock investors. In this thesis, we focus on the problem of using the textual information in financial reports, together with numerical information about companies, to predict financial risk. We use the text of a company's financial report to predict its financial risk in the following year, and we use stock volatility to measure that risk. In the first part of the thesis, we use a finance-specific sentiment lexicon to improve prediction models that are trained only on the textual information of financial reports, and we provide a sentiment analysis of the results. In the second part, we attempt to combine the textual information with numerical information, such as stock returns, to further improve the performance of the prediction models. Specifically, in the proposed approach each company instance, represented by its financial text, is weighted by its stock returns using cost-sensitive learning techniques. Our experimental results show that models built on the finance-specific sentiment lexicon perform comparably to those built on the original texts, which confirms the importance of financial sentiment words for risk prediction. More importantly, the learned models suggest strong correlations between financial sentiment words and company risk. In addition, the cost-sensitive models significantly outperform their cost-insensitive counterparts. These findings identify the impact of sentiment words in financial reports and show that numerical information can be used as cost weights in learning.
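The pipeline the abstract describes (a volatility target, features restricted to sentiment words, and per-company weights derived from returns) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the tiny lexicon, the use of simple daily returns with a sample standard deviation as the volatility measure, and the weighting rule `1 + |mean return|` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon. The thesis uses a finance-specific sentiment
# lexicon whose contents are not listed in this record.
SENTIMENT_LEXICON = {"loss", "litigation", "impairment", "gain", "uncertain"}


def daily_returns(prices):
    """Simple daily returns r_t = P_t / P_{t-1} - 1 (assumed definition)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def volatility(returns):
    """Sample standard deviation of daily returns, used as the risk proxy."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var)


def lexicon_features(text, lexicon=SENTIMENT_LEXICON):
    """Term counts restricted to lexicon words; all other tokens are dropped."""
    tokens = text.lower().split()
    counts = Counter(t for t in tokens if t in lexicon)
    return {word: counts[word] for word in sorted(lexicon)}


def instance_weight(returns):
    """Hypothetical cost weight: 1 + |mean daily return|."""
    return 1.0 + abs(sum(returns) / len(returns))


# Toy inputs standing in for one company's price history and report text.
prices = [100.0, 102.0, 99.0, 101.0, 100.0]
report = "Litigation risk and impairment charges led to a loss despite a gain"

rets = daily_returns(prices)
features = lexicon_features(report)  # model input
target = volatility(rets)            # value to predict
weight = instance_weight(rets)       # cost attached to this training instance
print(features, round(target, 4), round(weight, 4))
```

In a fuller setup, each `(features, target, weight)` triple would be passed to a learner that accepts per-instance weights, such as a support vector machine with weighted instances.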
References [1] J. Bae, C.-J. Kim, and C. R. Nelson. Why are stock returns and volatility negatively correlated? Journal of Empirical Finance, 14(1):41–58, 2007.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[3] K.-T. Chen, T.-J. Chen, and J.-C. Yen. Predicting future earnings change using numeric and textual information in financial reports. In Intelligence and Security Informatics, pages 54–63. Springer, 2009.
[4] N. Chen, A. S. Vieira, J. Duarte, B. Ribeiro, and J. C. Neves. Cost-sensitive learning vector quantization for financial distress prediction. In Progress in Artificial Intelligence, pages 374–385. Springer, 2009.
[5] R. Engle. Risk and volatility: Econometric models and financial practice. American Economic Review, pages 405–420, 2004.
[6] R. Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82–89, 2013.
[7] G. P. C. Fung, J. X. Yu, and W. Lam. Stock prediction: Integrating text mining approach using real-time news. In Computational Intelligence for Financial Engineering, pages 395–402. IEEE, 2003.
[8] D. Garcia. Sentiment during recessions. The Journal of Finance, 68(3):1267–1300, 2013.
[9] T. Joachims. Making large scale SVM learning practical. Technical report, Universität Dortmund, 1999.
[10] T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 217–226. ACM, 2006.
[11] S. Kogan, D. Levin, B. R. Routledge, J. S. Sagi, and N. A. Smith. Predicting risk from financial reports with regression. In The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 272–280. ACL, 2009.
[12] A. J. Lee, M.-C. Lin, R.-T. Kao, and K.-T. Chen. An effective clustering approach to stock market prediction. In Pacific Asia Conference on Information Systems, pages 345–354, 2010.
[13] H.-T. Lin. A simple cost-sensitive multiclass classification algorithm using one-versus-one comparisons. National Taiwan University, Tech. Rep, 2010.
[14] T. Loughran and B. McDonald. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1):35–65, 2011.
[15] S. M. Mohammad and P. D. Turney. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 26–34. ACL, 2010.
[16] R. Narayanan, B. Liu, and A. Choudhary. Sentiment analysis of conditional sentences. In Conference on Empirical Methods in Natural Language Processing, volume 1, pages 180–189. ACL, 2009.
[17] A. Nikfarjam, E. Emadzadeh, and S. Muthaiyah. Text mining approaches for stock market prediction. In International Conference on Computer and Automation Engineering, volume 4, pages 256–260. IEEE, 2010.
[18] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. In Language Resources and Evaluation Conference, volume 10, pages 1320–1326, 2010.
[19] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, 2008.
[20] M. A. Petersen. Information: Hard and soft. Technical report, working paper, Northwestern University, 2004.
[21] R. P. Schumaker and H. Chen. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems, 27(2):1–29, 2009.
[22] A. Smola and V. Vapnik. Support vector regression machines. Advances in Neural Information Processing Systems, 9:155–161, 1997.
[23] S. Takahashi, M. Takahashi, H. Takahashi, and K. Tsuda. Analysis of stock price return using textual data and numerical data through text mining. In Knowledge-Based Intelligent Information and Engineering Systems, pages 310–316. Springer, 2006.
[24] M.-F. Tsai and C.-J. Wang. Risk ranking from financial reports. In Advances in Information Retrieval, pages 804–807. Springer, 2013.
[25] R. S. Tsay. Analysis of financial time series, volume 543. John Wiley & Sons, 2005.
[26] B. Wuthrich, V. Cho, S. Leung, D. Permunetilleke, K. Sankaran, and J. Zhang. Daily stock market forecast from textual web data. In International Conference on Systems, Man, and Cybernetics, volume 3, pages 2720–2725. IEEE, 1998.
Description Master's
National Chengchi University
Department of Computer Science
101753020
Source http://thesis.lib.nccu.edu.tw/record/#G0101753020
Type thesis
dc.contributor.advisor 蔡銘峰 zh_TW
dc.contributor.author (Authors) 劉澤 zh_TW
dc.creator (Author) 劉澤 zh_TW
dc.date (Date) 2015 en_US
dc.date.accessioned 1-Oct-2015 14:17:50 (UTC+8) -
dc.date.available 1-Oct-2015 14:17:50 (UTC+8) -
dc.date.issued (Upload time) 1-Oct-2015 14:17:50 (UTC+8) -
dc.identifier (Other Identifiers) G0101753020 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/78753 -
dc.description (Description) Master's zh_TW
dc.description (Description) National Chengchi University zh_TW
dc.description (Description) Department of Computer Science zh_TW
dc.description (Description) 101753020 zh_TW
dc.description.tableofcontents Acknowledgements
Chinese Abstract
Abstract
1 Introduction
2 Related Work
2.1 Financial Risk Prediction
2.2 Sentiment Analysis
2.3 Cost-Sensitive Classification
3 Methodology
3.1 Definition of Financial Terms
3.1.1 Daily Stock Returns
3.1.2 Stock Return Volatility
3.2 Finance-Specific Sentiment Lexicon
3.3 Problem Formulation
3.3.1 Regression Task
3.3.2 Ranking Task
3.3.3 Cost-Sensitive Task
4 Experimental Results
4.1 Experimental Settings
4.1.1 Dataset
4.1.2 Extracted Features
4.1.3 Evaluation Metrics
4.1.4 Parameter Settings
4.2 Finance-Specific Sentiment Lexicon Based Model
4.3 Cost-Sensitive Based Model
5 Conclusions
Bibliography
zh_TW
dc.format.extent 2201428 bytes -
dc.format.mimetype application/pdf -
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0101753020 en_US
dc.subject (Keywords) Text Mining zh_TW
dc.subject (Keywords) Financial Risk zh_TW
dc.subject (Keywords) Text Mining en_US
dc.subject (Keywords) Financial Risk en_US
dc.title (Title) 以文字探勘為基礎之財務風險分析方法研究 zh_TW
dc.title (Title) Exploring Financial Risk via Text Mining Approaches en_US
dc.type (Type) thesis en