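Exploring the relationships between annual earnings and subjective expressions in US financial statements: opinion analysis applications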

Title	Exploring the relationships between annual earnings and subjective expressions in US financial statements: opinion analysis applications (original title: 探索美國財務報表的主觀性詞彙與盈餘的關聯性:意見分析之應用)
Author	陳建良 (Chen, Chien Liang)
Contributors	劉昭麟 (Liu, Chao Lin); 張元晨 (Chang, Yuan Chen); 陳建良 (Chen, Chien Liang)
Keywords	opinion mining; natural language processing; sentiment analysis; financial text mining; information extraction
Date	2010
Uploaded	4-Sep-2013 17:10:48 (UTC+8)
Abstract	Subjective assertions in financial statements influence the judgments of market participants when they assess the value and profitability of the reporting corporations. Hence, corporate management has a strong incentive to choose words carefully, and may attempt to conceal the negative and accentuate the positive with "prudent" wording. Because manually mining useful information from the very large volume of text in financial statements is often infeasible, we designed an artificial-intelligence-based strategy to investigate the linkage between financial status, measured by annual earnings, and subjective multi-word expressions (MWEs) in US financial statements. Multi-word models often capture the semantic context of a sentence better than conventional unigram models, so we applied conditional random field (CRF) models to identify opinion patterns in the form of MWEs; our approach outperformed previous work employing unigram models. Moreover, our novel algorithms take the lead in uncovering evidence that supports the common belief that the implications of the written statements are inconsistent with the reality indicated by the figures in the financial statements. Unexpected negative earnings are often accompanied by ambiguous and mild statements that play down current shortcomings, and sometimes by promises of a glorious future.
Description	Master's; National Chengchi University; Department of Computer Science; 98753013; 99
Source	http://thesis.lib.nccu.edu.tw/record/#G0987530132
Type	thesis

dc.contributor.advisor	劉昭麟; 張元晨	zh_TW
dc.contributor.advisor	Liu, Chao Lin; Chang, Yuan Chen	en_US
dc.contributor.author (Authors)	陳建良	zh_TW
dc.contributor.author (Authors)	Chen, Chien Liang	en_US
dc.creator (Author)	陳建良	zh_TW
dc.creator (Author)	Chen, Chien Liang	en_US
dc.date (Date)	2010	en_US
dc.date.accessioned	4-Sep-2013 17:10:48 (UTC+8)	-
dc.date.available	4-Sep-2013 17:10:48 (UTC+8)	-
dc.date.issued (Uploaded)	4-Sep-2013 17:10:48 (UTC+8)	-
dc.identifier (Other Identifiers)	G0987530132	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/60264	-
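dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	資訊科學學系 (Department of Computer Science)	zh_TW
dc.description (Description)	98753013	zh_TW
dc.description (Description)	99	zh_TW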
dc.description.tableofcontents	CHAPTER 1 Introduction; 1.1 Background; 1.2 Methodology overview; 1.3 Contributions; 1.4 Organization; CHAPTER 2 Literature Review; 2.1 Finance literature review; 2.2 Computer science literature review; CHAPTER 3 Financial Data and Corpora; 3.1 Annotated corpus: MPQA; 3.2 Financial statements preprocessing; 3.3 Quantitative financial data and data merging; CHAPTER 4 Models for Opinion Patterns Identification; 4.1 Conditional random fields; 4.2 Feature sets and linear chain CRF data view; 4.2.1 Morphological and orthographical features; 4.2.2 Predicate-argument structure features; 4.2.3 Syntactic features; 4.2.4 Simple semantic features; CHAPTER 5 Linkages between Earnings and Subjective MWEs; 5.1 Dependent variable: standardized unexpected earnings; 5.2 Explanatory variables: MWEf-idf and control variables; 5.3 Multinomial logistic regression; 5.4 Strategies of discriminative MWE identification; CHAPTER 6 Experimental evaluation of CRF models; 6.1 Design of the experiments; 6.2 Experimental results; CHAPTER 7 Empirical study of earnings and subjective MWEs; 7.1 Opinion patterns extraction from financial statements; 7.2 Empirical results of small dataset; 7.3 Robustness tests of large dataset; 7.4 Analysis of the economic meanings of subjective MWEs; CHAPTER 8 Conclusions; 8.1 Discussions; 8.2 Future work; References; Appendix	zh_TW
dc.format.extent	1293510 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0987530132	en_US
dc.subject (Keywords)	意見探勘 (opinion mining)	zh_TW
dc.subject (Keywords)	自然語言處理 (natural language processing)	zh_TW
dc.subject (Keywords)	語意分析 (semantic analysis)	zh_TW
dc.subject (Keywords)	財務報表文字探勘 (financial statement text mining)	zh_TW
dc.subject (Keywords)	資訊擷取 (information extraction)	zh_TW
dc.subject (Keywords)	opinion mining	en_US
dc.subject (Keywords)	natural language processing	en_US
dc.subject (Keywords)	sentiment analysis	en_US
dc.subject (Keywords)	financial text mining	en_US
dc.subject (Keywords)	information extraction	en_US
dc.title (Title)	探索美國財務報表的主觀性詞彙與盈餘的關聯性:意見分析之應用	zh_TW
dc.title (Title)	Exploring the relationships between annual earnings and subjective expressions in US financial statements: opinion analysis applications	en_US
dc.type (Type)	thesis	en
