dc.contributor.advisor | 林我聰<br>諶家蘭 | zh_TW |
dc.contributor.advisor | Lin, Woo-Tsong<br>Seng, Jia-Lang | en_US |
dc.contributor.author (Authors) | 賴哲霆 | zh_TW |
dc.contributor.author (Authors) | Lai, Jhe-Ting | en_US |
dc.creator (作者) | 賴哲霆 | zh_TW |
dc.creator (作者) | Lai, Jhe-Ting | en_US |
dc.date (日期) | 2006 | en_US |
dc.date.accessioned | 14-Sep-2009 09:14:11 (UTC+8) | - |
dc.date.available | 14-Sep-2009 09:14:11 (UTC+8) | - |
dc.date.issued (上傳時間) | 14-Sep-2009 09:14:11 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0094356025 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/31091 | - |
dc.description (描述) | 碩士 | zh_TW |
dc.description (描述) | 國立政治大學 | zh_TW |
dc.description (描述) | 資訊管理研究所 | zh_TW |
dc.description (描述) | 94356025 | zh_TW |
dc.description (描述) | 95 | zh_TW |
dc.description.abstract (摘要) | 由於近幾年來網際網路電子資源的數量大量成長下,搜尋引擎技術的誕生為使用者帶來檢索資料文件上極高的便利與效率。但網路資源和使用者大量成長下,現有的關鍵字檢索技術已無法滿足使用者需求。然而「資訊擷取」就是將從檢索文件中擷取重要特定訊息或產生資訊間特定關係的一種技術。其不僅從文件中能過濾不必要的資訊,而且產生有興趣或特定的重要訊息和摘要。 企業評價即為一套收集、分析與應用財務或非財務資訊來評價企業的價值,其評估的結果可做為企業決策和無形資產買賣訂價之依據。目前在國內企業的財務報表、財務附註和財經新聞內容皆有與企業評價所需重要訊息和資料,並以網頁和PDF格式呈現。因此,本研究將對國內企業財務報表、財務附註和財經新聞為資料來源,以企業評價概念基礎下建立中文財務項資料的資訊擷取系統。從這些不同的異質資料來源中,擷取正確的財務項資料與其所對應之企業評價模型,以達成自動擷取企業評價資料。使用者能在最短的時間內取得相關有效評價資訊和學習評價模型,使資訊處理品質能夠提昇正確性和效率性。 | zh_TW |
dc.description.abstract (摘要) | As the volume of electronic resources on the Internet has grown rapidly in recent years, search engines have made document retrieval far more convenient and efficient for users. However, as both network resources and users continue to multiply, existing keyword retrieval techniques can no longer satisfy users’ specific demands. Information extraction (IE) is a technique that extracts important, specific information from retrieved documents or derives specific relations among pieces of information. IE not only filters unnecessary information out of documents but also produces the specific messages and summaries that interest users. Business valuation is the process of collecting, analyzing, and applying financial and non-financial information to appraise the value of a business; its results serve as a basis for business decisions and for pricing intangible assets in transactions. In Taiwan, companies’ financial statements, notes to financial statements, and financial news contain the information needed for business valuation and are published as HTML and PDF files. We therefore developed an information extraction system for Chinese financial data, grounded in business valuation concepts, that uses these three sources as its data. From these heterogeneous sources, the system extracts the correct financial data and the corresponding business valuation models, achieving automatic extraction of valuation data. Users can obtain relevant, valid valuation information and learn valuation models in a short time, which improves the accuracy and efficiency of information processing. | en_US |
dc.description.tableofcontents | Table of Contents<br>List of Tables<br>List of Figures<br>Chapter 1 Introduction<br>1.1 Research Background and Motivation<br>1.2 Research Objective<br>1.3 Research Scope<br>1.4 Research Issue<br>1.5 Research Flow<br>1.6 Organization of Thesis<br>Chapter 2 Literature Review<br>2.1 Chinese Information Extraction<br>2.2 Chinese Word Segmentation<br>2.2.1 Chinese Segmentation Methods<br>2.2.2 Unknown Word Extraction<br>2.2.3 Part-of-Speech Tagging Models<br>2.2.4 Segmentation Models Based on Named Entities<br>2.3 Information Extraction Methods<br>2.3.1 Keyword Extraction for Pure-Text Data<br>2.3.2 Structural Extraction for Tabular Data<br>2.4 Business Valuation Models<br>2.4.1 Income-Based Approach<br>2.4.2 Market-Based Approach<br>2.4.3 Asset-Based Approach<br>2.5 Summary<br>Chapter 3 Research Model<br>3.1 Information Extraction on Financial Statements<br>3.1.1 Financial Statements<br>3.1.2 Keyword Extraction on Financial Statements<br>3.2 Information Extraction on Notes to Financial Statements<br>3.2.1 Notes to Financial Statements<br>3.2.2 PDF Converting Processing<br>3.2.3 Keyword Extraction on Notes to Financial Statements<br>3.3 Information Extraction on Financial News<br>3.3.1 Financial News<br>3.3.2 Chinese Word Segmentation System Model<br>3.3.3 Chinese Keyword Analyzing Model on Financial Data<br>3.3.3.1 Account Name Analyzing<br>3.3.3.2 Organization Name Analyzing<br>3.3.3.3 Time Analyzing<br>3.3.3.4 Money and Percent Analyzing<br>3.3.4 Keyword Extraction on Financial News<br>3.4 Valuation Model Analyzing Based on Concept Hierarchy<br>3.5 Summary<br>Chapter 4 Prototype Development<br>4.1 Prototype Platform and Architecture<br>4.2 Prototype System Design<br>4.2.1 Web Crawler Design<br>4.2.2 Domain Lexicon Tool<br>4.2.3 PDF Converting Tool<br>4.2.4 Knowledgebase Design<br>4.2.5 Information Extraction System Function Design<br>Chapter 5 Research Experiment<br>5.1 Experiment Design<br>5.2 Experiment Evaluation<br>5.3 Experiment Results<br>5.3.1 Experiment I: Financial Statements<br>5.3.2 Experiment II: Notes to Financial Statements<br>5.3.3 Experiment III: Financial News<br>Chapter 6 Research Implication and Discussion<br>6.1 Managerial Findings and Implications<br>6.2 Technological Findings and Implications<br>Chapter 7 Conclusion and Future Work<br>7.1 Conclusion<br>7.2 Future Work<br>References<br>Appendix A<br>Appendix B | zh_TW |
dc.language.iso | en_US | - |
dc.source.uri (資料來源) | http://thesis.lib.nccu.edu.tw/record/#G0094356025 | en_US |
dc.subject (關鍵詞) | 資訊擷取 | zh_TW |
dc.subject (關鍵詞) | 企業評價 | zh_TW |
dc.subject (關鍵詞) | 財務項資料 | zh_TW |
dc.subject (關鍵詞) | Information Extraction | en_US |
dc.subject (關鍵詞) | Business Valuation | en_US |
dc.subject (關鍵詞) | Financial Data | en_US |
dc.title (題名) | 應用資訊擷取技術於企業評價財務項資料之取得 | zh_TW |
dc.title (題名) | An Application of Information Extraction in Collecting Financial Data for Business Valuation | en_US |
dc.type (資料類型) | thesis | en |
dc.relation.reference (參考文獻) | 1.卜小蝶 (1996)。圖書資訊檢索技術。文華圖書館管理資訊股份有限公司。 | zh_TW |
dc.relation.reference (參考文獻) | 2.中央研究院資訊科學所中文詞知識庫小組網站(Chinese Knowledge and Information Processing Group Website)。http://ckip.iis.sinica.edu.tw/CKIP | zh_TW |
dc.relation.reference (參考文獻) | 3.朱怡霖 (2002)。中文斷詞及專有名詞辨識之研究。國立台灣大學資訊工程研究所碩士論文,台北市。 | zh_TW |
dc.relation.reference (參考文獻) | 4.吳岱儒 (2003)。財務管理。全華科技圖書股份有限公司。 | zh_TW |
dc.relation.reference (參考文獻) | 5.吳啟銘 (2001)。企業評價:個案實證分析。智勝文化事業有限公司。 | zh_TW |
dc.relation.reference (參考文獻) | 6.洪國賜、盧聯生 (2001)。財務報表分析。三民書局。 | zh_TW |
dc.relation.reference (參考文獻) | 7.黃佳新 (2004)。關鍵字擷取與文件分類因子分析。國立清華大學工業工程與管理系碩士論文,新竹市。 | zh_TW |
dc.relation.reference (參考文獻) | 8.黃燕萍 (1999)。中文社會新聞文件資訊擷取。國立雲林科技大學資訊管理系碩士論文,雲林縣。 | zh_TW |
dc.relation.reference (參考文獻) | 9.葉政輝 (2002)。以語料為基礎的中文專有名詞的之研究。國立交通大學資訊科學所碩士論文,新竹市。 | zh_TW |
dc.relation.reference (參考文獻) | 10.Atlam, El-S., Fuketa, M., Kashiji, S., Nakata, H., & Aoe, J. (2002). A new method for construction field association terms using co-occurrence words and declinable words information. IEEE International Conference on Systems, Man and Cybernetics, 4, pp. 1217-1224. | zh_TW |
dc.relation.reference (參考文獻) | 11.Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison Wesley Longman Publishing Co. Inc. | zh_TW |
dc.relation.reference (參考文獻) | 12.Cercone, N., Huang, X., Peng, F., & Schuurmans, D. (2003). Applying machine learning to text segmentation for information retrieval. Information Retrieval, 6(3), pp. 333-362. | zh_TW |
dc.relation.reference (參考文獻) | 13.Chen, A., Gey, F. C., He, J., Meggs, J., & Xu, L. (1997). Chinese Text Retrieval Without Using a Dictionary. Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42–49. | zh_TW |
dc.relation.reference (參考文獻) | 14.Chen, F. Y., Chen, K. J., Huang, C. R., & Tsai, P. F. (1999). Sinica treebank. Computational Linguistics and Chinese Language Processing, 4(2), pp. 87-104. | zh_TW |
dc.relation.reference (參考文獻) | 15.Chen, K. J. & Bai, M. H. (1998). Unknown word detection for Chinese by a corpus-based learning method. International Journal of Computational Linguistics and Chinese Language Processing, 3(1), pp. 27-44. | zh_TW |
dc.relation.reference (參考文獻) | 16.Chen, K. J. & Liu, S. H. (1992). Word identification for Mandarin Chinese sentences. Proceedings of the 14th Conference on Computational Linguistics, 1, pp. 101-107. | zh_TW |
dc.relation.reference (參考文獻) | 17.Chen, K. J. & Ma, W. Y. (2001). Construction and management for Chinese corpus. Proceedings of Research on Computational Linguistics Conference, pp. 175-191. | zh_TW |
dc.relation.reference (參考文獻) | 18.Chen, K. J. & Ma, W. Y. (2002). Unknown word extraction for Chinese documents. Proceedings of the 19th International Conference on Computational Linguistics, 1, pp. 1-7. | zh_TW |
dc.relation.reference (參考文獻) | 19.Chen, K. J. & Ma, W. Y. (2003). A bottom-up merging algorithm for Chinese unknown word extraction. Proceedings of SIGHAN, pp. 31-38. | zh_TW |
dc.relation.reference (參考文獻) | 20.Chen, K. J. & Ma, W. Y. (2005). Design of CKIP Chinese word segmentation system. Chinese and Oriental Languages Information Processing Society, 14(3), pp. 235-249. | zh_TW |
dc.relation.reference (參考文獻) | 21.Chen, K. J. & Tsai, Y. F. (2003). Context-rule model for pos tagging. Proceedings of PACLIC 17, pp. 146-151. | zh_TW |
dc.relation.reference (參考文獻) | 22.Chien, L. F. & Pu, H. T. (1996). Important issues on Chinese retrieval. Computational Linguistics and Chinese Language Processing, 1(1), pp. 205-221. | zh_TW |
dc.relation.reference (參考文獻) | 23.Fu, G. & Luke, K. K. (2003). A two-stage statistical word segmentation system for Chinese. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 17, Association for Computational Linguistics, pp. 156-159. | zh_TW |
dc.relation.reference (參考文獻) | 24.Gao, J., Li, M., & Huang, C. N. (2003). Improve source-channel models for Chinese word segmentation. Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, 1(3), pp. 272-279. | zh_TW |
dc.relation.reference (參考文獻) | 25.Goldstein, R. C. & Storey, V. C. (1994). Materialization. IEEE Transactions on Knowledge and Data Engineering, 6(5), pp. 835-842. | zh_TW |
dc.relation.reference (參考文獻) | 26.Han, J., Cai, Y., & Cercone, N. (1993). Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering, 5(1), pp. 29-40. | zh_TW |
dc.relation.reference (參考文獻) | 27.Hsieh, Y. M., Yang, D. C., & Chen, K. J. (2006). Improve parsing performance by self-learning. Proceedings of ROCLING XVIII, pp. 63-76. | zh_TW |
dc.relation.reference (參考文獻) | 28.Krupl, B., Herzog, M., & Gatterbauer, W. (2005). Using visual cues for extraction of tabular data from arbitrary HTML documents. Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 1000-1001. | zh_TW |
dc.relation.reference (參考文獻) | 29.Lee, R. C. T., Chang, R. C., Tseng, S. S. & Tsai, Y. T. (1999). Introduction to the design and analysis of algorithm (1). UNALIS Corp., pp. 419-423. | zh_TW |
dc.relation.reference (參考文獻) | 30.Li, W., Wong, K. F., & Yuan, C. (2003). A design of temporal event extraction from Chinese financial news. International Journal of Computer Processing of Oriental Languages, 16(1), pp. 21-39. | zh_TW |
dc.relation.reference (參考文獻) | 31.Liu, J., Nissim, D., & Thomas, J. (2002). Equity valuation using multiples. Journal of Accounting Research, 40(1). | zh_TW |
dc.relation.reference (參考文獻) | 32.Liu, T. & Wang, Z. (2005). Chinese unknown word identification based on local bi-gram model. International Journal of Computer Processing of Oriental Languages, 18(3), pp. 185-196. | zh_TW |
dc.relation.reference (參考文獻) | 33.Liu, Y., Mitra, P., Giles, C. L., & Bai, K. (2006). Automatic extraction of table metadata from digital documents. Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’06). | zh_TW |
dc.relation.reference (參考文獻) | 34.Lochovsky, F. H. & Wang, J. (2003). Data extraction and label assignment for Web database. Proceedings of the 12th International Conference on World Wide Web, pp. 187-196. | zh_TW |
dc.relation.reference (參考文獻) | 35.Maier, D. (1978). The complexity of some problems on subsequences and supersequences. Journal of the ACM, 25(2), pp. 322-336. | zh_TW |
dc.relation.reference (參考文獻) | 36.Manning, C. D., Raghavan, P., & Schutze, H. (2007). An introduction to information retrieval. Cambridge University Press, Cambridge, England. | zh_TW |
dc.relation.reference (參考文獻) | 37.Nguyen, N. G., Hanny, Y. L. & Vo, T. T. (2005). An information extraction engine for Web discussion forums. Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 978-979. | zh_TW |
dc.relation.reference (參考文獻) | 38.Peng, F., Huang, X., Schuurmans, D., & Cercone, N. (2002). Investigating the relationship between word segmentation performance and retrieval performance in Chinese IR. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 369-370. | zh_TW |
dc.relation.reference (參考文獻) | 39.Rosenfeld, B., Feldman, R., & Aumann, Y. (2002). Structural extraction from visual layout of documents. Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 203-210. | zh_TW |
dc.relation.reference (參考文獻) | 40.Teahan, W.J., McNab, R., Wen, Y., & Witten, I. H. (2001). A compression-based algorithm for Chinese word segmentation. Computational Linguistics, 26(3), pp. 375–393. | zh_TW |
dc.relation.reference (參考文獻) | 41.Tseng, H. & Chen, K. J. (2002). Design of Chinese morphological analyzer. Proceedings of the First SIGHAN Workshop on Chinese Language Processing, 18, pp. 1-7. | zh_TW |
dc.relation.reference (參考文獻) | 42.Wang, H. (2002). A study on noun sense disambiguation based on syntagmatic features. Computational Linguistics and Chinese Language Processing, 7(2), pp. 77-88. | zh_TW |
dc.relation.reference (參考文獻) | 43.Wong, K. & Xia, Y. (2005). An overview of temporal information extraction. International Journal of Computer Processing of Oriental Languages, 18(2), pp. 137-152. | zh_TW |
dc.relation.reference (參考文獻) | 44.You, J.M. & Chen, K.J. (2004). Automatic semantic role assignment for a tree structure. Proceedings of the 3rd SIGHAN Workshop on Chinese Language Processing, ACL-04, Barcelona. | zh_TW |
dc.relation.reference (參考文獻) | 45.Zhai, Y. & Liu, B. (2005). Web data extraction based on partial tree alignment. Proceedings of the 14th International Conference on World Wide Web, pp. 76-85. | zh_TW |
dc.relation.reference (參考文獻) | 46.Zhang, J., Gao, J., & Zhou, M. (2000). Extraction of Chinese compound words -An experimental study on a very large corpus. Proceedings of the Second Workshop on Chinese Language Processing: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, 12, pp. 132-139. | zh_TW |
dc.relation.reference (參考文獻) | 47.Zhou, G. & Su, J. (2003). Chinese efficient analyser integrating word segmentation, Part-Of-Speech Tagging, Partial Parsing and Full Parsing. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 17, pp. 78-83. | zh_TW |