Title 應用資訊擷取技術於企業評價財務項資料之取得
An Application of Information Extraction in Collecting Financial Data for Business Valuation
Author 賴哲霆 (Lai, Jhe-Ting)
Advisors 林我聰 (Lin, Woo-Tsong); 諶家蘭 (Seng, Jia-Lang)
Keywords Information Extraction; Business Valuation; Financial Data
Date 2006
Uploaded 14-Sep-2009 09:14:11 (UTC+8)
Abstract The volume of electronic resources on the Internet has grown rapidly in recent years, and search engines have made retrieving documents far more convenient and efficient. As both resources and users keep multiplying, however, keyword-based retrieval can no longer satisfy users' specific needs. Information extraction (IE) addresses this gap: it extracts specific important facts from retrieved documents, or derives specific relations among pieces of information. IE not only filters out unnecessary information but also produces the specific messages and summaries that users are interested in.
      Business valuation is the process of collecting, analyzing, and applying financial and non-financial information to appraise the value of a business. Its results can serve as a basis for business decisions and for pricing the sale of intangible assets. In Taiwan, the financial statements, notes to financial statements, and financial news of domestic companies all contain information needed for business valuation, and they are published as web pages (HTML) and PDF files. This study therefore takes these three sources as input and builds an information extraction system for Chinese financial data grounded in business valuation concepts. From these heterogeneous sources, the system extracts the correct financial items together with their corresponding business valuation models, so that valuation data are collected automatically. Users can obtain relevant, valid valuation information and learn the valuation models in a short time, which improves the accuracy and efficiency of information processing.
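Description Master's thesis
國立政治大學 (National Chengchi University)
資訊管理研究所 (Graduate Institute of Information Management)
94356025
95
Source http://thesis.lib.nccu.edu.tw/record/#G0094356025
Type thesis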
URI https://nccur.lib.nccu.edu.tw/handle/140.119/31091
Table of Contents
     List of Tables
     List of Figures
     Chapter 1 Introduction
     1.1 Research Background and Motivation
     1.2 Research Objective
     1.3 Research Scope
     1.4 Research Issue
     1.5 Research Flow
     1.6 Organization of Thesis
     Chapter 2 Literature Review
     2.1 Chinese Information Extraction
     2.2 Chinese Word Segmentation
     2.2.1 Chinese Segmentation Methods
     2.2.2 Unknown Word Extraction
     2.2.3 Part-of-Speech Tagging Models
     2.2.4 Segmentation Models Based on Named Entities
     2.3 Information Extraction Methods
     2.3.1 Keyword Extraction for Pure-Text Data
     2.3.2 Structural Extraction for Tabular Data
     2.4 Business Valuation Models
     2.4.1 Income-Based Approach
     2.4.2 Market-Based Approach
     2.4.3 Asset-Based Approach
     2.5 Summary
     Chapter 3 Research Model
     3.1 Information Extraction on Financial Statements
     3.1.1 Financial Statements
     3.1.2 Keyword Extraction on Financial Statements
     3.2 Information Extraction on Notes to Financial Statements
     3.2.1 Notes to Financial Statements
     3.2.2 PDF Converting Processing
     3.2.3 Keyword Extraction on Notes to Financial Statements
     3.3 Information Extraction on Financial News
     3.3.1 Financial News
     3.3.2 Chinese Word Segmentation System Model
     3.3.3 Chinese Keyword Analyzing Model on Financial Data
     3.3.3.1 Account Name Analyzing
     3.3.3.2 Organization Name Analyzing
     3.3.3.3 Time Analyzing
     3.3.3.4 Money and Percent Analyzing
     3.3.4 Keyword Extraction on Financial News
     3.4 Valuation Model Analyzing Based on Concept Hierarchy
     3.5 Summary
     Chapter 4 Prototype Development
     4.1 Prototype Platform and Architecture
     4.2 Prototype System Design
     4.2.1 Web Crawler Design
     4.2.2 Domain Lexicon Tool
     4.2.3 PDF Converting Tool
     4.2.4 Knowledgebase Design
     4.2.5 Information Extraction System Function Design
     Chapter 5 Research Experiment
     5.1 Experiment Design
     5.2 Experiment Evaluation
     5.3 Experiment Results
     5.3.1 Experiment I: Financial Statements
     5.3.2 Experiment II: Notes to Financial Statements
     5.3.3 Experiment III: Financial News
     Chapter 6 Research Implication and Discussion
     6.1 Managerial Findings and Implications
     6.2 Technological Findings and Implications
     Chapter 7 Conclusion and Future Work
     7.1 Conclusion
     7.2 Future Work
     References
     Appendix A
     Appendix B
Language en_US