dc.contributor | 資訊科學系 | zh_TW |
dc.creator (作者) | 劉昭麟 | zh_TW |
dc.creator (作者) | Liu, Chao Lin | en_US |
dc.creator (作者) | Huang, Chihkai | en_US |
dc.creator (作者) | Wang, Hongsu | en_US |
dc.creator (作者) | Bol, Peter K. | en_US |
dc.date (日期) | 2015-11 | en_US |
dc.date.accessioned | 14-Aug-2017 16:07:25 (UTC+8) | - |
dc.date.available | 14-Aug-2017 16:07:25 (UTC+8) | - |
dc.date.issued (上傳時間) | 14-Aug-2017 16:07:25 (UTC+8) | - |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/111954 | - |
dc.description.abstract (摘要) | Difangzhi is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information into the China Biographical Database (CBDB), which is hosted by Harvard University. Information offered by CBDB is instrumental for human historians, and serves as a core foundation for automatic tagging systems, like MARKUS of Leiden University. Mining texts in Difangzhi is not easy, partly because there is little knowledge about the grammars of literary Chinese so far. We employed techniques of language modeling and conditional random fields to find person and location names and their relationships. The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese. Experimental results indicate that useful information was discovered from the current dataset. | en_US |
dc.format.extent | 662254 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.relation (關聯) | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015, 87-95 | en_US |
dc.relation (關聯) | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015; Shanghai; China; 30 October 2015 to 1 November 2015; Code 119467 | zh_TW |
dc.subject (關鍵詞) | Algorithms; Computational linguistics; Modeling languages; Natural language processing systems; Ancient China; Chinese characters; Conditional random field; Current status; Harvard University; Language model; Natural language processing; Tagging systems; Data mining | en_US |
dc.title (題名) | Toward algorithmic discovery of biographical information in local gazetteers of ancient China | en_US |
dc.type (資料類型) | conference | |