Title Toward algorithmic discovery of biographical information in local gazetteers of ancient China
Authors 劉昭麟
Liu, Chao Lin
Huang, Chihkai
Wang, Hongsu
Bol, Peter K.
Contributor Department of Computer Science
Keywords Algorithms; Computational linguistics; Modeling languages; Natural language processing systems; Ancient China; Chinese characters; Conditional random field; Current status; Harvard University; Language model; Natural language processing; Tagging systems; Data mining
Date 2015-11
Upload time 14-Aug-2017 16:07:25 (UTC+8)
Abstract Difangzhi is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information into the China Biographical Database (CBDB), which is hosted by Harvard University. Information offered by CBDB is instrumental for human historians, and serves as a core foundation for automatic tagging systems, like MARKUS of Leiden University. Mining texts in Difangzhi is not easy, partly because there is little knowledge about the grammars of literary Chinese so far. We employed techniques of language modeling and conditional random fields to find person and location names and their relationships. The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese. Experimental results indicate that useful information was discovered from the current dataset.
Relation 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015, 87-95
29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015; Shanghai; China; 30 October 2015 to 1 November 2015; Code 119467
Type conference
dc.date.accessioned 14-Aug-2017 16:07:25 (UTC+8)
dc.date.available 14-Aug-2017 16:07:25 (UTC+8)
dc.date.issued (Upload time) 14-Aug-2017 16:07:25 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/111954
dc.format.extent 662254 bytes
dc.format.mimetype application/pdf