Toward algorithmic discovery of biographical information in local gazetteers of ancient China | NCCU Academic Hub

Title	Toward algorithmic discovery of biographical information in local gazetteers of ancient China
Authors	劉昭麟 (Liu, Chao Lin); Huang, Chihkai; Wang, Hongsu; Bol, Peter K.
Contributor	資訊科學系 (Department of Computer Science)
Keywords	Algorithms; Computational linguistics; Modeling languages; Natural language processing systems; Ancient China; Chinese characters; Conditional random field; Current status; Harvard University; Language model; Natural language processing; Tagging systems; Data mining
Date	2015-11
Upload time	14-Aug-2017 16:07:25 (UTC+8)
Abstract	Difangzhi is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information to the China Biographical Database (CBDB), which is hosted by Harvard University. Information offered by CBDB is instrumental for human historians and serves as a core foundation for automatic tagging systems, such as MARKUS at Leiden University. Mining texts in Difangzhi is not easy, partly because little is known so far about the grammar of literary Chinese. We employed techniques of language modeling and conditional random fields to find person and location names and their relationships. The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese. Experimental results indicate that useful information was discovered from the current dataset.
Relation	29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015, pp. 87-95; Shanghai, China; 30 October 2015 to 1 November 2015; Code 119467
Type	conference

dc.contributor	資訊科學系 (Department of Computer Science)	zh_TW
dc.creator (Author)	劉昭麟	zh_TW
dc.creator (Author)	Liu, Chao Lin	en_US
dc.creator (Author)	Huang, Chihkai	en_US
dc.creator (Author)	Wang, Hongsu	en_US
dc.creator (Author)	Bol, Peter K.	en_US
dc.date (Date)	2015-11	en_US
dc.date.accessioned	14-Aug-2017 16:07:25 (UTC+8)	-
dc.date.available	14-Aug-2017 16:07:25 (UTC+8)	-
dc.date.issued (Upload time)	14-Aug-2017 16:07:25 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/111954	-
dc.description.abstract (Abstract)	Difangzhi is a large collection of local gazetteers compiled by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information to the China Biographical Database (CBDB), which is hosted by Harvard University. Information offered by CBDB is instrumental for human historians and serves as a core foundation for automatic tagging systems, such as MARKUS at Leiden University. Mining texts in Difangzhi is not easy, partly because little is known so far about the grammar of literary Chinese. We employed techniques of language modeling and conditional random fields to find person and location names and their relationships. The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese. Experimental results indicate that useful information was discovered from the current dataset.	en_US
dc.format.extent	662254 bytes	-
dc.format.mimetype	application/pdf	-
dc.relation (Relation)	29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015, pp. 87-95	en_US
dc.relation (Relation)	29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015; Shanghai; China; 30 October 2015 to 1 November 2015; Code 119467	zh_TW
dc.subject (Keywords)	Algorithms; Computational linguistics; Modeling languages; Natural language processing systems; Ancient China; Chinese characters; Conditional random field; Current status; Harvard University; Language model; Natural language processing; Tagging systems; Data mining	en_US
dc.title (Title)	Toward algorithmic discovery of biographical information in local gazetteers of ancient China	en_US
dc.type (Type)	conference