Mining local gazetteers of literary Chinese with CRF and pattern based methods for biographical informati...

Academic Output - Conference Papers


Title	Mining local gazetteers of literary Chinese with CRF and pattern based methods for biographical information in Chinese history
Authors	Liu, Chao-Lin (劉昭麟); Huang, Chih-Kai; Wang, Hongsu; Bol, Peter K.
Contributor	Department of Computer Science (資訊科學系)
Keywords	Computational linguistics; Data mining; History; Natural language processing systems; Random processes; Conditional random field; Digital humanities; Document structure; Harvard University; Historical documents; Language model; Pattern based method; Text mining; Big data
Date	2015-12
Upload time	9-Aug-2017 17:27:07 (UTC+8)
Abstract	Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese. We take the lead in exploring the algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based methods, and we extend our work to mining the document structures of historical documents. Practical evaluations were conducted with texts extracted from more than 220 volumes of local gazetteers (Difangzhi). Difangzhi is a huge collection, and the single most important one, of information about officers who served in local government in Chinese history. Our methods performed very well on these realistic tests. Thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become new additions to CBDB. © 2015 IEEE.
Relation	Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, 1629-1638
Type	conference
DOI	http://dx.doi.org/10.1109/BigData.2015.7363931

dc.contributor	Department of Computer Science (資訊科學系)	zh_TW
dc.creator (Author)	劉昭麟	zh_TW
dc.creator (Author)	Liu, Chao-Lin	en_US
dc.creator (Author)	Huang, Chih-Kai	en_US
dc.creator (Author)	Wang, Hongsu	en_US
dc.creator (Author)	Bol, Peter K.	en_US
dc.date (Date)	2015-12	en_US
dc.date.accessioned	9-Aug-2017 17:27:07 (UTC+8)	-
dc.date.available	9-Aug-2017 17:27:07 (UTC+8)	-
dc.date.issued (Upload time)	9-Aug-2017 17:27:07 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/111687	-
dc.description.abstract (Abstract)	Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese. We take the lead in exploring the algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based methods, and we extend our work to mining the document structures of historical documents. Practical evaluations were conducted with texts extracted from more than 220 volumes of local gazetteers (Difangzhi). Difangzhi is a huge collection, and the single most important one, of information about officers who served in local government in Chinese history. Our methods performed very well on these realistic tests. Thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become new additions to CBDB. © 2015 IEEE.	en_US
dc.format.extent	212 bytes	-
dc.format.mimetype	text/html	-
dc.relation (Relation)	Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, 1629-1638	en_US
dc.subject (Keywords)	Computational linguistics; Data mining; History; Natural language processing systems; Random processes; Conditional random field; Digital humanities; Document structure; Harvard University; Historical documents; Language model; Pattern based method; Text mining; Big data	en_US
dc.title (Title)	Mining local gazetteers of literary Chinese with CRF and pattern based methods for biographical information in Chinese history	en_US
dc.type (Type)	conference
dc.identifier.doi (DOI)	10.1109/BigData.2015.7363931
dc.doi.uri (DOI)	http://dx.doi.org/10.1109/BigData.2015.7363931
