Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with T... | Academic Output | NCCU Academic Hub

Academic Output - Journal Articles

Simple Record
Full Record

Title	Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
Author	劉昭麟 Liu, Chao-Lin;Chang, Wei-Ting;Chu, Chang-Ting;Zheng, Ti-Yong
Contributor	Department of Computer Science
Date	2024-04
Upload time	29-Jan-2024 09:45:29 (UTC+8)
Abstract	Words are essential parts for understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems that were annotated with word boundaries. Statistics about the annotated poems support a few heuristic experiences, including the patterns of lines and a practice for the parallel structures (對仗), that researchers of Chinese literature discuss in the literature. The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. Results of an inter-rater agreement study indicate that the annotators have consensus over the identified words 93 per cent of the time and have perfect consensus for the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings, and evaluated the results with human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words, and segmented poems perfectly 22 per cent of the time.
Relation	Digital Scholarship in the Humanities, Vol. 39, No. 1, pp. 228–241
Type	article
DOI	https://doi.org/10.1093/llc/fqad073

dc.contributor	Department of Computer Science	-
dc.creator (Author)	劉昭麟	-
dc.creator (Author)	Liu, Chao-Lin;Chang, Wei-Ting;Chu, Chang-Ting;Zheng, Ti-Yong	-
dc.date (Date)	2024-04	-
dc.date.accessioned	29-Jan-2024 09:45:29 (UTC+8)	-
dc.date.available	29-Jan-2024 09:45:29 (UTC+8)	-
dc.date.issued (Upload time)	29-Jan-2024 09:45:29 (UTC+8)	-
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/149452	-
dc.description.abstract (Abstract)	Words are essential parts for understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems that were annotated with word boundaries. Statistics about the annotated poems support a few heuristic experiences, including the patterns of lines and a practice for the parallel structures (對仗), that researchers of Chinese literature discuss in the literature. The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. Results of an inter-rater agreement study indicate that the annotators have consensus over the identified words 93 per cent of the time and have perfect consensus for the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings, and evaluated the results with human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words, and segmented poems perfectly 22 per cent of the time.	-
dc.format.extent	99 bytes	-
dc.format.mimetype	text/html	-
dc.relation (Relation)	Digital Scholarship in the Humanities, Vol. 39, No. 1, pp. 228–241	-
dc.title (Title)	Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples	-
dc.type (Type)	article	-
dc.identifier.doi (DOI)	10.1093/llc/fqad073	-
dc.doi.uri (DOI)	https://doi.org/10.1093/llc/fqad073	-