Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with T... | Publication | NCCU Academic Hub


Title	Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
Creator	劉昭麟 Liu, Chao-Lin;Chang, Wei-Ting;Chu, Chang-Ting;Zheng, Ti-Yong
Contributor	Department of Computer Science (資訊系)
Date	2024-04
Date Issued	29-Jan-2024 09:45:29 (UTC+8)
Summary	Words are essential parts for understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems that were annotated with word boundaries. Statistics about the annotated poems support a few heuristic experiences, including the patterns of lines and a practice for the parallel structures (對仗), that researchers of Chinese literature discuss in the literature. The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. Results of an inter-rater agreement study indicate that the annotators have consensus over the identified words 93 per cent of the time and have perfect consensus for the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings, and evaluated the results with human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words, and segmented poems perfectly 22 per cent of the time.
Relation	Digital Scholarship in the Humanities, Vol.39, No.1, pp.228–241
Type	article
DOI	https://doi.org/10.1093/llc/fqad073

dc.contributor	Department of Computer Science (資訊系)
dc.creator (author)	劉昭麟
dc.creator (author)	Liu, Chao-Lin;Chang, Wei-Ting;Chu, Chang-Ting;Zheng, Ti-Yong
dc.date (date)	2024-04
dc.date.accessioned	29-Jan-2024 09:45:29 (UTC+8)
dc.date.available	29-Jan-2024 09:45:29 (UTC+8)
dc.date.issued (upload time)	29-Jan-2024 09:45:29 (UTC+8)
dc.identifier.uri (URI)	https://nccur.lib.nccu.edu.tw/handle/140.119/149452
dc.format.extent	99 bytes
dc.format.mimetype	text/html
dc.relation (relation)	Digital Scholarship in the Humanities, Vol.39, No.1, pp.228–241
dc.title (title)	Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
dc.type (type)	article
dc.identifier.doi (DOI)	10.1093/llc/fqad073
dc.doi.uri (DOI)	https://doi.org/10.1093/llc/fqad073