Title: Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
Creator: 劉昭麟
Creator: Liu, Chao-Lin; Chang, Wei-Ting; Chu, Chang-Ting; Zheng, Ti-Yong
Contributor: Department of Computer Science (資訊系)
Date: 2024-04
Date Issued: 29-Jan-2024 09:45:29 (UTC+8)
Summary: Words are essential to understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems annotated with word boundaries. Statistics on the annotated poems support several rules of thumb that researchers of Chinese literature discuss, including the patterns of lines and a practice for parallel structures (對仗). The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. An inter-rater agreement study indicates that the annotators agree on the identified words 93 per cent of the time and agree perfectly on the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings and evaluated the results against the human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words and segmented poems perfectly 22 per cent of the time.
Relation: Digital Scholarship in the Humanities, Vol.39, No.1, pp.228–241
Type: article
DOI: https://doi.org/10.1093/llc/fqad073
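
The summary reports two kinds of figures: agreement on individual words, and the share of poems segmented identically. The record does not say exactly how the article computes them, so the following is only a minimal sketch under stated assumptions: a word counts as agreed when both segmentations mark the same character span, and a poem counts as a perfect match when every line is segmented identically. The function names and the sample line are illustrative and are not taken from the article or its dataset.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]


def word_spans(words: List[str]) -> Set[Span]:
    """Turn a segmented line into a set of (start, end) character spans."""
    spans: Set[Span] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def word_agreement(seg_a: List[str], seg_b: List[str]) -> float:
    """Fraction of words in seg_a whose exact span also occurs in seg_b.

    Assumed definition for illustration; the article may define it differently.
    """
    spans_a = word_spans(seg_a)
    if not spans_a:
        return 0.0
    return len(spans_a & word_spans(seg_b)) / len(spans_a)


def perfect_match_rate(poems_a: List[List[List[str]]],
                       poems_b: List[List[List[str]]]) -> float:
    """Fraction of poems whose lines are all segmented identically.

    Each poem is a list of lines; each line is a list of words.
    """
    if not poems_a:
        return 0.0
    same = sum(1 for pa, pb in zip(poems_a, poems_b) if pa == pb)
    return same / len(poems_a)


# Two hypothetical segmentations of the same five-character line.
line_a = ["床前", "明月", "光"]
line_b = ["床前", "明", "月光"]

print(word_agreement(line_a, line_b))
print(perfect_match_rate([[line_a], [line_a]], [[line_a], [line_b]]))
```

Comparing character spans rather than word strings keeps a repeated word at a different position from being counted as agreement.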
dc.contributor Department of Computer Science (資訊系)
dc.creator (author) 劉昭麟
dc.creator (author) Liu, Chao-Lin; Chang, Wei-Ting; Chu, Chang-Ting; Zheng, Ti-Yong
dc.date (date) 2024-04
dc.date.accessioned 29-Jan-2024 09:45:29 (UTC+8)
dc.date.available 29-Jan-2024 09:45:29 (UTC+8)
dc.date.issued (upload time) 29-Jan-2024 09:45:29 (UTC+8)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/149452
dc.format.extent 99 bytes
dc.format.mimetype text/html
dc.relation (relation) Digital Scholarship in the Humanities, Vol.39, No.1, pp.228–241
dc.title (title) Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
dc.type (type) article
dc.identifier.doi (DOI) 10.1093/llc/fqad073
dc.doi.uri (DOI) https://doi.org/10.1093/llc/fqad073