Title Machine learning and data analysis for word segmentation of classical Chinese poems: illustrations with Tang and Song examples
Authors 劉昭麟
Liu, Chao-Lin; Chang, Wei-Ting; Chu, Chang-Ting; Zheng, Ti-Yong
Contributor Department of Computer Science (資訊系)
Date 2024-04
Uploaded 29-Jan-2024 09:45:29 (UTC+8)
Abstract Words are essential parts for understanding classical Chinese poems. We report a collection of 32,399 classical Chinese poems that were annotated with word boundaries. Statistics about the annotated poems support a few heuristic experiences, including the patterns of lines and a practice for the parallel structures (對仗), that researchers of Chinese literature discuss in the literature. The annotators were affiliated with two universities, so they could annotate the poems as independently as possible. Results of an inter-rater agreement study indicate that the annotators have consensus over the identified words 93 per cent of the time and have perfect consensus for the segmentation of a poem 42 per cent of the time. We applied unsupervised classification methods to annotate the poems in several different settings, and evaluated the results with human annotations. Under favorable conditions, the classifier identified about 88 per cent of the words, and segmented poems perfectly 22 per cent of the time.
Relation Digital Scholarship in the Humanities, Vol. 39, No. 1, pp. 228–241
Type article
DOI https://doi.org/10.1093/llc/fqad073
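
The abstract reports two kinds of agreement figures: the share of identified words on which two segmentations agree, and the share of poems segmented identically. The record does not say how the authors computed these figures, so the following is only a minimal sketch of one plausible reading, in which a word counts as agreed when both segmentations give it the same start and end positions within the same line. The function names and the two example segmentations are hypothetical and are not taken from the paper or its data set.

```python
from typing import List, Set, Tuple

Line = List[str]   # one segmented line, e.g. ["床前", "明月", "光"]
Poem = List[Line]  # a poem is a list of segmented lines


def word_spans(poem: Poem) -> Set[Tuple[int, int, int]]:
    """Return every word as a (line index, start, end) character span."""
    spans = set()
    for i, line in enumerate(poem):
        pos = 0
        for word in line:
            spans.add((i, pos, pos + len(word)))
            pos += len(word)
    return spans


def word_agreement(reference: Poem, other: Poem) -> float:
    """Fraction of the reference's words that `other` delimits identically.

    Assumption: both inputs segment the same underlying text.
    """
    ref, oth = word_spans(reference), word_spans(other)
    return len(ref & oth) / len(ref) if ref else 1.0


def perfect_agreement(a: Poem, b: Poem) -> bool:
    """True when the two segmentations of a poem match on every word."""
    return word_spans(a) == word_spans(b)


# Hypothetical segmentations of two well-known lines, for illustration only.
annotator_a = [["床前", "明月", "光"], ["疑是", "地上", "霜"]]
annotator_b = [["床", "前", "明月", "光"], ["疑是", "地上", "霜"]]

print(word_agreement(annotator_a, annotator_b))     # 5 of annotator_a's 6 words match
print(perfect_agreement(annotator_a, annotator_b))  # False: the first line differs
```

Note that this word-level measure is asymmetric: it depends on which segmentation is treated as the reference. Averaging a poem-level check such as `perfect_agreement` over a collection would give a figure of the same kind as the per-poem percentages quoted in the abstract, again only under the assumptions stated above.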
URI https://nccur.lib.nccu.edu.tw/handle/140.119/149452