Title: Statistical approaches to patent translation - Experiments with various settings of training data
Authors: Tseng, Yuen-Hsien; Liu, Chao-Lin (劉昭麟); Tsai, Chia-Chi; Wang, Jui-Ping; Chuang, Yi-Hsuan; Jeng, James
Contributor: Department of Computer Science (資科系)
Keywords: Chinese segmentation, language modeling, training corpus
Date: 2011-12
Uploaded: 22-Jun-2016 17:10:06 (UTC+8)
Abstract: This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. We integrated several open-source software packages to build a statistical machine translation model for the task, then tried various Chinese segmentation methods, additional resources, and training-corpus preprocessing steps on top of this model. In total, more than 20 experiments were conducted to compare translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important for maintaining performance; 2) a sufficient number of good-quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) translation effectiveness, measured in BLEU, doubles when the number of bilingual training sentences doubles at the level of 100,000 sentences.
Relation: Proceedings of the Ninth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access - PatentMT (NTCIR 9), 661-665. Tokyo, Japan, 6-9 December 2011
Type: conference
dc.date.accessioned 22-Jun-2016 17:10:06 (UTC+8)
dc.date.available 22-Jun-2016 17:10:06 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/98230
dc.format.extent 1418128 bytes
dc.format.mimetype application/pdf