dc.contributor | Department of Computer Science | |
dc.creator (Author) | Tseng, Yuen-Hsien;Liu, Chao-Lin;Tsai, Chia-Chi;Wang, Jui-Ping;Chuang, Yi-Hsuan;Jeng, James | |
dc.creator (Author) | 劉昭麟 | zh_TW |
dc.date (Date) | 2011-12 | |
dc.date.accessioned | 22-Jun-2016 17:10:06 (UTC+8) | - |
dc.date.available | 22-Jun-2016 17:10:06 (UTC+8) | - |
dc.date.issued (Upload time) | 22-Jun-2016 17:10:06 (UTC+8) | - |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/98230 | - |
dc.description.abstract (Abstract) | This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. A set of open-source software tools was integrated to build a statistical machine translation model for the task. Various Chinese segmentation methods, additional resources, and training-corpus preprocessing steps were then tried based on this model. In total, more than 20 experiments were conducted to compare translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important for maintaining performance; 2) a sufficient number of good-quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) translation effectiveness, measured in BLEU, doubles when the number of bilingual training sentences doubles at the level of 100,000 sentences. | |
dc.format.extent | 1418128 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.relation (Relation) | Proceedings of the Ninth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access - PatentMT (NTCIR 9), 661‒665. Tokyo, Japan, 6-9 December 2011 | |
dc.subject (Keywords) | Chinese segmentation, language modeling, training corpus | |
dc.title (Title) | Statistical approaches to patent translation - Experiments with various settings of training data | |
dc.type (Resource type) | conference | |