Statistical approaches to patent translation - Experiments with various settings of training data | Academic Output | NCCU Academic Hub

Title	Statistical approaches to patent translation - Experiments with various settings of training data
Authors	Tseng, Yuen-Hsien; Liu, Chao-Lin; Tsai, Chia-Chi; Wang, Jui-Ping; Chuang, Yi-Hsuan; Jeng, James; 劉昭麟
Contributor	Department of Computer Science
Keywords	Chinese segmentation, language modeling, training corpus
Date	2011-12
Upload time	22-June-2016 17:10:06 (UTC+8)
Abstract	This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. A series of open source software were integrated to build a statistical machine translation model for the task. Various Chinese segmentation, additional resources, and training corpus preprocessing were then tried based on this model. As a result, more than 20 experiments were conducted to compare the translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important to maintain the performance; 2) sufficient number of good quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) the translation effectiveness in BLEU values doubles as the number of bilingual training sentences at the level of 100,000 doubles.
Relation	Proceedings of the Ninth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access - PatentMT (NTCIR 9), 661‒665. Tokyo, Japan, 6-9 December 2011
Type	conference

dc.contributor	Department of Computer Science
dc.creator (Authors)	Tseng, Yuen-Hsien; Liu, Chao-Lin; Tsai, Chia-Chi; Wang, Jui-Ping; Chuang, Yi-Hsuan; Jeng, James
dc.creator (Authors)	劉昭麟	zh_TW
dc.date (Date)	2011-12
dc.date.accessioned	22-June-2016 17:10:06 (UTC+8)
dc.date.available	22-June-2016 17:10:06 (UTC+8)
dc.date.issued (Upload time)	22-June-2016 17:10:06 (UTC+8)
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/98230
dc.description.abstract (Abstract)	This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. A series of open source software were integrated to build a statistical machine translation model for the task. Various Chinese segmentation, additional resources, and training corpus preprocessing were then tried based on this model. As a result, more than 20 experiments were conducted to compare the translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important to maintain the performance; 2) sufficient number of good quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) the translation effectiveness in BLEU values doubles as the number of bilingual training sentences at the level of 100,000 doubles.
dc.format.extent	1418128 bytes
dc.format.mimetype	application/pdf
dc.relation (Relation)	Proceedings of the Ninth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access - PatentMT (NTCIR 9), 661‒665. Tokyo, Japan, 6-9 December 2011
dc.subject (Keywords)	Chinese segmentation, language modeling, training corpus
dc.title (Title)	Statistical approaches to patent translation - Experiments with various settings of training data
dc.type (Type)	conference