Title 以共現資訊為基礎增進英漢翻譯對列改進方法
Using Co-Occurrence Information for Alignment Improvement in English-Chinese Translation
Author 黃昭憲
Huang, Chao Shainn
Contributors 劉昭麟 (advisor)
Liu, Chao Lin
黃昭憲
Huang, Chao Shainn
Keywords word alignment
computer-assisted translation
leftover words
new word-pair extraction
Date 2009
Uploaded 8-Dec-2010 12:08:43 (UTC+8)
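Abstract This thesis builds on the translation systems previously developed by 呂明欣 (Ming-Shin Lu) and 張智傑 (Chih-Chieh Chang). It focuses on improving the word alignment module, which in turn increases the precision and the number of reordering example trees, so as to build a high-quality reordering-tree database and raise overall translation quality.
  We use three Chinese-English parallel corpora, drawn from junior high school materials, senior high school materials, and popular-science magazines, which differ in both syntactic structure and word choice. The texts are first preprocessed with a word segmentation system, and dictionary files are then used to look up the translations of each word so that Chinese and English words can be aligned; restoring words to their base forms and synonym expansion are also applied to strengthen the original words. The words left unaligned after this step are then recombined: taking one Chinese word as the anchor, we pair it either with a single English word or with multiple English words, and an analysis formula selects the new word pairs with higher confidence. These pairs extend the original dictionary files, so that the word alignment module performs better.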
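The dictionary-lookup step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the tiny dictionary, synonym table, and irregular-form table (EN_ZH_DICT, ZH_SYNONYMS, BASE_FORMS) are hypothetical stand-ins for the real dictionary files and the Tongyici Cilin thesaurus, the crude plural rule is a placeholder for a real stemmer or lemmatizer, input sentences are assumed to be already segmented, and a greedy first-match strategy is assumed. The unaligned indices it returns correspond to the leftover words that the recombination step works on.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources (assumption): stand-ins for real dictionary
# files and a synonym thesaurus such as Tongyici Cilin.
EN_ZH_DICT: Dict[str, Set[str]] = {"cat": {"貓"}, "eat": {"吃"}, "fish": {"魚"}}
ZH_SYNONYMS: Dict[str, Set[str]] = {"吃": {"食用"}}
BASE_FORMS: Dict[str, str] = {"ate": "eat", "eaten": "eat"}  # irregular forms


def base_form(word: str) -> str:
    """Rough base-form restoration: irregular table, then strip a plural -s."""
    w = word.lower()
    if w in BASE_FORMS:
        return BASE_FORMS[w]
    if w.endswith("s") and w[:-1] in EN_ZH_DICT:
        return w[:-1]
    return w


def candidate_translations(en_word: str) -> Set[str]:
    """Look up the base form, then expand each translation with its synonyms."""
    cands = set(EN_ZH_DICT.get(base_form(en_word), set()))
    for zh in list(cands):  # iterate over a copy while extending the set
        cands |= ZH_SYNONYMS.get(zh, set())
    return cands


def align(en_tokens: List[str], zh_tokens: List[str]
          ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Greedy one-to-one alignment; returns links and leftover indices."""
    links: List[Tuple[int, int]] = []
    used_zh: Set[int] = set()
    for i, en in enumerate(en_tokens):
        cands = candidate_translations(en)
        for j, zh in enumerate(zh_tokens):
            if j not in used_zh and zh in cands:
                links.append((i, j))
                used_zh.add(j)
                break  # take the first unused match only
    aligned_en = {i for i, _ in links}
    left_en = [i for i in range(len(en_tokens)) if i not in aligned_en]
    left_zh = [j for j in range(len(zh_tokens)) if j not in used_zh]
    return links, left_en, left_zh
```

For example, align(["The", "cats", "ate", "fish"], ["貓", "食用", "了", "魚"]) links "cats" to 貓 through base-form restoration, "ate" to 食用 through the synonym table, and "fish" to 魚, leaving "The" and 了 unaligned.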
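  For evaluation, parallel corpora at different English proficiency levels serve as training data, test items from the Trends in International Mathematics and Science Study (TIMSS) are the translation targets, and NIST and BLEU are the evaluation metrics. The experimental results show that the proposed approach improves word alignment, produces more reordering example trees that the translation system can use for word reordering, and improves the translation quality of the assisted translation system.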
This research extends the translation systems designed by Ming-Shin Lu and Chih-Chieh Chang. We mainly improve word alignment and build a high-quality database of reordering trees to improve translation quality.
  In this thesis, we explore ways to align words that remain unaligned by methods relying only on word translations from English and Chinese dictionaries. With the proposed methods, we can align chunks of words between English and Chinese rather than being limited to word-to-word alignment.
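The pairing of leftover words, including one Chinese word against a multi-word English chunk, can be sketched as follows. This record does not give the thesis's analysis formula, so the sketch substitutes the Dice coefficient, 2 * C(zh, en) / (C(zh) + C(en)) counted over sentence pairs, as an assumed co-occurrence score; the threshold, minimum count, and maximum chunk length are hypothetical parameters, and leftover English tokens are assumed to be listed in sentence order so that chunks are contiguous runs of that list.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# One item per sentence pair: (leftover English tokens, leftover Chinese words).
Leftovers = List[Tuple[List[str], List[str]]]


def english_candidates(en_left: List[str], max_len: int = 3) -> List[str]:
    """Single words plus contiguous chunks of up to max_len leftover words."""
    cands = []
    for n in range(1, max_len + 1):
        for i in range(len(en_left) - n + 1):
            cands.append(" ".join(en_left[i:i + n]))
    return cands


def new_word_pairs(data: Leftovers, threshold: float = 0.9,
                   min_count: int = 2, max_len: int = 3
                   ) -> Dict[str, Tuple[str, float]]:
    """Keep, per Chinese word, the best English word/chunk by Dice score
    (an assumed stand-in for the thesis's own formula)."""
    c_zh: Counter = Counter()
    c_en: Counter = Counter()
    c_pair: Counter = Counter()
    for en_left, zh_left in data:
        zh_set = set(zh_left)
        en_set = set(english_candidates(en_left, max_len))
        c_zh.update(zh_set)          # sentence-level occurrence counts
        c_en.update(en_set)
        c_pair.update(product(zh_set, en_set))  # co-occurrence counts
    best: Dict[str, Tuple[str, float]] = {}
    for (zh, en), n in c_pair.items():
        if n < min_count:
            continue
        score = 2 * n / (c_zh[zh] + c_en[en])
        if score < threshold:
            continue
        prev = best.get(zh)
        # Prefer the higher score; on ties prefer the longer English chunk.
        if prev is None or (score, len(en.split())) > (prev[1], len(prev[0].split())):
            best[zh] = (en, score)
    return best
```

For example, with leftovers (["take", "care"], ["照顧"]), (["take", "care", "of"], ["照顧"]) and (["take", "off"], ["起飛"]), 照顧 scores 1.0 with both "care" and "take care", the tie-break keeps "take care", and 起飛 is dropped because its pairs occur only once. Dice alone cannot separate a word from a chunk that always contains it, which is why some tie-break or a different formula is needed.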
  In the evaluation, parallel corpora at different English proficiency levels are used as training data, and Trends in International Mathematics and Science Study (TIMSS) questions are used as test data. NIST and BLEU serve as the evaluation metrics. The experimental results show that the proposed method improves word alignment. It also generates more reordering trees for bilingual structured string-tree correspondence, and it improves the translation quality of the assisted translation system.
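As a minimal illustration of how BLEU combines clipped n-gram precisions with a brevity penalty (following Papineni et al., reference [33]), here is an unsmoothed sentence-level sketch. It is not the scoring script used in the thesis: tokenization is assumed to be done beforehand, and NIST is not covered.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate too short for this n-gram order
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        if clipped == 0:
            return 0.0  # unsmoothed: any zero precision gives zero BLEU
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Effective reference length: closest to c, ties broken toward shorter.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, while "the cat sat on the" against "the cat sat on the mat" keeps every n-gram precision at 1 but is scaled by the brevity penalty exp(1 - 6/5), about 0.82. Reported BLEU scores are normally computed at corpus level, by summing clipped counts and lengths over all test sentences before combining them.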
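References [1] 三民學習網 (San Min learning website), http://www.grandeast.com.tw/Englishsite/ [Last visited on 2010/05/26].
[2] 中央研究院中文斷詞系統 (Academia Sinica CKIP Chinese word segmentation system), http://ckipsvr.iis.sinica.edu.tw/ [Last visited on 2010/05/26].
[3] 牛津現代英漢雙解辭典 (Oxford Modern English-Chinese Bilingual Dictionary), http://stardict.sourceforge.net/Dictionaries_zh_TW.php [Last visited on 2010/05/26].
[4] 田侃文, 英漢專利文書文句對列與應用 (Sentence alignment of English-Chinese patent documents and its applications), Master's thesis, Graduate Institute of Computer Science, National Chengchi University, 2009.
[5] 呂明欣, 電腦輔助試題翻譯:以國際數學與科學教育成就調查為例 (Computer-assisted translation of test items: The case of TIMSS), Master's thesis, Graduate Institute of Computer Science, National Chengchi University, 2007.
[6] 狄克生片語 (Dixson's English idioms), http://203.68.17.29/kevin/EteachWeb/DIXON/ [Last visited on 2009/11/10].
[7] 哈工大訊息檢索實驗室同義詞詞林擴充版 (Tongyici Cilin, extended edition, HIT Information Retrieval Laboratory), http://www.nlp.org.cn/docs/doclist.php?cat_id=9&type=7 [Last visited on 2010/05/26].
[8] 英文諺語 (English proverbs), http://www.eng.fju.edu.tw/etc/quiz/proverbs.htm [Last visited on 2010/05/26].
[9] 科學人雜誌中英對照電子書 (Scientific American Taiwan edition, Chinese-English bilingual e-books), http://edu2.wordpedia.com/taipei_sa/ [Last visited on 2010/05/26].
[10] 旋元佑文法 (旋元佑's English grammar), http://tw.myblog.yahoo.com/jw!GFGhGimWHxN4wRWXG1UDIL_XSA--/ [Last visited on 2010/05/26].
[11] 基礎英文1200句 (1200 basic English sentences), http://hk.geocities.com/cnlyhhp/eng.htm [Last visited on 2010/05/26].
[12] 國民中學學習資源網 (Junior high school learning resources website), http://140.111.34.172/teacool/new_page_2.htm [Last visited on 2010/05/26].
[13] 教育部委託宜蘭縣發展九年一貫課程建智語文學習領域(英文)國中教科書補充資料暨題庫建置計畫 (Ministry of Education project commissioned to Yilan County: supplementary materials and item bank for junior high school English textbooks under the Grade 1–9 Curriculum), http://140.111.66.37/english/ [Last visited on 2010/05/26].
[14] 曾元顯, 劉昭麟 and 莊則敬, 專利雙語語料之中、英對照詞自動擷取 (Automatic extraction of Chinese-English term pairs from bilingual patent corpora), Proceedings of the 21st Conference on Natural Language and Speech Processing, 279–292, 2009.
[15] 梅家駿, 竺一鳴 and 高蘊琦, 同義詞詞林 (Tongyici Cilin), Shanghai: 上海詞書出版社 (Shanghai Lexicographical Publishing House), 1983.
[16] 張智傑, 以範例為基礎之英漢TIMSS試題輔助翻譯 (Example-based assisted English-Chinese translation of TIMSS test items), Master's thesis, Graduate Institute of Computer Science, National Chengchi University, 2007.
[17] 趙紅梅, 劉群, 張瑞強, 呂雅娟, 隅田英一郎 and 吳翠玲, 漢英詞語對齊規範 (Chinese-English word alignment guidelines), 中文信息學報 (Journal of Chinese Information Processing), Vol. 23, No. 3, 2009.
[18] M. H. Bai, J. M. You, K. J. Chen and J. S. Chang, Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 478–486, 2009.
[19] CEDICT漢英電子字典檔 (CEDICT Chinese-English dictionary file), http://us1.mdbg.net/chindict/chindict.php [Last visited on 2010/05/26].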
[20] J. S. Chang and M. H. Chen, An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 297–304, 1997.
[21] D. Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 263–270, 2005.
[22] G. Doddington, Automatic Evaluation of Machine Translation Quality Using N-gram Co-occurrence Statistics, Proceedings of the Second International Conference on Human Language Technology Research, 138–145, 2002.
[23] Dr.eye譯典通線上辭典 (Dr.eye online dictionary), http://www.dreye.com:8080/axis/ddict.jsp [Last visited on 2010/05/26].
[24] S. J. Ker and J. S. Chang, A Class-based Approach to Word Alignment, Computational Linguistics, Vol. 23, No. 2, 313–343, 1997.
[25] S. Le, J. Youbing, D. Lin and S. Yufang, Word Alignment of English-Chinese Bilingual Corpus Based on Chunks, Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 110–116, 2000.
[26] Y. Ma, N. Stroppa and A. Way, Bootstrapping Word Alignment via Word Packing, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 304–311, 2007.
[27] Y. Ma, S. Ozdowska, Y. Sun and A. Way, Improving Word Alignment Using Syntactic Dependencies, Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation, 69–77, 2008.
[28] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999.
[29] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[30] R. Mihalcea and T. Pedersen, An Evaluation Exercise for Word Alignment, Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, 1–10, 2003.
[31] F. J. Och, An Efficient Method for Determining Bilingual Word Classes, Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, 71–76, 1999.
[32] F. J. Och and H. Ney, Improved Statistical Alignment Models, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 440–447, 2000.
[33] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, BLEU: A Method for Automatic Evaluation of Machine Translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318, 2002.
[34] M. F. Porter, An Algorithm for Suffix Stripping, Program, 130–137, 1980.
[35] D. Ren, H. Wu and H. Wang, Improving Statistical Word Alignment With Various Clues, Proceedings of Machine Translation Summit XI, 391–397, 2007.
[36] SRILM, http://www.speech.sri.com/projects/srilm/ [Last visited on 2010/05/26].
[37] The International Association for the Evaluation of Educational Achievement, http://www.uea.nl/ [Last visited on 2010/05/26].
[38] The Stanford Parser: A statistical parser, http://nlp.stanford.edu/software/lex-parser.shtml [Last visited on 2010/05/26].
[39] TIMSS國際數學與科學教育成就趨勢調查 (TIMSS: Trends in International Mathematics and Science Study), http://timss.sec.ntnu.edu.tw/timss2007/news.asp [Last visited on 2010/05/26].
[40] M. Utiyama and H. Isahara, A Japanese-English Patent Parallel Corpus, Proceedings of the Eleventh Machine Translation Summit, 475–482, 2007.
[41] D. Wu, Grammarless Extraction of Phrasal Translation Examples from Parallel Texts, Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, 354–372, 1995.
[42] WordNet API, http://wordnet.princeton.edu/ [Last visited on 2010/05/26].
Description Master's thesis
National Chengchi University
Department of Computer Science
97753007
98
Source http://thesis.lib.nccu.edu.tw/record/#G0097753007
Type thesis
Identifier G0097753007
Handle URI http://nccur.lib.nccu.edu.tw/handle/140.119/49473
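Table of contents
Chapter 1 Introduction 1
1.1 Research Background and Objectives 1
1.2 Research Methods 4
1.3 Thesis Organization 5
Chapter 2 Literature Review 6
2.1 Related Work on Word Alignment 6
2.2 Related Work on Aligning Leftover Words 9
Chapter 3 Corpus Sources and System Architecture 10
3.1 Analysis of the Chinese-English Parallel Corpora 11
3.2 Word Alignment Module 13
3.3 Reordering Example Tree Database 13
3.4 Dictionary Selection 14
Chapter 4 Word Alignment Techniques 17
4.1 Word Alignment of Chinese-English Parallel Sentence Pairs 17
4.1.1 Dictionary-Based Word Alignment 19
4.1.2 Word Alignment Based on Base-Form Restoration 23
4.1.3 Word Alignment Based on the Tongyici Cilin Thesaurus 26
4.2 Use of Leftover Words 31
4.2.1 Stop-Word List and Repair of Missed Words 32
4.2.2 Alignment Computation 39
Chapter 5 System Performance Evaluation 44
5.1 Experimental Corpora 44
5.2 Experimental Design and Procedure 46
5.2.1 Checking and Comparing Word Alignment Results 47
5.2.2 Translating English Test Items with the Machine Translation System 48
5.3 Evaluation Metrics: BLEU and NIST 49
5.4 Experimental Results and Comparison 51
5.4.1 Comparison of Word Alignment Results across Corpora 51
5.4.2 Analysis of Leftover-Word Results 54
5.4.3 Comparison of Word Alignment Results Corrected with Leftover Words 57
5.4.4 Evaluation of Translation Quality Gains in the Assisted Machine Translation System 59
Chapter 6 Conclusions and Future Work 66
6.1 Conclusions 66
6.2 Future Work 68
References 69
Appendix I Manual Inspection Results for New Word Pairs 74
Appendix II Additional Discussion from the Oral Defense 75
Appendix III Detailed Scores for Each Group 76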
File size 8339859 bytes
Format application/pdf
Language en_US