Title 英文介系詞片語定位與英文介系詞推薦
Attachment of English prepositional phrases and suggestions of English prepositions
Author 蔡家琦
Tsai, Chia Chi
Contributor 劉昭麟
Liu, Chao Lin
蔡家琦
Tsai, Chia Chi
Keywords semantic analysis
machine translation
text proofreading
Date 2011
Upload time 30-Oct-2012 15:21:59 (UTC+8)
Abstract This thesis addresses two problems involving English prepositions: prepositional phrase (PP) attachment and preposition suggestion. A preposition typically lets a prepositional phrase elaborate on its context more precisely, and native speakers use prepositions intuitively. Computers, however, do not understand semantics and so cannot easily determine what a PP modifies, while learners of English as a second language find it hard to choose the correct preposition.
I cast both problems as a single abstract decision problem and apply the same computational procedure to each. What the two problems share is the verb phrase: a simple verb phrase contains four key headwords, namely the verb, the first noun, the preposition, and the second noun. Starting from these four headwords, my methods select semantic features hierarchically through WordNet, look for semantic commonalities across a large number of instances, and train classification models with machine learning. For PP attachment, the experiments use only the more challenging prepositions, i.e., those whose PPs are almost equally likely to attach to the verb or to the first noun.
On real-world corpora, my methods outperformed a Maximum Entropy classifier that also considered only the four headwords, and performed about as well as the Stanford parser, which had access to the complete sentences when judging attachments. For preposition suggestion, a comprehensive comparison is hard to make; my methods found the correct preposition 53.14% of the time, which is not as good as the best-performing system at the time of writing.
This study reconfirms that semantic information is instrumental for both PP attachment and preposition suggestion. High-level semantic information gave good classification performance, and hierarchical selection of semantic synsets improved the results further. I believe the reported results are valuable for future studies of PP attachment and preposition suggestion, which are key components of machine translation and text proofreading.
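The idea in the abstract, generalizing the four headwords through a semantic hierarchy before classifying, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `HYPERNYM` table stands in for WordNet, the labels "V" (verb attachment) and "N" (noun attachment) and the feature names are hypothetical, and the count-based voting classifier is a simple stand-in, not the learner used in the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical toy hypernym table standing in for WordNet (child -> parent).
HYPERNYM = {
    "pizza": "food", "pasta": "food", "anchovies": "food", "cheese": "food",
    "fork": "tool", "spoon": "tool",
    "food": "entity", "tool": "entity",
    "ate": "consume", "devoured": "consume", "consume": "act",
}


def generalize(word, levels):
    """Climb up to `levels` steps in the toy hierarchy, stopping at the root."""
    for _ in range(levels):
        if word not in HYPERNYM:
            break
        word = HYPERNYM[word]
    return word


def features(quad, levels=1):
    """Map (verb, noun1, prep, noun2) to semantic features.

    The preposition is kept literally; the other headwords are generalized.
    """
    verb, noun1, prep, noun2 = quad
    n2 = generalize(noun2, levels)
    return (
        ("v", generalize(verb, levels)),
        ("n1", generalize(noun1, levels)),
        ("p", prep),
        ("n2", n2),
        ("p+n2", prep + "_" + n2),
    )


def train(examples, levels=1):
    """Count how often each feature co-occurs with each attachment label."""
    counts = defaultdict(Counter)
    for quad, label in examples:
        for feat in features(quad, levels):
            counts[feat][label] += 1
    return counts


def predict(counts, quad, levels=1):
    """Sum the label counts of all features of `quad` and return the top label."""
    votes = Counter()
    for feat in features(quad, levels):
        votes.update(counts.get(feat, {}))
    return votes.most_common(1)[0][0] if votes else None


# Made-up training quadruples: "V" = PP attaches to the verb, "N" = to noun1.
TRAIN = [
    (("ate", "pizza", "with", "fork"), "V"),
    (("ate", "pasta", "with", "spoon"), "V"),
    (("ate", "pizza", "with", "anchovies"), "N"),
    (("devoured", "pasta", "with", "cheese"), "N"),
]

model = train(TRAIN)
print(predict(model, ("devoured", "pizza", "with", "spoon")))
print(predict(model, ("ate", "pasta", "with", "cheese")))
```

The `levels` parameter controls how far each headword is lifted in the hierarchy, which loosely mirrors the notion of choosing a level of semantic abstraction. For preposition suggestion, the same feature scheme could be reused with the preposition as the label instead of a feature.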
References [1] Eneko Agirre, Timothy Baldwin, and David Martinez. Improving Parsing and PP Attachment Performance with Sense Information. In 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2008.
[2] Michaela Atterer and Hinrich Schütze. Prepositional Phrase Attachment without Oracles. Computational Linguistics, 33(4):469–476, 2007.
[3] Timothy Baldwin, Valia Kordoni, and Aline Villavicencio. Prepositions in Applications: A Survey and Introduction to the Special Issue. Computational Linguistics, 35(2):119–149, 2009.
[4] Michael John Collins. Head-driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.
[5] Gregory F. Coppola, Alexandra Birch, Tejaswini Deoskar, and Mark Steedman. Simple Semi-supervised Learning for Prepositional Phrase Attachment. In Proceedings of the 12th International Conference on Parsing Technologies, pages 129–139, 2011.
[6] Rachele De Felice and Stephen G. Pulman. Automatically Acquiring Models of Preposition Use. In Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions, pages 45–50, 2007.
[7] Rachele De Felice and Stephen G. Pulman. A Classifier-based Approach to Preposition and Determiner Error Correction in L2 English. In Proceedings of the 22nd International Conference on Computational Linguistics, volume 1, pages 169–176, 2008.
[8] Michael Gamon, Jianfeng Gao, Chris Brockett, and Alexandre Klementiev. Using Contextual Speller Techniques and Language Modeling for ESL Error Correction. In Proceedings of Joint Conference on Natural Language Processing 2008, pages 449–456, 2008.
[9] Na-Rae Han, Joel Tetreault, Soo-Hwa Lee, and Jin-Young Ha. Using an Error-annotated Learner Corpus to Develop an ESL/EFL Error Correction System. In Proceedings of the Seventh conference on International Language Resources and Evaluation, 2010.
[10] Donald Hindle and Mats Rooth. Structural Ambiguity and Lexical Relations. Computational Linguistics, 19(1):103–120, 1993.
[11] Dirk Hovy, Stephen Tratz, and Eduard Hovy. What’s in a Preposition?: Dimensions of Sense Disambiguation for an Interesting Word Class. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 454–462, 2010.
[12] Dan Klein and Christopher D. Manning. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems, volume 15, pages 3–10, 2003.
[13] Claudia Leacock, Michael Gamon, and Chris Brockett. User Input and Interactions on Microsoft Research ESL Assistant. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, pages 73–81, 2009.
[14] Ken C. Litkowski and Orin Hargraves. Coverage and Inheritance in The Preposition Project. In Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, pages 37–44, 2006.
[15] Chao-Lin Liu, Jing-Shin Chang, and Keh-Yih Su. The Semantic Score Approach to the Disambiguation of PP Attachment Problem. In Proceedings of the ROC Computational Linguistics Conference III, pages 253–270, 1990.
[16] Tom O’Hara and Janyce Wiebe. Exploiting Semantic Role Resources for Preposition Disambiguation. Computational Linguistics, 35(2):151–184, 2009.
[17] Marian Olteanu and Dan Moldovan. PP-Attachment Disambiguation Using Large Context. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 273–280, 2005.
[18] Patrick Pantel and Dekang Lin. An Unsupervised Approach to Prepositional Phrase Attachment Using Contextually Similar Words. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pages 101–108, 2000.
[19] Li Quan, Oleksandr Kolomiyets, and Marie-Francine Moens. KU Leuven at HOO-2012: A Hybrid Approach to Detection and Correction of Determiner and Preposition Errors in Non-native English Text. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 263–271, 2012.
[20] Adwait Ratnaparkhi, Jeff Reynar, and Salim Roukos. A Maximum Entropy Model for Prepositional Phrase Attachment. In Proceedings of the Workshop on Human Language Technology, pages 250–255, 1994.
[21] Jiri Stetina and Makoto Nagao. Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary. In Proceedings of the Fifth Workshop on Very Large Corpora, pages 66–80, 1997.
[22] Joel R. Tetreault and Martin Chodorow. The Ups and Downs of Preposition Error Detection in ESL Writing. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, pages 865–872, 2008.
[23] Stephen Tratz and Dirk Hovy. Disambiguation of Preposition Sense Using Linguistically Motivated Features. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium, pages 96–100, 2009.
[24] Martin Volk. Combining Unsupervised and Supervised Methods for PP Attachment Disambiguation. In Proceedings of the 19th International Conference on Computational Linguistics, volume 1, pages 1–7, 2002.
[25] Jian-Cheng Wu, Joseph Chang, Yi-Chun Chen, Shih-Ting Huang, Mei-Hua Chen, and Jason S. Chang. Helping Our Own: NTHU NLPLAB System Description. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 295–301, 2012.
Description Master's
National Chengchi University (國立政治大學)
Department of Computer Science
99753006
100
Source http://thesis.lib.nccu.edu.tw/record/#G0099753006
Type thesis
dc.contributor.advisor 劉昭麟 zh_TW
dc.contributor.advisor Liu, Chao Lin en_US
dc.contributor.author (Author) 蔡家琦 zh_TW
dc.contributor.author (Author) Tsai, Chia Chi en_US
dc.creator (Author) 蔡家琦 zh_TW
dc.creator (Author) Tsai, Chia Chi en_US
dc.date (Date) 2011 en_US
dc.date.accessioned 30-Oct-2012 15:21:59 (UTC+8) -
dc.date.available 30-Oct-2012 15:21:59 (UTC+8) -
dc.date.issued (Upload time) 30-Oct-2012 15:21:59 (UTC+8) -
dc.identifier (Other identifier) G0099753006 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/55034 -
dc.description (Description) 碩士 zh_TW
dc.description (Description) 國立政治大學 zh_TW
dc.description (Description) 資訊科學學系 zh_TW
dc.description (Description) 99753006 zh_TW
dc.description (Description) 100 zh_TW
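dc.description.tableofcontents 1 Introduction
1.1 Research Background
1.2 Research Approach
1.3 Research Results
2 Literature Review
2.1 Prepositional Phrase Attachment
2.2 Preposition Suggestion
3 Corpus Processing
3.1 Corpora
3.1.1 RRR
3.1.2 PTB3
3.1.3 The Wall Street Journal and The New York Times
3.2 Lexical Database: WordNet
3.3 Preprocessing
3.3.1 Sentence Parsing and Sentence Segmentation
3.3.2 Headword Extraction
3.3.3 Noise Filtering
3.3.4 Selecting Challenging Prepositions
3.4 Target Corpora
3.4.1 PP Attachment Corpus
3.4.2 Preposition Suggestion Corpus
4 Methodology
4.1 Feature Processing
4.1.1 Feature Quantification
4.1.2 Feature Weighting
4.2 Feature Selection
4.2.1 Hierarchical Selection
4.2.2 Filtering Criteria
4.3 Model Construction
4.3.1 Baseline Model Construction
4.3.2 Traditional Model Construction
4.3.3 High-level Model Construction
5 Experiments
5.1 Experimental Design
5.1.1 Baseline Model Experiments
5.1.2 Traditional Model Experiments
5.1.3 High-level Model Experiments
5.2 Evaluation
5.3 Experimental Analysis: PP Attachment
5.3.1 Analysis of Different Condition Combinations
5.3.2 Analysis of Hierarchical Feature Selection
5.3.3 Analysis of High-level Model Construction
5.3.4 Overall Comparison with the Maximum Entropy Method
5.3.5 Overall Comparison with the Stanford Parser
5.4 Experimental Analysis: Preposition Suggestion
5.4.1 Analysis of Different Condition Combinations
5.4.2 Analysis of High-level Model Construction
5.4.3 Overall Comparison
5.4.4 Large Corpus
6 Conclusion
6.1 Discussion
6.2 Future Work
References
Appendix I Types of Synsets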
zh_TW
dc.language.iso en_US -
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0099753006 en_US
dc.subject (Keywords) 語義分析 zh_TW
dc.subject (Keywords) 機器翻譯 zh_TW
dc.subject (Keywords) 文本校對 zh_TW
dc.subject (Keywords) semantic analysis en_US
dc.subject (Keywords) machine translation en_US
dc.subject (Keywords) text proofreading en_US
dc.title (Title) 英文介系詞片語定位與英文介系詞推薦 zh_TW
dc.title (Title) Attachment of English prepositional phrases and suggestions of English prepositions en_US
dc.type (Type) thesis en