利用馬可夫邏輯網路模型與自動化生成的模板加強生醫文獻之語意角色標註 | Publication

Title	利用馬可夫邏輯網路模型與自動化生成的模板加強生醫文獻之語意角色標註 Biomedical semantic role labeling with a Markov Logic network and automatically generated patterns
Creator	賴柏廷
Contributor	Advisors: 蔡宗翰 (Tsai, Richard Tzong Han); 劉昭麟 (Liu, Chao Lin). Author: 賴柏廷
Key Words	語意角色標註; 自然語言處理; 馬可夫邏輯網路; 機器學習; 資訊擷取; Semantic Role Labeling; Natural Language Processing; Markov Logic Network; Machine Learning; Information Extraction
Date	2011
Date Issued	30-Oct-2012 11:07:49 (UTC+8)
Summary	背景: 生醫文獻語意角色標註（Semantic Role Labeling, SRL）是一種自然語言處理的技術，其可用來將描述生物過程的語句以predicate-argument structures (PASs) 表示。SRL 經常受限於arguments的unbalance problem而且需要花費許多的時間和記憶體空間在學習 arguments 之間的相依性。方法: 我們提出一Markov Logic Network (MLN)-based SRL之系統，且此系統使用自動化生成之SRL 模板同時辨識constituents與候選之語意角色。結果及結論: 我們的方法在BioProp語料上來評估。實驗結果顯示我們的方法勝過目前最先進的系統。此外，使用SRL模板後，在時間及記憶體之花費上亦大幅的減少，而且我們自動化生成之模板亦能幫助建立這些模板。我們認為本論文提出之方法可以透過增加新的SRL模板例如：由生物學家手動寫的模板，而得到進一步的提升，而且本方法也為於需要處理大量SRL 語料時，提供一種可能的解法。 Background: Biomedical semantic role labeling (SRL) is a natural language processing technique that expresses sentences describing biological processes as predicate-argument structures (PASs). SRL often suffers from an imbalance problem among arguments and requires considerable time and memory to learn the dependencies between arguments. Method: We constructed a Markov Logic Network (MLN)-based SRL system that uses automatically generated SRL patterns to simultaneously recognize constituents and candidate semantic roles. Results and conclusions: Our method is evaluated on the BioProp corpus. The experimental results show that our method outperforms the state-of-the-art system. Furthermore, applying SRL patterns greatly reduces time and memory costs, and the automatically generated patterns also help in developing such patterns. We believe our method can be further improved by adding new SRL patterns, such as patterns written manually by biology experts, and that it offers a possible solution for processing large SRL corpora.
Description	碩士 (Master's); 國立政治大學 (National Chengchi University); 資訊科學學系 (Department of Computer Science); 99753004; 100
Source (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0099753004
Type	thesis

dc.contributor.advisor	蔡宗翰; 劉昭麟	zh_TW
dc.contributor.advisor	Tsai, Richard Tzong Han; Liu, Chao Lin	en_US
dc.contributor.author (Authors)	賴柏廷	zh_TW
dc.creator (作者)	賴柏廷	zh_TW
dc.date (日期)	2011	en_US
dc.date.accessioned	30-Oct-2012 11:07:49 (UTC+8)	-
dc.date.available	30-Oct-2012 11:07:49 (UTC+8)	-
dc.date.issued (上傳時間)	30-Oct-2012 11:07:49 (UTC+8)	-
dc.identifier (Other Identifiers)	G0099753004	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/54463	-
dc.description (描述)	碩士	zh_TW
dc.description (描述)	國立政治大學	zh_TW
dc.description (描述)	資訊科學學系	zh_TW
dc.description (描述)	99753004	zh_TW
dc.description (描述)	100	zh_TW
dc.description.tableofcontents	CHAPTER 1 Introduction: 1.1 Background; 1.2 Biomedical Semantic Role Labeling (SRL); 1.3 Traditional Formulation of SRL; 1.4 Problems; 1.4.1 Unbalanced Problem; 1.4.2 Dependency Problem; 1.5 Our Goal. CHAPTER 2 Method: 2.1 Markov Logic; 2.1.1 First-Order Logic; 2.1.2 Markov Networks; 2.1.3 Markov Logic Networks; 2.2 Implement Biomedical Semantic Role Labeling; 2.2.1 Formulating SRL; 2.2.2 Basic formulae; 2.2.3 Conjunction formulae; 2.2.4 Global formulae; 2.3 Patterns for SRL; 2.3.1 Introduction of the Patterns; 2.3.2 Tree Pruning; 2.3.3 Lexicon Pattern; 2.3.4 Temporal Pattern; 2.3.5 Conjunction Pattern; 2.3.6 Syntactic Path Pattern; 2.4 Collective Learning for SRL; 2.4.1 Collective Learning; 2.4.2 Linguistic Constraints. CHAPTER 3 Experiment: 3.1 Dataset; 3.2 Experiment Design; 3.2.1 Experiment 1 – The Effect of Automatically Generated Patterns; 3.2.2 Experiment 2 – Improvement by Using Collective Learning; 3.3 Evaluation Metric; 3.4 t-test. CHAPTER 4 Results and Discussion: 4.1 Improvement by Using SRL Patterns; 4.2 Improvement by Using Collective Learning; 4.3 Related Work; 4.3.1 Biomedical Semantic Role Labeling Corpus; 4.3.2 Biomedical Semantic Role Labeling System. CHAPTER 5 Conclusion. References.	zh_TW
dc.language.iso	en_US	-
dc.source.uri (資料來源)	http://thesis.lib.nccu.edu.tw/record/#G0099753004	en_US
dc.subject (關鍵詞)	語意角色標註	zh_TW
dc.subject (關鍵詞)	自然語言處理	zh_TW
dc.subject (關鍵詞)	馬可夫邏輯網路	zh_TW
dc.subject (關鍵詞)	機器學習	zh_TW
dc.subject (關鍵詞)	資訊擷取	zh_TW
dc.subject (關鍵詞)	Semantic Role Labeling	en_US
dc.subject (關鍵詞)	Natural Language Processing	en_US
dc.subject (關鍵詞)	Markov Logic Network	en_US
dc.subject (關鍵詞)	Machine Learning	en_US
dc.subject (關鍵詞)	Information Extraction	en_US
dc.title (題名)	利用馬可夫邏輯網路模型與自動化生成的模板加強生醫文獻之語意角色標註	zh_TW
dc.title (題名)	Biomedical semantic role labeling with a Markov Logic network and automatically generated patterns	en_US
dc.type (資料類型)	thesis	en
