Title: A Study on Automatic Recognition on Exact Synonyms between Traditional and Simplified Chinese (中文繁簡等義詞自動辨識之研究)
Author: 黃群弼
Advisor: 劉吉軒 (Liu, Jyi Shane)
Keywords: Traditional–Simplified Chinese correspondence; exact synonyms; automatic recognition
Date: 2008
Uploaded: 19-Sep-2009 12:10:04 (UTC+8)
Abstract: Apart from the obvious differences in glyphs and computer encodings, Traditional and Simplified Chinese also differ in how some vocabulary is used. Words that differ in form between the two variants but share the same meaning are called exact synonyms (等義詞). These exact synonyms can hinder exchange between the two communities: conversation, the translation of documents and books, and the conversion of software systems are all prone to misunderstandings of word meaning. Such vocabulary differences are currently resolved by hand, which is time-consuming, labor-intensive, and error-prone. If a computer could recognize Traditional–Simplified exact synonyms automatically, the recognized pairs could serve as hints that prevent misunderstandings caused by differing vocabulary.
Following the experimental design, we first build Traditional and Simplified Chinese corpora in two categories, computing and general, as the basis for recognition. We then set up a research framework of two stages and three methods. In the first stage, we apply the first method, N-gram, to recognize exact synonyms and evaluate whether this single method is effective on its own. In the second stage, we apply the second method, PMI-IR & LC-IR, and the third method, Context Vector, and evaluate whether they improve recognition.
To let a computer recognize Traditional–Simplified exact synonyms in corpora automatically, we propose a new recognition framework: N-gram produces an initial set of exact-synonym candidates, and PMI-IR & LC-IR and Context Vector then raise Precision by roughly 0% to 20%. We conclude that, with corpora in the two language variants, N-gram can recognize exact synonyms, and that combining it with PMI-IR & LC-IR and Context Vector strengthens recognition, compensating for the limited recognition ability of any single method.
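The abstract describes the two-stage pipeline only at a high level. Below is a minimal Python sketch, not the thesis's implementation, of one way the stages could fit together. It assumes that (1) both corpora are already word-segmented and the Simplified corpus has been character-converted to Traditional glyphs, so only lexical differences remain; (2) the N-gram stage proposes a pair when a Traditional word and a Simplified word fill the same (left, right) trigram slot; and (3) the Context Vector stage reranks those pairs by cosine similarity of window co-occurrence counts. All function names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def slot_fillers(tokens: list[str]) -> dict[tuple[str, str], set[str]]:
    """Map each (left, right) neighbour pair to the words seen between them (trigram slots)."""
    slots: dict[tuple[str, str], set[str]] = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        slots[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    return slots


def ngram_candidates(trad: list[str], simp: list[str]) -> Counter:
    """Stage 1 (assumed): pair differing words that fill the same trigram slot in both corpora."""
    t_slots, s_slots = slot_fillers(trad), slot_fillers(simp)
    pairs: Counter = Counter()
    for slot in t_slots.keys() & s_slots.keys():
        for a in t_slots[slot]:
            for b in s_slots[slot]:
                if a != b:  # identical words are not synonym candidates
                    pairs[(a, b)] += 1
    return pairs


def context_vector(tokens: list[str], target: str, window: int = 1) -> Counter:
    """Count the words within `window` positions of every occurrence of `target`."""
    vec: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pairs(trad: list[str], simp: list[str]) -> list[tuple[tuple[str, str], float]]:
    """Stage 2 (assumed): rerank stage-1 candidates by context-vector cosine, best first."""
    scored = [
        ((a, b), cosine(context_vector(trad, a), context_vector(simp, b)))
        for (a, b) in ngram_candidates(trad, simp)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hand-made toy data: 軟體 (Traditional usage) vs 軟件 (Simplified usage, shown in Traditional glyphs).
trad = ["安裝", "軟體", "到", "電腦", "上", "更新", "軟體", "版本"]
simp = ["安裝", "軟件", "到", "電腦", "上", "更新", "軟件", "版本"]
print(rank_pairs(trad, simp))  # prints [(('軟體', '軟件'), 1.0)]
```

The glyph conversion assumed in (1) matters for this design: without it, shared context words such as 電腦 and 电脑 would never match, and both the trigram slots and the context vectors would be empty of overlap. Real corpora would likely also need stop-word removal and a minimum-frequency threshold before scoring; both are omitted here.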
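Description: Master's thesis, Department of Computer Science, National Chengchi University
Student ID: 94971010
Academic year: 97 (ROC calendar)
Source: http://thesis.lib.nccu.edu.tw/record/#G0094971010
Type: thesis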
Other identifier: G0094971010
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/37106
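Table of contents:
Chapter 1 Introduction
1.1 Overview
1.2 Research Background and Motivation
1.3 Research Methods
1.4 Contributions of This Thesis
1.5 Research Scope and Limitations
1.6 Thesis Organization
Chapter 2 Literature Review
2.1 Related Research on Exact-Synonym Recognition
2.1.1 Absolute and Relative Synonyms
2.1.2 Word Sense Recognition Algorithms
2.1.3 Chinese Word Sense Recognition Techniques
2.2 Term Co-occurrence
2.3 N-gram
2.4 PMI-IR & LC-IR Methods
2.4.1 PMI-IR (Pointwise Mutual Information–Information Retrieval)
2.4.2 LC-IR (Local Context–Information Retrieval)
2.5 Context Vector Space Model
2.6 Summary
Chapter 3 Methods for Recognizing Traditional–Simplified Exact Synonyms
3.1 Research Framework
3.2 Corpus Construction Module
3.2.1 Building the Computing-Category Traditional/Simplified Corpus
3.2.2 Building the General-Category Traditional/Simplified Corpus
3.2.3 Building the Correct Word Pairs
3.2.4 Building the Noise Data
3.2.5 Function Words (Stop Words)
3.2.6 Chinese Character Encodings
3.2.7 Conversion between Traditional and Simplified Encodings
3.3 Word Segmentation
3.3.1 Segmenting Traditional Chinese
3.3.2 Segmenting Simplified Chinese
3.3.3 Handling Punctuation
3.4 Building the N-gram Module
3.5 Building the PMI-IR & LC-IR Module
3.6 Building the Context Vector Module
3.7 Summary
Chapter 4 Experimental Design and Analysis
4.1 Sources of the Experimental Corpora
4.2 Experimental Design
4.2.1 Corpus Segmentation
4.2.2 Processing the Segmentation Output with N-gram
4.2.3 Selecting Exact-Synonym Candidates
4.2.4 Second-Pass Filtering with PMI-IR & LC-IR
4.2.5 Second-Pass Filtering with Context Vector
4.3 Evaluation Method
4.4 Experimental Analysis
4.5 Summary
Chapter 5 Conclusions and Future Directions
5.1 Conclusions
5.2 Suggestions for Future Research
5.3 Future Research Directions
Chapter 6 References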
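The outline lists PMI-IR as one of the second-pass filters. In Turney's PMI-IR, pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x)P(y))), is estimated from search-engine hit counts. The sketch below is a minimal stand-in that assumes document-level co-occurrence counts from a local corpus in place of hit counts. How the thesis combines PMI-IR with LC-IR when filtering candidates is not described in this record, so LC-IR is not shown; the function name and toy documents are hypothetical.

```python
from math import log2


def pmi(docs: list[set[str]], x: str, y: str) -> float:
    """Pointwise mutual information of x and y, estimated from document co-occurrence.

    With N documents and c(.) counting documents that contain the given word(s),
    P(x) = c(x)/N and P(x, y) = c(x, y)/N, so
    PMI(x, y) = log2( N * c(x, y) / (c(x) * c(y)) ).
    Returns -inf when x and y never co-occur (this also avoids division by zero).
    """
    n = len(docs)
    cx = sum(1 for d in docs if x in d)
    cy = sum(1 for d in docs if y in d)
    cxy = sum(1 for d in docs if x in d and y in d)
    if cxy == 0:
        return float("-inf")
    return log2(n * cxy / (cx * cy))


# Hypothetical toy corpus: each document is reduced to its set of segmented words.
docs = [
    {"軟體", "安裝", "電腦"},
    {"軟體", "安裝"},
    {"天氣", "下雨"},
    {"下雨", "電腦"},
]
print(pmi(docs, "軟體", "安裝"))  # prints 1.0
print(pmi(docs, "軟體", "下雨"))  # prints -inf
```

A positive score means the two words co-occur more often than independence would predict, zero means exactly as often (here 電腦 and 下雨), and a negative score means less often.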