
Title 利用WordNet建立證券領域的語意結構 (Using WordNet to Build a Semantic Structure for the Securities Domain)
Author 游舒帆
Yu,Shu Fan
Contributor 劉文卿
Liou,Wen Qing
游舒帆
Yu,Shu Fan
Keywords Similarity
Semantic distance
WordNet
Date 2004
Upload time 18-Sep-2009 14:36:22 (UTC+8)
Abstract This study examines whether WordNet, the online lexical database developed at Princeton University, is suitable for representing semantic structure. We first discuss the architecture of WordNet, then review the literature on using WordNet to build semantic structures, both to understand the state of prior research and to propose improvements for its shortcomings. Finally, we validate and revise the model, aiming to obtain a more representative and complete WordNet semantic structure.
The study combines the semantic distance model of Jarmasz and Szpakowicz (2001) with the similarity model of Resnik (1995). The two models are used to compute the distance between words, and that distance is used to identify semantic relations. The correctness and completeness of the resulting structure are then verified against 117 securities exam questions, and the structure is revised where it falls short in order to achieve better results.
This research has four main limitations:
1. The vocabulary of the securities domain and its relations cannot all be covered at once.
2. The 117 test questions cannot represent every possible query.
3. The results come from running similarity algorithms rather than from building and modifying an actual WordNet system, so they may differ from tests on a real system.
4. Only a few representative lexical relations are defined, not every relation in WordNet, which may affect the results in detail.
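The abstract names two quantities, a semantic distance and a Resnik similarity, without giving their formulas. As a rough illustration only, the sketch below computes both on a small taxonomy. The taxonomy, the concept counts, and all function names are hypothetical and are not taken from the thesis or from WordNet. Plain edge counting stands in for the semantic distance model, whose exact form the abstract does not state. The similarity follows Resnik's usual definition: the information content, -log p(c), of the most informative concept that subsumes both words.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not from WordNet or the thesis.
PARENT = {
    "security": "asset",
    "stock": "security",
    "bond": "security",
    "common_stock": "stock",
    "preferred_stock": "stock",
    "cash": "asset",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNT = {"asset": 10, "security": 10, "stock": 20, "bond": 20,
         "common_stock": 15, "preferred_stock": 5, "cash": 20}


def ancestors(concept):
    """Return the chain from a concept up to the root, the concept included."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def path_distance(a, b):
    """Count is-a edges between a and b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, concept in enumerate(up_a):
        if concept in up_b:
            return steps_a + up_b.index(concept)
    raise ValueError("no common ancestor")


def probability(concept):
    """p(c): share of all counts that fall on c or any of its descendants."""
    total = sum(COUNT.values())
    mass = sum(n for node, n in COUNT.items() if concept in ancestors(node))
    return mass / total


def resnik_similarity(a, b):
    """Information content -log p(c) of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(-math.log(probability(c)) for c in common)


if __name__ == "__main__":
    for pair in [("common_stock", "preferred_stock"),
                 ("common_stock", "bond"),
                 ("stock", "cash")]:
        print(pair, path_distance(*pair), round(resnik_similarity(*pair), 3))
```

In this toy data, common_stock and preferred_stock are two edges apart and share the specific subsumer stock, so they score higher on similarity than common_stock and bond, which meet only at the more general concept security.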
References Chinese-language references
[1]黃居仁、張如瑩、蔡柏生。「資訊與社會叢書系列之三:語言文學與資訊科技─語意網時代的網路華語教學:兼介中英雙語知識本體與領域檢索介面」, 民國93年, 頁443-467。
[2]美國資訊科學學會臺北分會, “索引典理論與實務”,民國83年,頁8。
[3]陳攸華,「圖書資訊學研究」,文華出版社,民國84年,頁34-35。
[4]黃慕萱,「資訊檢索」,台灣學生書局,民國85年,頁209。
[5]蔡明月,「線上資訊檢索:理論與應用」,初版,台灣學生書局,民國80年,頁177。
[6]呂江麟,「組織記憶─概念性語意資訊檢索」,國立臺灣大學資訊管理研究所碩士論文,民國91年。
[7]黃惠株,“淺談索引典”,佛教圖書館館訊 第五期,民國85年,頁2。
[8]陳光華,“資訊檢索查詢之自然語言處理”,中國圖書館學會會報,第57期,民國85年,頁141-153。
[9]陳光華、莊雅蓁,“資訊檢索之中文詞彙擴展”,資訊傳播與圖書館學,第八卷第一期,民國90年,頁59-75。
[10]陳光華、莊雅蓁,“應用於資訊檢索的中文同義詞之建構”,中國圖書館學會會報,第67期,民國90年,頁93-108。
English-language references
[1]Alan F. Smeaton, & Ian Quigley, “Experiments on Using Semantic Distances Between Words in Image Caption Retrieval”, Proceedings of the 19th International Conference on Research and Development in Information Retrieval, 1996, pp.176-180.
[2]Julio Gonzalo, & Felisa Verdejo, & Irina Chugur, & Juan Cigarran, “Indexing with WordNet synsets can improve text retrieval”, Proceedings of the COLING/ACL'98 Workshop on Usage of WordNet for NLP, 1998.
[3]Mario JARMASZ, & Stan SZPAKOWICZ, “Roget’s Thesaurus and Semantic Similarity”, Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2003, pp.212-219.
[4]Rada Mihalcea, & Dan Moldovan, “Semantic Indexing using WordNet Senses”, Proceedings of ACL Workshop on IR & NLP, 2000.
[5]Rila Mandala, & Tokunaga Takenobu, & Tanaka Hozumi, “The Use of WordNet in Information Retrieval”, Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing System, 1998, pp.31-37.
[6]Václav Snášel, & Pavel Moravec, & Jaroslav Pokorný, “WordNet Ontology Based Model for Web Retrieval”, Proceedings of International Workshop on Challenges in Web Information Retrieval and Integration, IEEE Computer Society Press, 2005, pp.231-236.
[7]Ian Niles, & Adam Pease, “Towards a Standard Upper Ontology”, Proceedings of the 2nd International Conference on Formal Ontology in Information Systems, 2001, pp.2-9.
[8]Chu-Ren Huang, & Xiang-Bing Li, & Jia-Fei Hong, “Domain Lexico-Taxonomy: An Approach Towards Multi-domain Language Processing”, Proceedings of the Asian Symposium on Natural Language Processing to Overcome Language Barriers, 2004, pp.52-60.
[9]Nicholas J. Belkin, & W. Bruce Croft, “Information Filtering and Information Retrieval─Two Sides of the Same Coin?”, Communications of the ACM, 35(12), 1992, pp.29-38.
[10]Karen Sparck Jones, & Peter Willett, “Readings in Information Retrieval”, 1997, pp.1-25.
[11]Adorno, & Marco, & Bolin etc., “Critical Review of Essay #2: ‘Readings in Information Retrieval’”, 1997.
[12]Kuang-hua Chen, & Chien-tin Wu, “Automatically Controlled-Vocabulary Indexing for Text Retrieval”, Proceedings of the 12th Research on Computational Linguistics Conference (ROCLING99), 1999, pp.171-185.
[13]Rakesh Gupta, & Mykel J.Kochenderfer, “Using Statistical Techniques and WordNet to Reason with Noisy Data”, Workshop on Adaptive Text Extraction and Mining, Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004.
[14]Tefko Saracevic, & Paul Kantor, & Alice Y. Chamis, & Donna Trivison, “A Study of Information Seeking and Retrieving”, JASIS, (39), 1988, pp.161-216.
[15]M. E. Maron, & J. L. Kuhns, “On Relevance, Probabilistic Indexing and Information Retrieval”, Journal of the ACM 7(3), 1960, pp.216-244.
[16]Tomek Strzalkowski, “Robust Text Processing in Automated Information Retrieval”, Proceedings of the 4th Conference on Applied Natural Language Processing in Stuttgart. ACL, 1994, pp.168-173.
[17]Dmitri Asonov, & Johann-Christoph Freytag, “Repudiative Information Retrieval”, Pre- and Postproceedings of ACM Workshop on Privacy in the Electronic Society (WPES2002), 2002, pp.32-40.
[18]Peter Ingwersen, “Information Retrieval Interaction”, 1992, pp.49-60.
Websites
[1]中央研究院中英雙語知識本體詞網, http://bow.sinica.edu.tw/
[2]DJ小百科, http://www.moneydj.com/z/glossary/gl_homeA.asp?a=$^$glossary$glcat[18]DJHTM
[3]Suggested Upper Merged Ontology, http://ontology.teknowledge.com/
[4]台灣證券交易所證券辭典, http://www.tse.com.tw/ch/dict.php
[5]Yahoo股市常用術語, http://geocities.yahoo.com.br/itapema_br/stk/Books/StockLanguage.htm
[6]聲達資訊股市術語, http://www.sound.com.tw/page.asp?sp=2&url=research/terminology.asp
[7]Yahoo股市名詞解釋, http://tw.money.yahoo.com/faqterm/stock_term_0.html
[8]Quote123闊網─股市辭典, http://www.quote123.com/usmkt/edu/glossary/glossary.asp
[9]富林投資─金融小辭典, http://www.fuland.com.tw/flh04.htm
[10]聯合新聞網─理財百科, http://udn.com/UDN_STOCK/GLOSSARY/P/Pindex.htm
[11]股市用語, http://www.888money.com.tw/Analysis/k01.htm
[12]PIIS股市用語, http://www.piis.com.tw/piis/stockname/c.htm
[13]德信證券理財百科, http://www.rsc.com.tw/money/money_2.html
[14]高點考古題天地, http://www.get.com.tw/getroot/exam/stock/
Description Master's thesis
National Chengchi University
Graduate Institute of Management Information Systems
92356016
93
Source http://thesis.lib.nccu.edu.tw/record/#G0923560161
Type thesis
dc.contributor.advisor 劉文卿 (zh_TW)
dc.contributor.advisor Liou,Wen Qing (en_US)
dc.contributor.author (Authors) 游舒帆 (zh_TW)
dc.contributor.author (Authors) Yu,Shu Fan (en_US)
dc.creator (Author) 游舒帆 (zh_TW)
dc.creator (Author) Yu,Shu Fan (en_US)
dc.date (Date) 2004 (en_US)
dc.date.accessioned 18-Sep-2009 14:36:22 (UTC+8)
dc.date.available 18-Sep-2009 14:36:22 (UTC+8)
dc.date.issued (Upload time) 18-Sep-2009 14:36:22 (UTC+8)
dc.identifier (Other Identifiers) G0923560161 (en_US)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/35272
dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Problem Description
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Ontology
2.1.1 SUMO (Suggested Upper Merged Ontology)
2.1.2 WordNet
2.1.3 Thesaurus
2.2 WordNet Architecture
2.2.1 Verbs
2.2.2 Nouns
2.2.3 Adjectives
2.3 Applications of WordNet in Validating Semantic Structures
2.4 Semantic Distance and Similarity
2.5 Applications of WordNet in Computing Similarity
Chapter 3 Research Framework and Methods
3.1 Research Framework
3.2 Vocabulary Selection
3.3 Defining Lexical Relations
3.4 Computing Semantic Distance and Similarity
3.5 Collecting Securities Exam Questions
3.6 Framework Validation: An Iterative Validation Method
Chapter 4 Framework Validation
4.1 Initial Framework
4.2 Adding to the Vocabulary
4.3 Other Problems
4.3.1 Problem 1: Too Many Keywords
4.3.2 Problem 2: Colloquial Descriptions That Cannot Be Defined
4.3.3 Problem 3: Logical Judgment Problems
4.3.4 Problem 4: Definitions That Cross Parts of Speech
4.3.5 Final Problem: Unclear Definitions of Word Attributes
Chapter 5 Conclusions, Expected Contributions, and Future Research Directions
5.1 Conclusions
5.2 Research Limitations
5.3 Expected Contributions
5.4 Future Research Directions
5.5 Closing Remarks
References
dc.format.extent 46062 bytes
dc.format.extent 71781 bytes
dc.format.extent 87890 bytes
dc.format.extent 108662 bytes
dc.format.extent 219454 bytes
dc.format.extent 199908 bytes
dc.format.extent 317123 bytes
dc.format.extent 157235 bytes
dc.format.extent 181180 bytes
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.format.mimetype application/pdf
dc.language.iso en_US