Title 從搜尋引擎查詢紀錄中學習Ontology
Ontology Learning from Query Logs of Search Engines
Author 陳茂富
Contributors 沈錳坤
陳茂富
Keywords Search engine query logs
Ontology learning
Query log
Date 2001
Upload time 11-Sep-2009 16:02:50 (UTC+8)
Abstract An ontology can be used to organize, manage, and share knowledge. Ontology Engineering is the process of constructing an ontology, and most of that work is time-consuming, labor-intensive, and error-prone when done by hand, so machine assistance for Ontology Engineering has become an important research topic. Applying Knowledge Discovery methods to assist Ontology Engineering is called Ontology Learning. The Ontology Learning method proposed in this thesis analyzes how users issue keyword queries on search engines and combines that behavior with information from web pages related to the query terms. The resulting ontology consists of the users' query terms, and what is learned is the relations among those terms, including hypernymy, hyponymy, and synonymy. The goal of the thesis is to discover these relations automatically in order to assist ontology construction. In addition, the thesis implements a complete Ontology Learning system that automatically collects user query logs, extracts and analyzes query keywords, determines the relations among the keywords, and produces the final ontology.
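To make the pipeline in the abstract more concrete, here is a minimal Python sketch of two of its stages: splitting a query log into sessions and proposing candidate relations between consecutive queries. It is an illustration under stated assumptions, not the thesis's algorithm. The record layout `(user, timestamp, query)`, the 30-minute `SESSION_GAP`, the function names, and the token-containment heuristic are all hypothetical. Synonymy, which the thesis derives with the help of web page information, is not covered.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that closes a session; not a value taken from the thesis.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(log):
    """Group (user, timestamp, query) records into per-user query sessions."""
    by_user = defaultdict(list)
    for user, ts, query in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, query))

    sessions = []
    for entries in by_user.values():
        current = [entries[0][1]]
        for (prev_ts, _), (ts, query) in zip(entries, entries[1:]):
            if ts - prev_ts > SESSION_GAP:
                # A long pause starts a new session for the same user.
                sessions.append(current)
                current = []
            current.append(query)
        sessions.append(current)
    return sessions


def candidate_relations(session):
    """Propose relations between consecutive queries (hypothetical heuristic).

    If the later query adds words to the earlier one, treat it as a candidate
    hyponym (a narrower term); if it drops words, treat it as a candidate
    hypernym (a broader term). Real validation would need further evidence.
    """
    candidates = []
    for a, b in zip(session, session[1:]):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if ta < tb:
            candidates.append((b, "hyponym_of", a))
        elif tb < ta:
            candidates.append((b, "hypernym_of", a))
    return candidates


if __name__ == "__main__":
    log = [
        ("u1", datetime(2001, 1, 1, 10, 0), "jazz"),
        ("u1", datetime(2001, 1, 1, 10, 2), "jazz piano"),
        ("u2", datetime(2001, 1, 1, 9, 0), "used cars"),
        ("u2", datetime(2001, 1, 1, 9, 5), "cars"),
    ]
    for session in split_sessions(log):
        print(session, candidate_relations(session))
```

Candidates produced this way are only hypotheses. A system like the one described would still need a validation step before adding a relation to the ontology.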
Description Master's thesis
National Chengchi University
Department of Computer Science
90753003
90
Source http://thesis.lib.nccu.edu.tw/record/#G0090753003
Type thesis
dc.contributor.advisor 沈錳坤 (zh_TW)
dc.identifier (Other Identifiers) G0090753003 (en_US)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/29677
dc.description.tableofcontents Table of Contents
     Chapter 1
     1.1 INTRODUCTION AND MOTIVATION
     Chapter 2
     2.1 DEFINITION OF ONTOLOGY
     2.2 ONTOLOGY ENGINEERING AND LEARNING
     2.3 RELATED WORK
     2.3.1 Ontology Learning
     2.3.2 Query Log Clustering
     Chapter 3
     3.1 WEB LOG PREPROCESSING AND USER SESSION IDENTIFICATION
     3.2 QUERY SESSION IDENTIFICATION
     3.3 PHRASE EXTRACTION
     3.3.1 Phrase Identification
     3.3.2 Phrase Domain Identification
     3.4 PHRASE RELATION DISCOVERY
     3.4.1 Candidate Discovery of Phrase Relation
     3.4.2 Phrase Feature Extraction
     3.4.3 Final Relation Validation
     Chapter 4
     4.1 CHOICE OF LOG FILES
     4.2 INTRODUCTION TO PROXY LOGS
     4.3 OBTAINING QUERY LOGS FROM DIFFERENT SEARCH ENGINES
     4.4 BUILDING THE WEBPAGE REPOSITORY AND PROCESSING WEB PAGE INFORMATION
     Chapter 5
     5.1 EXPERIMENTAL ENVIRONMENT AND DATA SOURCES
     5.2 EXPERIMENTAL EVALUATION
     5.3 EXPERIMENTAL DATA
     5.4 EXPERIMENTAL RESULTS AND ANALYSIS
     Chapter 6
     REFERENCES


     LIST OF TABLES
     Table 3.1: User Path.
     Table 3.2: Query Session Example.
     Table 3.3: Query Session Identification Results.
     Table 3.4: Query Example.
     Table 3.5: Possible Data Distributions of User Session Queries.
     Table 3.6: Query Type.
     Table 3.7: Relation Validation.
     Table 4.1: Search Engine Query URLs.
     Table 4.2: Processing Yahoo Webpage Information.
     Table 5.1: Experimental Dataset.
     Table 5.2: Learning Accuracy Score Table.
     Table 5.3: Experimental Data.
     Table 5.4: Query Session.
     Table 5.5: Result Relation.
     Table 5.6: Ambiguous And Wrong Relation.


     LIST OF FIGURES
     Figure 2.1: Hierarchical Query Clustering.
     Figure 3.1: Web Log Example.
     Figure 3.2: System Workflow.
     Figure 3.3: Query Actions.
     Figure 3.4: Query Session Identification Algorithm.
     Figure 3.5: Phrase Domain Identification.
     Figure 3.6: single phrase→single phrase Example.
     Figure 3.7: single phrase→multi phrase Example.
     Figure 3.8: multi phrase→single phrase Example.
     Figure 3.9: Candidate Relation Discovery.
     Figure 3.10: Final Relation Validation.
     Figure 3.11: Subsumption Example.
     Figure 4.1: System Architecture.
     Figure 4.2: Proxy Log Example.
     Figure 4.4: Web Pages Returned by a Yahoo Search.
     Figure 5.1: Experimental Method.
     Figure 5.2: Learning Accuracy Measurement.
dc.language.iso en_US