Title | 從搜尋引擎查詢紀錄中學習Ontology (Ontology Learning from Query Logs of Search Engines) |
Creator | 陳茂富 |
Contributor | 沈錳坤 陳茂富 |
Key Words | Search engine query logs; Ontology learning; Query log |
Date | 2001 |
Date Issued | 11-Sep-2009 16:02:50 (UTC+8) |
Summary | Ontologies can be used to organize, manage, and share knowledge. Ontology engineering is the process of constructing an ontology, and most of that work must be done by hand, which is time-consuming and error-prone; machine assistance for ontology engineering has therefore become an important research topic. Using knowledge discovery methods to support ontology construction is called ontology learning. The ontology learning method proposed in this thesis analyzes how users issue keyword queries on search engines and combines this with information from web pages related to the query terms. The ontology consists of the users' query terms, and what is learned is the relations among those terms, including hypernymy, hyponymy, and synonymy. The goal of the thesis is to discover these relations automatically to assist ontology construction. The thesis also implements a complete ontology learning system that automatically collects user query logs, extracts and analyzes query keywords, determines the relations among keywords, and produces the final ontology. |
Description | Master's; National Chengchi University; Department of Computer Science; 90753003; 90 |
Source | http://thesis.lib.nccu.edu.tw/record/#G0090753003 |
Type | thesis |
dc.contributor.advisor | 沈錳坤 | zh_TW |
dc.contributor.author (Authors) | 陳茂富 | zh_TW |
dc.creator (Author) | 陳茂富 | zh_TW |
dc.date (Date) | 2001 | en_US |
dc.date.accessioned | 11-Sep-2009 16:02:50 (UTC+8) | - |
dc.date.available | 11-Sep-2009 16:02:50 (UTC+8) | - |
dc.date.issued (Upload time) | 11-Sep-2009 16:02:50 (UTC+8) | - |
dc.identifier (Other Identifiers) | G0090753003 | en_US |
dc.identifier.uri (URI) | https://nccur.lib.nccu.edu.tw/handle/140.119/29677 | - |
dc.description (Description) | Master's | zh_TW |
dc.description (Description) | National Chengchi University | zh_TW |
dc.description (Description) | Department of Computer Science | zh_TW |
dc.description (Description) | 90753003 | zh_TW |
dc.description (Description) | 90 | zh_TW |
dc.description.tableofcontents | Contents. Chapter 1: 1.1 Introduction and Motivation. Chapter 2: 2.1 Definition of Ontology; 2.2 Ontology Engineering and Learning; 2.3 Related Work (2.3.1 Ontology Learning; 2.3.2 Query Log Clustering). Chapter 3: 3.1 Web Log Preprocessing and User Session Identification; 3.2 Query Session Identification; 3.3 Phrase Extraction (3.3.1 Phrase Identification; 3.3.2 Phrase Domain Identification); 3.4 Phrase Relation Discovery (3.4.1 Candidate Discovery of Phrase Relation; 3.4.2 Phrase Feature Extraction; 3.4.3 Final Relation Validation). Chapter 4: 4.1 Choice of Log Files; 4.2 Introduction to Proxy Logs; 4.3 Obtaining Query Logs from Different Search Engines; 4.4 Building the Webpage Repository and Processing Webpage Information. Chapter 5: 5.1 Experimental Environment and Data Sources; 5.2 Experimental Evaluation; 5.3 Experimental Measurements; 5.4 Experimental Results and Analysis. Chapter 6. References. List of Tables: Table 3.1: User Path; Table 3.2: Query Session Example; Table 3.3: Query Session Identification Results; Table 3.4: Query Example; Table 3.5: Possible Data Distributions of User Session Queries; Table 3.6: Query Type; Table 3.7: Relation Validation; Table 4.1: Search Engine Query URLs; Table 4.2: Processing Yahoo Webpage Information; Table 5.1: Experimental Dataset; Table 5.2: Learning Accuracy Score Table; Table 5.3: Experimental Measurements; Table 5.4: Query Session; Table 5.5: Result Relation; Table 5.6: Ambiguous and Wrong Relation. List of Figures: Figure 2.1: Hierarchical Query Clustering; Figure 3.1: Web Log Example; Figure 3.2: System Flow; Figure 3.3: Query Actions; Figure 3.4: Query Session Identification Algorithm; Figure 3.5: Phrase Domain Identification; Figure 3.6: single phrase→single phrase Example; Figure 3.7: single phrase→multi phrase Example; Figure 3.8: multi phrase→single phrase Example; Figure 3.9: Candidate Relation Discovery; Figure 3.10: Final Relation Validation; Figure 3.11: Subsumption Example; Figure 4.1: System Architecture; Figure 4.2: Proxy Log Example; Figure 4.4: Yahoo Search Webpage; Figure 5.1: Experimental Method; Figure 5.2: Learning Accuracy Measurement. | zh_TW |
dc.language.iso | en_US | - |
dc.source.uri (Source) | http://thesis.lib.nccu.edu.tw/record/#G0090753003 | en_US |
dc.subject (Keywords) | Search engine query logs | zh_TW |
dc.subject (Keywords) | Ontology learning | zh_TW |
dc.subject (Keywords) | Ontology learning | en_US |
dc.subject (Keywords) | Query log | en_US |
dc.title (Title) | 從搜尋引擎查詢紀錄中學習Ontology | zh_TW |
dc.title (Title) | Ontology Learning from Query Logs of Search Engines | en_US |
dc.type (Type) | thesis | en |
dc.relation.reference (References) | [1] Agrawal, R., Imielinski, T. & Swami, A. (1993). Mining Association Rules Between Sets of Items in Large Databases. Proc. of ACM SIGMOD Conference on Management of Data. | zh_TW |
dc.relation.reference (References) | [2] Alfonseca, E. & Manandhar, S. (2002). Improving an Ontology Refinement Method with Hyponymy Patterns. Proc. of International Conference on Language Resources and Evaluation LREC’02. | zh_TW |
dc.relation.reference (References) | [3] Beeferman, D. & Berger, A. (2000). Agglomerative Clustering of a Search Engine Query Log. Proc. of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. | zh_TW |
dc.relation.reference (References) | [4] Berendt, B., Mobasher, B., Spiliopoulou, M. & Wiltshire, J. (2001). Measuring the Accuracy of Sessionizers for Web Usage Analysis. Proc. of Workshop on Web Mining, SIAM Conference on Data Mining. | zh_TW |
dc.relation.reference (References) | [5] Byrd, R. J. & Ravin, Y. (1999). Identifying and Extracting Relations in Text. Proc. of International Conference on Applications of Natural Language to Information Systems NLDB’99. | zh_TW |
dc.relation.reference (References) | [6] Chen, Z., Fu, A.W.C. & Tong, F.C.H. (2003). Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs. World Wide Web: Internet and Web Information Systems, 6(3). | zh_TW |
dc.relation.reference (References) | [7] Chuang, S.L. & Chien, L.F. (2002). Towards Automatic Generation of Query Taxonomy: A Hierarchical Query Clustering Approach. Proc. of IEEE International Conference on Data Mining ICDM’02. | zh_TW |
dc.relation.reference (References) | [8] Chuang, S.L. & Chien, L.F. (2003). Enriching Web Taxonomies Through Subject Categorization of Query Terms from Search Engine Logs. Decision Support Systems, 35(1). | zh_TW |
dc.relation.reference (References) | [9] Faure, D. & Nedellec, C. (1998). A Corpus-based Conceptual Clustering Method for Verb Frames and Ontology. Proc. of LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications. | zh_TW |
dc.relation.reference (References) | [10] Faure, D. & Poibeau, T. (2000). First Experiments of Using Semantic Knowledge Learned by ASIUM for Information Extraction Task Using INTEX. Proc. of Workshop on Ontology Learning. | zh_TW |
dc.relation.reference (References) | [11] Gomez-Perez, A. & Manzano-Macho, D. (2003). A Survey of Ontology Learning Methods and Techniques. Technical Report, Institute of Computer Science, Leopold Franzens University of Innsbruck. | zh_TW |
dc.relation.reference (References) | [12] Hahn, U. & Klemens, S. (1998). Towards Text Knowledge Engineering. Proc. of Conference on Artificial Intelligence AI’98. | zh_TW |
dc.relation.reference (References) | [13] Hahn, U. & Schulz, S. (2000). Towards Very Large Terminological Knowledge Bases: A Case Study from Medicine. Proc. of Canadian Conference on Artificial Intelligence AI’00. | zh_TW |
dc.relation.reference (References) | [14] Hearst, M.A. (1992). Automatic Acquisition of Hyponyms from Large Text Corpora. Proc. of International Conference on Computational Linguistics. | zh_TW |
dc.relation.reference (References) | [15] Huang, C.K., Chien, L.F. & Oyang, Y.J. (2003). Relevant Term Suggestion in Interactive Web Search Based on Contextual Information in Query Session Logs. Journal of the American Society for Information Science and Technology, 54(7). | zh_TW |
dc.relation.reference (References) | [16] Khan, L. & Luo, F. (2002). Ontology Construction for Information Selection. Proc. of IEEE International Conference on Tools with Artificial Intelligence ICTAI’02. | zh_TW |
dc.relation.reference (References) | [17] Kietz, J.U., Maedche, A. & Volz, R. (2000). A Method of Semi-Automatic Ontology Acquisition from a Corporate Intranet. Proc. of EKAW’2000 Workshop on Ontologies and Texts. | zh_TW |
dc.relation.reference (References) | [18] Lawrie, D. & Croft, W.B. (2000). Discovering and Comparing Topic Hierarchies. Proc. of RIAO 2000 Conference. | zh_TW |
dc.relation.reference (References) | [19] Lonsdale, D., Ding, Y., Embley, D.W. & Melby, A. (2002). Peppering Knowledge Sources with SALT: Boosting Conceptual Content for Ontology Generation. Proc. of AAAI Workshop on Semantic Web Meets Language Resource. | zh_TW |
dc.relation.reference (References) | [20] Maedche, A. & Staab, S. (2000). Discovering Conceptual Relations from Text. Proc. of European Conference on Artificial Intelligence ECAI’00. | zh_TW |
dc.relation.reference (References) | [21] Maedche, A. & Staab, S. (2001). Ontology Learning for the Semantic Web. IEEE Intelligent Systems, 16(2). | zh_TW |
dc.relation.reference (References) | [22] Maedche, A. & Steffen, S. (2003). Ontology Learning. Handbook on Ontologies in Information Systems, S. Staab & R. Studer (eds.). Springer. | zh_TW |
dc.relation.reference (References) | [23] Morin, E. (1999). Automatic Acquisition of Semantic Relations Between Terms from Technical Corpora. Proc. of International Congress on Terminology and Knowledge Engineering TKE’99. | zh_TW |
dc.relation.reference (References) | [24] Nobecourt, J. (2000). A Method to Build Formal Ontologies from Texts. Proc. of EKAW’2000 Workshop on Ontologies and Texts. | zh_TW |
dc.relation.reference (References) | [25] Sanderson, M. & Croft, B. (1999). Deriving Concept Hierarchies from Text. Proc. of ACM International Conference on Research and Development in Information Retrieval SIGIR’99. | zh_TW |
dc.relation.reference (References) | [26] Wagner, A. (2000). Enriching a Lexical Semantic Net with Selectional Preferences by Means of Statistical Corpus Analysis. Proc. of Workshop on Ontology Learning OL’01. | zh_TW |
dc.relation.reference (References) | [27] Wen, J.R., Nie, J.Y. & Zhang, H.J. (2001). Clustering User Queries of a Search Engine. Proc. of International Conference on World Wide Web WWW’01. | zh_TW |
dc.relation.reference (References) | [28] Wen, J.R., Nie, J.Y. & Zhang, H.J. (2002). Query Clustering Using User Logs. ACM Transactions on Information Systems, 20(1). | zh_TW |
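
The abstract describes a pipeline that extracts query keywords from logged search requests and then decides which keywords stand in a broader/narrower relation. The record does not give the thesis's actual algorithms, so the following is only a minimal sketch under stated assumptions: the host-to-parameter table, the function names `extract_query` and `subsumes`, and the 0.8 threshold are all hypothetical, and the co-occurrence test is the subsumption heuristic in the style of Sanderson & Croft [25], applied here to query sessions rather than to documents.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from search-engine host to the URL parameter that
# carries the query text. Illustrative assumption, not taken from the thesis.
QUERY_PARAMS = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
}


def extract_query(url):
    """Return the lowercased query text of a search URL, or None."""
    parsed = urlparse(url)
    param = QUERY_PARAMS.get(parsed.netloc)
    if param is None:
        return None  # not a search engine we know about
    values = parse_qs(parsed.query).get(param)
    if not values:
        return None  # parameter missing or empty
    return values[0].strip().lower() or None


def subsumes(x, y, sessions, threshold=0.8):
    """Co-occurrence subsumption test over query sessions.

    x is treated as broader than y when P(x | y) >= threshold and
    P(y | x) < 1, where probabilities are session frequencies.
    Each session is a set of query strings. The threshold is an assumption.
    """
    with_x = sum(1 for s in sessions if x in s)
    with_y = sum(1 for s in sessions if y in s)
    both = sum(1 for s in sessions if x in s and y in s)
    if with_x == 0 or with_y == 0:
        return False
    return both / with_y >= threshold and both / with_x < 1.0


if __name__ == "__main__":
    print(extract_query("http://www.google.com/search?q=Ontology+Learning&hl=en"))
    sessions = [{"car", "sedan"}, {"car", "truck"}, {"car"}, {"sedan", "car"}]
    print(subsumes("car", "sedan", sessions))
```

A pair that passes the test in one direction only is a candidate hypernym/hyponym pair; how the thesis validates such candidates (its "Final Relation Validation" step) is not specified in this record.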