Title Academic Literature Search Based on Collaborative Recommendation by Authors (基植於作者協同推薦的學術文獻搜尋研究)
Author Wang, Jen Liang (王仁良)
Contributors Shan, Man Kwan (沈錳坤), advisor
Wang, Jen Liang (王仁良)
Keywords Mixed Media Graph (MMG)
Academic Literature Search
Collaborative Recommendation
Date 2008
Upload time 9-May-2016 12:02:27 (UTC+8)
Abstract With the growth of the World Wide Web, people enjoy fast access to information, and this has driven the development of search engines. For academic literature, organizations such as ACM and IEEE have digitized their publications in digital libraries that support keyword search. Google has also developed Google Scholar to search academic literature on the Web. When returning results, Google considers not only the similarity between a document's content and the query keywords but also, through PageRank, the citation relationships among documents. Sometimes, however, what users want is the important references related to a query, such as seminal survey papers or books cited by authors, and the content of these references is not necessarily very similar to the query.
     The aim of this thesis is to investigate and develop a mechanism for recommending important references. First, a spider crawls the ACM Digital Library and a parser extracts each paper's metadata, identifying its title, authors, abstract, keywords, classification, and references. A Mixed Media Graph (MMG) is then built to model the relationships between keywords and references. Given a set of query keywords, a random walk over the MMG identifies the references most relevant to the query. Performance analysis shows that the proposed mechanism performs well.
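The abstract describes ranking references by a random walk over an MMG built from keywords and citations. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes, as in the MMG work of Pan et al., a random walk with restart (RWR) that repeatedly jumps back to the query keyword nodes, and it assumes a three-layer graph in which paper nodes connect their keywords to the references they cite. The record fields (`id`, `keywords`, `references`), the function names, the toy data, and the restart probability of 0.2 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; the field names "id", "keywords", and "references"
# are assumptions, not the thesis's actual metadata schema.
SAMPLE_PAPERS = [
    {"id": "p1", "keywords": ["graph", "random walk"], "references": ["r1", "r2"]},
    {"id": "p2", "keywords": ["graph", "image captioning"], "references": ["r2"]},
    {"id": "p3", "keywords": ["information retrieval"], "references": ["r3"]},
]


def build_mmg(papers):
    """Build an undirected mixed media graph as an adjacency map.

    Nodes are tagged tuples: ("P", paper id), ("K", keyword), ("C", cited reference).
    Each paper node is linked to its keyword nodes and to the references it cites.
    """
    adj = defaultdict(set)
    for paper in papers:
        p = ("P", paper["id"])
        for kw in paper["keywords"]:
            k = ("K", kw.lower())
            adj[p].add(k)
            adj[k].add(p)
        for ref in paper["references"]:
            c = ("C", ref)
            adj[p].add(c)
            adj[c].add(p)
    return dict(adj)


def random_walk_with_restart(adj, query_nodes, restart_prob=0.2, max_iter=200, tol=1e-12):
    """Power iteration for a random walk with restart to the query nodes.

    At each step the walker jumps back to a uniformly chosen query node with
    probability restart_prob; otherwise it moves to a uniformly random neighbor.
    Returns the (approximate) steady-state visiting probability of every node.
    """
    seeds = [q for q in query_nodes if q in adj]
    score = {n: 0.0 for n in adj}
    if not seeds:
        return score  # no query keyword appears in the graph
    restart = dict(score)
    for q in seeds:
        restart[q] += 1.0 / len(seeds)
    score = dict(restart)
    for _ in range(max_iter):
        new = {n: restart_prob * restart[n] for n in adj}
        for n, neighbors in adj.items():
            # Spread the non-restart share of n's probability evenly to its neighbors.
            share = (1.0 - restart_prob) * score[n] / len(neighbors)
            for m in neighbors:
                new[m] += share
        delta = sum(abs(new[n] - score[n]) for n in adj)
        score = new
        if delta < tol:
            break
    return score


def recommend_citations(papers, query_keywords, top_k=3, restart_prob=0.2):
    """Rank cited-reference nodes by their RWR score for the query keywords."""
    adj = build_mmg(papers)
    query = [("K", kw.lower()) for kw in query_keywords]
    scores = random_walk_with_restart(adj, query, restart_prob)
    ranked = [(node[1], s) for node, s in scores.items() if node[0] == "C"]
    ranked.sort(key=lambda item: (-item[1], item[0]))  # highest score first, ties by id
    return ranked[:top_k]


if __name__ == "__main__":
    for ref, score in recommend_citations(SAMPLE_PAPERS, ["Graph"]):
        print(f"{ref}: {score:.4f}")
```

Restarting only at the query keyword nodes is what makes the scores query-specific. In the toy data, reference `r2`, cited by both papers tagged "graph", collects more probability than `r1`, which only one of them cites, while `r3` lies in a disconnected component and scores zero.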
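Table of Contents
     Chapter 1 Introduction
     1.1 Research Motivation
     1.2 Research Objectives
     1.3 Organization of the Thesis
     Chapter 2 Related Work
     2.1 Information Retrieval
     2.1.1 Indexing
     2.1.2 Retrieval
     2.1.3 Ranking
     2.2 Google PageRank
     Chapter 3 Methodology
     3.1 Software System Architecture
     3.2 Website Structure Analysis
     3.3 Paper Metadata Analysis
     3.4 Keyword Extraction from Paper Titles
     3.5 Identifying Identical References
     3.6 Mixed Media Graph (MMG)
     Chapter 4 Implementation
     4.1 Spider Implementation
     4.2 Parser Implementation
     4.3 Title Parsing Implementation
     4.4 Reference Similarity Comparison Implementation
     4.5 MMG Implementation
     Chapter 5 Experimental Results
     5.1 Objectives of the Experiments
     5.2 Experimental Procedure
     5.3 Evaluation Methods
     5.4 Results
     Chapter 6 Conclusion
     References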
Description Master's thesis
National Chengchi University
Department of Computer Science
94971012
Source http://thesis.lib.nccu.edu.tw/record/#G0094971012
Type thesis
dc.identifier (Other Identifiers) G0094971012
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/94857