Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/35243
DC Field | Value | Language
dc.contributor.advisor管郁君; 諶家蘭zh_TW
dc.contributor.advisorHuang, E.Y.; Seng, J.L.en_US
dc.contributor.author柯怡芬zh_TW
dc.contributor.authorKe, I Fenen_US
dc.creator柯怡芬zh_TW
dc.creatorKe, I Fenen_US
dc.date2006en_US
dc.date.accessioned2009-09-18T06:31:59Z-
dc.date.available2009-09-18T06:31:59Z-
dc.date.issued2009-09-18T06:31:59Z-
dc.identifierG0094356017en_US
dc.identifier.urihttps://nccur.lib.nccu.edu.tw/handle/140.119/35243-
dc.description碩士 (Master's)zh_TW
dc.description國立政治大學 (National Chengchi University)zh_TW
dc.description資訊管理研究所 (Graduate Institute of Information Management)zh_TW
dc.description94356017zh_TW
dc.description95zh_TW
dc.description.abstractWeb search is an important tool for gathering and finding information. However, search results do not always match the user's intent, so web search engine companies strive to develop better search algorithms in order to improve the accuracy of search results and increase engine usage. From the literature we reviewed, we found no sufficiently flexible and open tool for evaluating web search performance. The goal of this research is to develop a more flexible workload model for web search performance evaluation, focusing on the design of the workload model and the test suite. We extend the flexibility of the workload model through a generic-construct-based approach: we collect and study several representative web search algorithms, identify the generic constructs of these key algorithms, and design the workload model on that basis. The workload model comprises a page model, a query model, and a control model. Finally, we validate the proposed approach through a prototype implementation.zh_TW
dc.description.abstractWeb search is a vital way to find information on the web. However, not every piece of information found is relevant or useful. To improve search accuracy, designers of web search engines devote themselves to developing and optimizing search algorithms. From the literature, we find that there are few open or flexible performance evaluation methods for web search services. The objective of this research is to develop a more flexible, generic-construct-based workload model for web search benchmarking, and to build an automated benchmarking environment for performance evaluation. Generic constructs are the major components that characterize a web search algorithm. We collect and review literature on web search algorithms and benchmarking, and identify the generic constructs of the key algorithms. The workload model consists of a page model, a query model, and a control model. The page model describes the structure of web pages in web search; the query model defines the important criteria used to query web search engines; the control model defines the variables used to set up the benchmark environment. Finally, we validate the research model through a prototype implementation.en_US
dc.description.tableofcontentsABSTRACT I
Chinese Abstract II
Acknowledgements III
TABLE OF CONTENTS IV
LIST OF FIGURES VI
LIST OF TABLES VIII
CHAPTER 1 INTRODUCTION 1
1.1 Research Motivation 1
1.2 Research Problem 1
1.3 Research Objective 2
1.4 Research Limitation 2
1.5 Research Flow 2
1.6 Organization of Thesis 3
CHAPTER 2 LITERATURE REVIEW 5
2.1 Link Structure 5
2.2 PageRank 5
2.3 HITS ("Hypertext Induced Topic Selection") 8
2.4 Improvement of HITS 12
2.5 TREC Web Track 20
2.6 Summary 22
CHAPTER 3 RESEARCH MODEL 26
3.1 Research Structure 26
3.2 Components of Research Model 28
3.3 Page Model 28
3.4 Query Model 30
3.5 Control Model 33
3.6 Performance Metrics 33
CHAPTER 4 PROTOTYPE SYSTEM DEVELOPMENT 35
4.1 Prototype Development Tool 35
4.2 Prototype System Implementation 39
CHAPTER 5 RESEARCH EXPERIMENT 44
5.1 Experimental Design 44
5.2 Experimental Execution and Results 50
5.3 Summary 74
CHAPTER 6 RESEARCH IMPLICATIONS AND CONCLUSIONS 75
6.1 Research Implications 75
6.2 Conclusions 77
6.3 Future Research Work 77
REFERENCES 79

List of Figures
FIGURE 1.1: RESEARCH FLOW 3
FIGURE 2.1: LINK STRUCTURE 5
FIGURE 2.2: ILLUSTRATION OF PAGERANK ALGORITHM 6
FIGURE 2.3: EXAMPLE OF PAGERANK 7
FIGURE 2.4: THE RELATIONSHIP BETWEEN HUBS AND AUTHORITIES 9
FIGURE 2.5: AUTHORITY WEIGHT P = HUB WEIGHT Q1 + HUB WEIGHT Q2 + HUB WEIGHT Q3 10
FIGURE 2.6: HUB WEIGHT P = AUTHORITY WEIGHT Q1 + AUTHORITY WEIGHT Q2 + AUTHORITY WEIGHT Q3 11
FIGURE 2.7: AUTHORITY WEIGHT OF P = HUB WEIGHT OF Q1 * 1/3 + HUB WEIGHT OF Q2 * 1/3 + HUB WEIGHT OF Q3 * 1/3 13
FIGURE 2.8: HUB WEIGHT OF P = AUTHORITY WEIGHT OF Q1 * 1/3 + AUTHORITY WEIGHT OF Q2 * 1/3 + AUTHORITY WEIGHT OF Q3 * 1/3 13
FIGURE 3.1: RESEARCH STRUCTURE 27
FIGURE 3.2: THE PAGE MODEL HIERARCHY 29
FIGURE 3.3: THE QUERY MODEL HIERARCHY 31
FIGURE 4.2: THE MAIN MENU OF WORKLOAD MODEL 39
FIGURE 4.3: PAGE MODEL SELECTION AND QUERY MODEL SELECTION 41
FIGURE 4.4: QUERY SCRIPT OUTPUT OF THE WORKLOAD MODEL 42
FIGURE 4.5: CONTROL MODEL INPUT OF SCHEDULER 43
FIGURE 4.6: THE OUTPUT OF RESULT COLLECTOR 43
FIGURE 5.1: SELECTED INPUT FOR TEST1 51
FIGURE 5.2: SCRIPT OUTPUT OF TEST1 51
FIGURE 5.3: THE SEARCH RESULTS OF TEST1 52
FIGURE 5.4: THE SEARCH RESULTS OF TEST1 BY GOOGLE 52
FIGURE 5.5: SELECTED INPUT FOR TEST2 54
FIGURE 5.6: SCRIPT OUTPUT OF TEST2 54
FIGURE 5.7: THE SEARCH RESULTS OF TEST2 55
FIGURE 5.8: THE SEARCH RESULTS OF TEST2 BY GOOGLE 55
FIGURE 5.9: SELECTED INPUT FOR TEST3 57
FIGURE 5.10: SCRIPT OUTPUT OF TEST3 57
FIGURE 5.11: THE SEARCH RESULTS OF TEST3 58
FIGURE 5.12: THE SEARCH RESULTS OF TEST3 BY GOOGLE 58
FIGURE 5.13: SELECTED INPUT FOR TEST4 60
FIGURE 5.14: SCRIPT OUTPUT OF TEST4 60
FIGURE 5.15: THE SEARCH RESULTS OF TEST4 61
FIGURE 5.16: SELECTED INPUT FOR TEST5 62
FIGURE 5.17: SCRIPT OUTPUT OF TEST5 62
FIGURE 5.18: THE SEARCH RESULTS OF TEST5 63
FIGURE 5.19: SELECTED INPUT FOR TEST6 64
FIGURE 5.20: SCRIPT OUTPUT OF TEST6 64
FIGURE 5.21: THE SEARCH RESULTS OF TEST6 65
FIGURE 5.22: SELECTED INPUT FOR TEST7 66
FIGURE 5.23: SCRIPT OUTPUT OF TEST7 66
FIGURE 5.24: THE SEARCH RESULTS OF TEST7 67
FIGURE 5.25: SELECTED INPUT FOR TEST8 68
FIGURE 5.26: SCRIPT OUTPUT OF TEST8 68
FIGURE 5.27: THE SEARCH RESULTS OF TEST8 69
FIGURE 5.28: SELECTED INPUT FOR TEST9 70
FIGURE 5.29: SCRIPT OUTPUT OF TEST9 70
FIGURE 5.30: THE SEARCH RESULTS OF TEST9 71
FIGURE 5.31: SELECTED INPUT FOR TEST10 72
FIGURE 5.32: SCRIPT OUTPUT OF TEST10 72
FIGURE 5.33: THE SEARCH RESULTS OF TEST10 73

List of Tables
TABLE 2.1: PAGERANK OF PAGE A, B, C IN EACH ITERATION 7
TABLE 2.2: GENERIC CONSTRUCTS OF PAGERANK 8
TABLE 2.3: GENERIC CONSTRUCTS OF HITS 12
TABLE 2.4: GENERIC CONSTRUCTS OF BHITS AND WBHITS 14
TABLE 2.5: GENERIC CONSTRUCTS OF HITS BASED-VSM 16
TABLE 2.6: GENERIC CONSTRUCTS OF HITS BASED-OKAPI 18
TABLE 2.7: GENERIC CONSTRUCTS OF HITS BASED-CDR 19
TABLE 2.8: GENERIC CONSTRUCTS OF HITS BASED-TLS 19
TABLE 2.9: SUMMARY OF ALGORITHMS 22
TABLE 2.10: GENERIC CONSTRUCTS OF ALGORITHMS 24
TABLE 2.11: OPERATIONS OF GENERIC CONSTRUCTS OF ALGORITHMS 24
TABLE 2.12: GENERIC CONSTRUCTS OF MSRA AT WEB TRACK OF TREC 2003 AND 2004 24
TABLE 2.13: OPERATIONS OF GENERIC CONSTRUCTS OF MSRA AT WEB TRACK OF TREC 2003 AND 2004 25
TABLE 3.1: THE MAPPING OF GENERIC CONSTRUCTS AND WORKLOAD MODEL 28
TABLE 3.2: THE MAPPING OF OPERATIONS OF GENERIC CONSTRUCTS AND WORKLOAD MODEL 28
TABLE 3.3: EXAMPLES OF THREE CATEGORIES 30
TABLE 4.1: THE DESCRIPTION OF PROTOTYPE DEVELOPMENT TOOL 35
TABLE 4.2: THE REQUEST METHOD AND URL OF WEB SEARCH SERVICES 35
TABLE 4.3: THE REQUEST PARAMETERS OF WEB SEARCH SERVICES 36
TABLE 5.1: THE TEST SPECIFICATIONS OF SINGLE PAGE-URL 44
TABLE 5.2: THE TEST SPECIFICATIONS OF SINGLE PAGE-FONT SIZE, FONT COLOR, FRAME, META AND TABLE 45
TABLE 5.3: THE TEST SPECIFICATIONS OF SINGLE PAGE-TITLE 45
TABLE 5.4: THE TEST SPECIFICATIONS OF MULTI PAGE-COMPANY 46
TABLE 5.5: THE TEST SPECIFICATIONS OF QUERY TYPE-HOMEPAGE FINDING 46
TABLE 5.6: THE TEST SPECIFICATIONS OF QUERY TYPE-NAMED PAGE FINDING 47
TABLE 5.7: THE TEST SPECIFICATIONS OF QUERY TYPE-TOPIC DISTILLATION 47
TABLE 5.8: THE TEST SPECIFICATIONS OF LINK STRUCTURE-AUTHORITY-HUB 47
TABLE 5.9: THE TEST SPECIFICATIONS OF SIMILARITY-TLS 48
TABLE 5.10: THE TEST SPECIFICATIONS OF SYNONYM 48
TABLE 5.11: EXPERIMENTAL METRIC 49
TABLE 5.12: EXPERIMENTAL METRICS OF TEST1 53
TABLE 5.13: EXPERIMENTAL METRICS OF TEST1 BY GOOGLE 53
TABLE 5.14: EXPERIMENTAL METRICS OF TEST2 56
TABLE 5.15: EXPERIMENTAL METRICS OF TEST2 BY GOOGLE 56
TABLE 5.16: EXPERIMENTAL METRICS OF TEST3 59
TABLE 5.17: EXPERIMENTAL METRICS OF TEST3 BY GOOGLE 59
TABLE 5.18: EXPERIMENTAL METRICS OF TEST4 61
TABLE 5.19: EXPERIMENTAL METRICS OF TEST5 63
TABLE 5.20: EXPERIMENTAL METRICS OF TEST6 65
TABLE 5.21: EXPERIMENTAL METRICS OF TEST7 67
TABLE 5.22: EXPERIMENTAL METRICS OF TEST8 69
TABLE 5.23: EXPERIMENTAL METRICS OF TEST9 71
TABLE 5.24: EXPERIMENTAL METRICS OF TEST10 73zh_TW
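The chapter outline and figure captions above spell out the two link-analysis algorithms from which the thesis derives its generic constructs: PageRank (Figures 2.2-2.3, Table 2.1) and HITS (Figures 2.4-2.8). As a minimal sketch of those update rules (not the thesis's benchmark code), here is a pure-Python illustration on a made-up three-page graph; the page names and link structure are assumed purely for illustration.

```python
# Hypothetical 3-page link graph (pages A, B, C), for illustration only.
# links[p] lists the pages that p links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def pagerank(links, d=0.85, iters=50):
    """Iterative PageRank: each round, a page's rank is split evenly
    among its out-links; d is the damping factor."""
    n = len(links)
    rank = {p: 1.0 / n for p in links}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in links}
        for p, outs in links.items():
            share = rank[p] / len(outs)
            for q in outs:
                new[q] += d * share
        rank = new
    return rank

def hits(links, iters=50):
    """HITS mutual reinforcement, as in Figures 2.5 and 2.6:
    authority(p) = sum of hub weights of pages linking to p;
    hub(p) = sum of authority weights of pages p links to.
    Weights are re-normalized each round so they stay bounded."""
    auth = {p: 1.0 for p in links}
    hub = {p: 1.0 for p in links}
    for _ in range(iters):
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {p: v / norm for p, v in auth.items()}
        hub = {p: sum(auth[q] for q in links[p]) for p in links}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub
```

Table 2.1 of the thesis tracks per-iteration PageRank values for pages A, B, and C in exactly this fashion; Figures 2.7 and 2.8 describe the averaged (1/3-weighted) variant, which differs only in how each page's contribution is scaled.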
dc.format.extent136690 bytes-
dc.format.extent143488 bytes-
dc.format.extent134776 bytes-
dc.format.extent132645 bytes-
dc.format.extent72585 bytes-
dc.format.extent192941 bytes-
dc.format.extent113611 bytes-
dc.format.extent194035 bytes-
dc.format.extent1329604 bytes-
dc.format.extent79294 bytes-
dc.format.extent90250 bytes-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.format.mimetypeapplication/pdf-
dc.language.isoen_US-
dc.source.urihttp://thesis.lib.nccu.edu.tw/record/#G0094356017en_US
dc.subject網路搜尋zh_TW
dc.subject績效評估zh_TW
dc.subject負載量模型zh_TW
dc.subject學名結構zh_TW
dc.subjectweb searchen_US
dc.subjectbenchmarken_US
dc.subjectworkload modelen_US
dc.subjectgeneric constructen_US
dc.subjectperformanceen_US
dc.subjectevaluationen_US
dc.title以學名結構為基礎之網路搜尋負載量模型設計zh_TW
dc.titleA Generic Construct based Workload Model for Web Searchen_US
dc.typethesisen
dc.relation.reference[1]. Bharat, K., & Henzinger, M. R. (1998). Improved algorithms for topic distillation in a hyperlinked environment. SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia. 104-111. from http://doi.acm.org/10.1145/290941.290972zh_TW
dc.relation.reference[2]. Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. [Electronic version]. Computer Networks & ISDN Systems, 30, 107-118.zh_TW
dc.relation.reference[3]. Can, F., Nuray, R., & Sevdik, A. B. (2004). Automatic performance evaluation of web search engines. [Electronic version]. Information Processing and Management, 40(3), 495-514.zh_TW
dc.relation.reference[4]. Chidlovskii, B., Roustant, B., & Brette, M. (2006). Documentum ECI self-repairing wrappers: Performance analysis. SIGMOD '06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA. 708-717. from http://doi.acm.org/10.1145/1142473.1142555zh_TW
dc.relation.reference[5]. van Rijsbergen, C. J. (2006). Information retrieval (online book). from http://www.dcs.gla.ac.uk/Keith/Preface.htmlzh_TW
dc.relation.reference[6]. Clarke, S., & Willett, P. (1997). Estimating the recall performance of search engines. ASLIB Proceedings, 49 (7), 184-189.zh_TW
dc.relation.reference[7]. Hawking, D., Craswell, N., Bailey, P., & Griffiths, K. (2001). Measuring search engine quality. [Electronic version]. Information Retrieval, 4(1), 33-59.zh_TW
dc.relation.reference[8]. Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The elements of statistical learning: Data mining, inference, and prediction. Springer.zh_TW
dc.relation.reference[9]. Jansen, B. J., & Spink, A. (2006). How are we searching the world wide web? A comparison of nine search engine transaction logs. [Electronic version]. Information Processing and Management, 42(1), 248-263.zh_TW
dc.relation.reference[10]. Wen, J.-R., Song, R., Cai, D., Zhu, K., Yu, S., & Ye, S., et al. (2003). Microsoft Research Asia at the Web Track of TREC 2003. Paper presented at the Text Retrieval Conference 2003.zh_TW
dc.relation.reference[11]. Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. J.ACM, 46(5), 604-632.zh_TW
dc.relation.reference[12]. Kraaij, W., Westerveld, T., & Hiemstra, D. (2002). The importance of prior probabilities for entry page search. SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland. 27-34. from http://doi.acm.org/10.1145/564376.564383zh_TW
dc.relation.reference[13]. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. Stanford Digital Libraries Working Paper.zh_TW
dc.relation.reference[14]. Li, L., Shang, Y., & Zhang, W. (2002). Improvement of HITS-based algorithms on web documents. WWW '02: Proceedings of the 11th International Conference on World Wide Web, Honolulu, Hawaii, USA. 527-535. from http://doi.acm.org/10.1145/511446.511514zh_TW
dc.relation.reference[15]. Craswell, N., & Hawking, D. (2004). Overview of the TREC-2004 Web Track. Paper presented at the Text Retrieval Conference 2004.zh_TW
dc.relation.reference[16]. Pant, G. (2003). Deriving link-context from HTML tag tree. DMKD `03: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, California. 49-55. from http://doi.acm.org/10.1145/882082.882094zh_TW
dc.relation.reference[17]. Qin, T., Liu, T., Zhang, X., Feng, G., Wang, D., & Ma, W. (2007). Topic distillation via sub-site retrieval. [Electronic version]. Information Processing and Management, 43(2), 445-460.zh_TW
dc.relation.reference[18]. Richard, J. (2006). Measuring search effectiveness. from http://www.hsl.creighton.edu/hsl/Searching/Recall-Precision.htmlzh_TW
dc.relation.reference[19]. Robertson, S. E., & Walker, S. (1999). Okapi/Keenbow at TREC-8. Paper presented at the Eighth Text Retrieval Conference (TREC-8), 151-162.zh_TW
dc.relation.reference[20]. Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. [Electronic version]. Journal of the American Society for Information Science, 27(May-June), 129-146.zh_TW
dc.relation.reference[21]. Scarpa, M., Puliafito, A., Villari, M., & Zaia, A. (2004). A modeling technique for the performance analysis of web searching applications. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1339-1356.zh_TW
dc.relation.reference[22]. Shafi, S. M., & Rather, R. A. (2005). Precision and recall of five search engines for retrieval of scholarly information in the field of biotechnology. Webology, 2(2), Article 12. Available at: http://www.webology.ir/2005/v2n2/a12.htmlzh_TW
dc.relation.reference[23]. Robertson, S. (2002). Threshold setting and performance optimization in adaptive filtering. [Electronic version]. Information Retrieval, 5(2-3), 239-256.zh_TW
dc.relation.reference[24]. Vapnik, V. N. (1998). Statistical learning theory. Wiley.zh_TW
dc.relation.reference[25]. Vaughan, L. (2004). New measurements for search engine evaluation proposed and tested. [Electronic version]. Information Processing and Management, 40(4), 677-691.zh_TW
dc.identifier.doi10.1016/j.ipm.2009.04.004en_US
dc.doi.urihttp://dx.doi.org/10.1016/j.ipm.2009.04.004en_US
item.fulltextWith Fulltext-
item.grantfulltextopen-
item.languageiso639-1en_US-
item.openairetypethesis-
item.openairecristypehttp://purl.org/coar/resource_type/c_46ec-
item.cerifentitytypePublications-
Appears in Collections:學位論文 (Theses)
Files in This Item:
File | Description | Size | Format
601701.pdf | 133.49 kB | Adobe PDF | View/Open
601702.pdf | 140.12 kB | Adobe PDF | View/Open
601703.pdf | 131.62 kB | Adobe PDF | View/Open
601704.pdf | 129.54 kB | Adobe PDF | View/Open
601705.pdf | 70.88 kB | Adobe PDF | View/Open
601706.pdf | 188.42 kB | Adobe PDF | View/Open
601707.pdf | 110.95 kB | Adobe PDF | View/Open
601708.pdf | 189.49 kB | Adobe PDF | View/Open
601709.pdf | 1.3 MB | Adobe PDF | View/Open
601710.pdf | 77.44 kB | Adobe PDF | View/Open
601711.pdf | 88.13 kB | Adobe PDF | View/Open