Title: Mining Subtopics from Different Aspects for Diversifying Search Results
Authors: Wang, Chieh-Jen ; Lin, Yung-Wei ; Tsai, Ming-Feng (蔡銘峰) ; Chen, Hsin-Hsi
Contributor: Department of Computer Science (資科系)
Keywords: Diversified retrieval ; Subtopic mining ; Search result re-ranking
Date: 2013.08
Upload time: 6-Mar-2014 16:29:40 (UTC+8)
Abstract: User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results conclude that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.
Relation: Information Retrieval, 16(4), 452-483
Type: article
DOI: http://dx.doi.org/10.1007/s10791-012-9215-y
dc.contributor Department of Computer Science (資科系) [en_US]
dc.creator (Authors) Wang, Chieh-Jen ; Lin, Yung-Wei ; Tsai, Ming-Feng ; Chen, Hsin-Hsi [en_US]
dc.creator (Authors) 蔡銘峰 (Tsai, Ming-Feng) [zh_TW]
dc.date (Date) 2013.08 [en_US]
dc.date.accessioned 6-Mar-2014 16:29:40 (UTC+8)
dc.date.available 6-Mar-2014 16:29:40 (UTC+8)
dc.date.issued (Upload time) 6-Mar-2014 16:29:40 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/64482
dc.format.extent 661585 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.relation (Relation) Information Retrieval, 16(4), 452-483 [en_US]
dc.subject (Keywords) Diversified retrieval ; Subtopic mining ; Search result re-ranking [en_US]
dc.title (Title) Mining Subtopics from Different Aspects for Diversifying Search Results [en_US]
dc.type (Type) article [en]
dc.identifier.doi (DOI) 10.1007/s10791-012-9215-y [en_US]
dc.doi.uri (DOI) http://dx.doi.org/10.1007/s10791-012-9215-y [en_US]
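
Illustrative note: the abstract describes ranking documents against mined subtopics so that relevance and diversity are balanced, but this record does not give the model itself. The sketch below is therefore not the authors' algorithm. It is a generic greedy re-ranker in the style of xQuAD, shown only to make the idea concrete. The function name `diversify`, the parameter `lam`, and all scores and subtopic labels in the example are hypothetical.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    subtopic_weight: Dict[str, float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to k documents, trading relevance against subtopic novelty.

    relevance[d]       : relevance of document d to the query (assumed in [0, 1])
    coverage[d][s]     : degree to which document d covers subtopic s (in [0, 1])
    subtopic_weight[s] : importance of subtopic s for the query
    lam                : 0.0 ranks by relevance only, 1.0 by novelty only
    """
    selected: List[str] = []
    # For each subtopic, the probability that no selected document covers it yet.
    uncovered = {s: 1.0 for s in subtopic_weight}
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def score(doc: str) -> float:
            # Novelty rewards covering subtopics that are still uncovered.
            novelty = sum(
                w * coverage.get(doc, {}).get(s, 0.0) * uncovered[s]
                for s, w in subtopic_weight.items()
            )
            return (1.0 - lam) * relevance[doc] + lam * novelty

        # Iterating in sorted order makes ties resolve to the smallest doc id.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Subtopics covered by the chosen document contribute less from now on.
        for s in uncovered:
            uncovered[s] *= 1.0 - coverage.get(best, {}).get(s, 0.0)

    return selected


# Hypothetical data for an ambiguous query with two subtopics.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
coverage = {
    "d1": {"animal": 1.0},
    "d2": {"animal": 1.0},
    "d3": {"car": 1.0},
}
weights = {"animal": 0.5, "car": 0.5}

print(diversify(relevance, coverage, weights, k=2, lam=0.5))
print(diversify(relevance, coverage, weights, k=2, lam=0.0))
```

With `lam=0.5` the second slot goes to `d3`, the only document covering the second subtopic, even though `d2` is more relevant. With `lam=0.0` the ranking falls back to relevance order.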