Title 資訊檢索之學術智慧
Research Intelligence Involving Information Retrieval
Author 杜逸寧
Tu, Yi-Ning
Contributors 諶家蘭; 林我聰
Seng, Jia-Lang; Lin, Woo-Tsong
杜逸寧
Tu, Yi-Ning
Keywords Topic discovery and tracking
data mining
information retrieval
Bayesian estimation
academic intelligence
novelty index
published volume index
citation analysis
Date 2009
Uploaded 4-Sep-2013 16:55:05 (UTC+8)
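Abstract Detecting emerging topics is an important problem for researchers: with limited time and resources, exploring an emerging topic in a field brings greater contribution and influence than solving an already mature one. This research is devoted to helping researchers detect emerging research topics with future potential and to extracting, from academic papers, the academic intelligence that helps researchers in their work. In searching for topics with research potential, we assume that such topics are published by the more influential authors and publications in the field. The study therefore uses Bayesian estimation to estimate the influence of the relevant researchers and academic publications on a field, and uses this information to find candidate emerging topics with future potential. In addition, as far as we know, the topic detection literature still lacks effective and widely used tools for judging whether a topic is maturing or is novel and has research potential, so this study attempts to develop effective measurement tools that evaluate whether a topic, given its own development life cycle, still has academic value worth further investment.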
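The study selected papers related to data mining and information retrieval from many important databases and verified that the topics covered by conference papers lead similar topics in journal papers in the following years. It also combines existing algorithms into a detection procedure that helps researchers detect leading trends in academic papers and uncover academic intelligence. Using Bayesian estimation, it constructs prior probabilities and likelihood functions for the influence of authors and publications from publication and citation information, and computes the influence of the important authors and publications in a field. Papers published by these authors and publications are relatively more worth observing, and the candidate emerging topics are then tested to see whether they become emerging topics. Although the important research topics found this way narrow the search, they may still be mature topics that influential authors and publications are obliged to discuss, so indices or tools for evaluating a topic's future potential are needed. However, existing methods for evaluating topic maturity focus only on how frequently a topic appears and overlook novelty, which is also an important indicator; others aim only at finding new topics without considering whether a topic has future potential. More importantly, using the frequency curve alone can confirm that a topic is important only after it has matured, which makes the method a lagging indicator.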
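This study proposes indices that address these difficulties and develops them into a method for measuring the potential of emerging topics. The indices comprise the novelty index, the published volume index and the detection point; together with their curves, they provide more leading information for emerging topic detection and help researchers construct detection criteria for their own fields. The detection point does not mark the exact date on which a topic began to emerge; it marks the point in the topic's life cycle at which it has the greatest research potential and value. The detection point is therefore deferred in time as the topic later flourishes, which means the indices can detect the continuation of a topic's vitality. Whereas the traditional frequency distribution curve shows the rise and decline of a topic, the published volume index uses the life-cycle concept to show a topic's development potential at each point in time. We hope the academic intelligence found through this process helps researchers build topic detection criteria for their own fields and saves considerable labor and time in exploring emerging topics. The proposed methods not only address the shortcomings of the impact factor but also use the influence of authors and publications to assess the potential of a paper that has not yet accumulated any citations. This addresses the shortcoming that Google Scholar can establish a paper's importance only after the paper has already been cited many times, whereas scholars always hope to discover important topics or papers ahead of others. We believe our topic-oriented retrieval approach can more reliably meet researchers' needs in searching for topics or papers.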
This research aims to help researchers identify emerging topics and extract research intelligence from academic papers. It reveals the connection between the topics investigated in conference papers and those investigated in journal papers, which can reduce the time and effort researchers spend surveying the academic literature. To detect emerging research topics, the study uses Bayesian estimation to estimate the impact that authors and publications have on a topic and discovers candidate emerging topics by combining high-impact authors and publications. Finally, it develops measurement tools that assess the research potential of these candidates in order to find the emerging topics.
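The connection between conference and journal topics mentioned above can be illustrated with a small sketch. It assumes that each year's papers are reduced to a bag of descriptor terms and that the two sources are compared with cosine similarity at a one-year lag; the function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)

def lagged_similarity(conference, journal, lag=1):
    """Compare conference terms in year y with journal terms in year y + lag."""
    result = {}
    for year, terms in conference.items():
        target = year + lag
        if target in journal:
            result[year] = cosine_similarity(Counter(terms), Counter(journal[target]))
    return result

# Hypothetical toy data: descriptor terms per publication year.
conference = {
    2005: ["text mining", "clustering", "clustering"],
    2006: ["topic detection", "clustering"],
}
journal = {
    2006: ["clustering", "clustering", "text mining"],
    2007: ["ranking"],
}
scores = lagged_similarity(conference, journal)
```

A high score for a given year suggests that the journal literature of the following year resembles that year's conference literature; swapping the two arguments checks the reverse direction.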
The study selected a large number of papers on data mining and information retrieval from well-known databases and showed that the topics covered by conference papers in one year often lead to similar topics in journal papers in the subsequent year, and vice versa. It also combines several existing algorithms into a new detection procedure with which researchers can detect new trends and gather academic intelligence from conferences and journals. Bayesian estimation and citation analysis are used to construct the prior distribution and likelihood function for the authors and publications in a topic. Because topics published by high-impact authors and publications attract more attention and are more valuable than others, researchers can use them to assess the potential of candidate emerging topics. Although the recommended topics narrow the search space, a topic may simply be so popular that all high-impact authors and publications discuss it, so measurement tools or indices are needed. Current methods, however, either focus only on the frequency of subjects and ignore their novelty, which is critical and goes beyond frequency, or focus on one of the two without considering the potential of the topic. Methods that rely solely on the curve of published frequency yield a lagging index. To address this inadequacy, the research proposes a set of new indices for emerging topic detection: the novelty index (NI) and the published volume index (PVI). These indices are then used to determine the detection point (DP) of an emerging topic. The DP is not the actual time at which the topic starts to emerge; it marks the point in the topic's life cycle at which the topic has the highest research potential in terms of both novelty and hotness.
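The Bayesian step described above (a prior and a likelihood built from publication and citation information) can be sketched as follows. The abstract does not give the actual formulas, so this is only an illustration under stated assumptions: the prior is taken to be an author's share of the papers in a topic, the likelihood an add-one-smoothed share of the citations, and the posterior their normalized product. All names and numbers are hypothetical.

```python
def posterior_impact(paper_counts, citation_counts):
    """Return a normalized posterior impact score per author (illustrative only)."""
    total_papers = sum(paper_counts.values())
    if total_papers == 0:
        raise ValueError("at least one published paper is required")

    # ASSUMPTION: prior = the author's share of papers published in the topic.
    prior = {a: n / total_papers for a, n in paper_counts.items()}

    # ASSUMPTION: likelihood = the author's add-one-smoothed share of citations.
    total_cites = sum(citation_counts.get(a, 0) for a in paper_counts) + len(paper_counts)
    likelihood = {a: (citation_counts.get(a, 0) + 1) / total_cites for a in paper_counts}

    # Posterior is proportional to prior times likelihood, then normalized.
    unnormalized = {a: prior[a] * likelihood[a] for a in paper_counts}
    evidence = sum(unnormalized.values())
    return {a: v / evidence for a, v in unnormalized.items()}

# Hypothetical counts for three authors in one topic.
papers = {"A": 6, "B": 3, "C": 1}
citations = {"A": 10, "B": 30}
post = posterior_impact(papers, citations)
```

In this toy case author B publishes less than author A but is cited more, so B ends up with the larger posterior score. The same pattern could be applied to publications (journals or conferences) instead of authors.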
Unlike the absolute frequency method, which can find the exact period in which a topic emerges, the PVI uses cumulative relative frequency and tries to detect the timing of research potential within the topic's life cycle. Given the detection points, the intersection determines whether a new topic is worth pursuing. Readers who follow the algorithms presented in this thesis will be able to judge the novelty and life span of an emerging topic in their field. The proposed methods address limitations of the impact factor proposed by ISI. In addition, by using the impact power of the authors and the publication in a topic to measure a paper's impact power before it has actually become an influential paper, they address the limitations of Google Scholar's approach. We suggest that the topic-oriented thinking behind these methods can help researchers solve the problem of searching for valuable topics.
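A rough sketch of how such indices might interact is given below. Only the PVI definition (cumulative relative frequency) comes from the abstract. The novelty index formula here is an assumption (a linear decay over the observed life span), as is the reading of the detection point as the first year in which the PVI curve meets or exceeds the NI curve. The thesis itself should be consulted for the real definitions.

```python
from itertools import accumulate

def published_volume_index(counts):
    """Cumulative relative frequency of yearly publication counts."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in accumulate(counts)]

def novelty_index(n_years):
    """ASSUMPTION: novelty decays linearly from 1.0 (first year) to 0.0 (last year)."""
    if n_years == 1:
        return [1.0]
    return [1 - i / (n_years - 1) for i in range(n_years)]

def detection_point(years, counts):
    """ASSUMPTION: the first year in which the PVI curve meets or exceeds the NI curve."""
    if sum(counts) == 0:
        return None  # no publications, so there is nothing to detect
    pvi = published_volume_index(counts)
    ni = novelty_index(len(counts))
    for year, p, n in zip(years, pvi, ni):
        if p >= n:
            return year
    return None

# Hypothetical yearly paper counts for one topic.
years = [2001, 2002, 2003, 2004, 2005]
counts = [1, 2, 4, 8, 5]
pvi = published_volume_index(counts)
dp = detection_point(years, counts)
```

Because the PVI is normalized by the total volume observed so far, adding later years of data can shift the crossing point, which is consistent with the abstract's remark that the detection point is not the date at which a topic first emerged.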
Description Doctoral
National Chengchi University
Graduate Institute of Information Management (資訊管理研究所)
94356509
98
Source http://thesis.lib.nccu.edu.tw/record/#G0094356509
Type thesis
dc.contributor.advisor 諶家蘭; 林我聰 zh_TW
dc.contributor.advisor Seng, Jia-Lang; Lin, Woo-Tsong en_US
dc.identifier (Other Identifiers) G0094356509 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/60196
dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Background
1.2 Research Issue
1.2.1 Research Intelligence between Conferences and Journals
1.2.2 Detecting Candidate Emerging Research Topics via the Bayesian Estimation of Author-Publication Correlations
1.2.3 Developing the Emerging Topic Detection Indices
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Topic Detection and Tracking
2.2 Emerging Topic Detection
2.3 Aging Theory
2.4 Information Retrieval Approach
2.5 Summary
Chapter 3 The Leading Relationship between Conferences and Journals
3.1 Experimental Design
3.2 Data Selection
3.2.1 Select the Domain
3.2.2 Use the Keywords to Represent the Domain
3.2.3 Choose Databases and Search Engines
3.2.4 Pick the Descriptor of the Paper
3.3 Datasets Properties
3.3.1 Search Conference Papers
3.3.2 Search Journal Papers
3.4 Application of Information Retrieval
3.4.1 Identify Each Document
3.4.2 Calculate Frequency of Appearances
3.4.3 Summarize the Frequency of Conference Papers and Journal Papers
3.4.4 Compute Similarity between Conference Papers and Journal Papers
3.5 Relationship between Conference Papers and Journal Papers
Chapter 4 The Experimental Results of the Relationship between Conferences and Journals
4.1 Experimental Results
4.2 Discussion
4.3 Summary
Chapter 5 The Detection of Impact Research Topics via Bayesian Estimation of Author-Publication Correlations
5.1 The Idea of Detecting Impact Research Topics
5.2 Measuring an Author's Impact Power
5.2.1 Prior Impact Power of an Author
5.2.2 Likelihood Function of the Impact Power of an Author
5.2.3 Posterior Impact Power of an Author
5.3 Measuring the Impact Power of a Publication
5.3.1 Prior Impact Power of a Publication
5.3.2 Likelihood Function of the Impact Power of a Publication
5.3.3 Posterior Impact Power of a Publication
5.4 Measuring the Impact Power of a Paper and a Topic
5.4.1 Impact Power of a Paper
5.4.2 Impact Power of a Topic
Chapter 6 Determination of Impact Research Topics via the Bayesian Estimation of Author-Publication Correlations
6.1 Experiment to Validate an Author's Impact Power
6.1.1 Comparing the Author's Impact Power with Previous Work
6.1.2 Comparing the Impact Power of Authors with the Expert Survey
6.2 Experiment to Validate the Impact Power of Publications
6.2.1 Comparing the Impact Power of Publications Using the Impact Factor
6.2.2 Comparing the Impact Power of Publications Using the Publication List of Authors Recommended in Previous Work
6.2.3 Comparing the Impact Power of Publications with the Expert Survey
6.3 How to Find Impact Research Topics Using the Proposed Model
Chapter 7 The Indices for Emerging Topic Detection
7.1 Novelty of Emerging Topics
7.1.1 Term, Candidate Research Topic, Research Topic, Hot Topic and Emerging Topic
7.1.2 Novelty Index
7.1.3 Published Volume Index
7.1.4 Detection Point
7.2 Information Produced by the Emerging Topic Detection Indices
7.2.1 Year of the Detection Point
7.2.2 The Detection Point Value
7.3 The Properties of Emerging Topic Detection Indices
7.3.1 Novelty Index Properties
7.3.2 Published Volume Index Properties
7.3.3 Detection Point Properties
7.4 The Emerging Topic Detection Table
Chapter 8 The Research Experiment of the Development of Emerging Topic Detection Indices
8.1 Experimental Design
8.1.1 Choose the Field and Data Resource
8.1.2 Select the Descriptor
8.1.3 Investigate the Extracted Topics
8.2 Experimental Results
8.3 How to Use the Emerging Topic Detection Table to Predict Whether a Topic Warrants Further Research
8.4 Validate the Accuracy and Effectiveness of the Emerging Topic Detection Indices
8.5 Discussion
8.6 Summary
Chapter 9 Conclusions and Future Work
9.1 Conclusions
9.2 Future Work
References
Appendix A Questionnaire
dc.format.extent 1964480 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.relation.reference (參考文獻) Allan, J., Carbonell, J., Doddington, G., Yamron, J., & Yang, T. (1998). Topic detection and tracking pilot study: Final report. In: Proceedings of the DARPA Broadcast News Transcription an Understanding Workshop.

Allan, J., Papka, R., & Lavrenko, V., (1998). On-line new event detection and tracking. In: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 37-45.

Aurora, P. P., Rafael, B. L., & Jose, R. S. (2007). Topic discovery based on text mining techniques. Information Processing & Management, 43, pp. 742-768.

Berry, M.W. (2004) Survey of text mining-clustering, classification, and retrieval. Springer, pp. 185-224.
Bolelli, L., Ertekin, S., Zhou, D., & Giles, C. L. (2009). Finding topic trends in digital libraries, In: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 69-72.

Chen, K.Y., Luesukprasert, L., & Chou, S. C. (2007). Hot topic extraction based on timeline analysis and multidimensional sentence modeling. IEEE Transactions on Knowlede and Data Enginerting, 19(8), pp. 1016-1025.

Chou, T. C., & Chen, M. C. (2008). Using incremental plsi for threshold-resilient online event analysis. IEEE Transactions on Knowlede and Data Enginerting, 20(3), pp. 289-299.
Clifton,
C., Cooley, R., & Rennie, J. (2004). Topcat: data mining for topic indentification in a text corpus. IEEE Transactions on Knowlede and Data Enginerting, 16(8), pp. 949-964.

Cui, C., & Kitagawa, H. (2005). Topic activation analysis for document streams based on document arrival rate and relevance. In: Proceedings of the 2005 ACM symposium on applied computing, pp. 1089-1095.

Felix, M. A., Benjamin, V. Q., Zaida, C. R., Elena, C. A., Victor, H. S., Francisco J. M. F. (2005). Domain analysis and information retrieval through the construction of heliocentric maps based on ISI-JCR category cocitation. Information Processing & Management, 41(6), pp. 1521-1533.

Franz, M., & McCarley, J. C. (2001). Unsupervised and supervised clustering for topic tracking. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 310-317.

Hatzivassiloglou, V., Gravano, L., & Maganti, A. (2000). An investigation of linguistic features and clustering algorithms. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 224-231.

Jin, Y., Myaeng, S. H., & Jung, Y. (2007). Use of place information for improved event tracking. Information Processing & Management, 43, pp. 365-378.

Jo, Y., Lagoze, C., & Giles, C. L. (2007). Detecting research topics via the correlation between graphs and texts. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.370-379.

Joachims, T. (1998). Text categorization with Support Vector Machines: learning with many relevant features. In: Proceedings of the EMNLP Conference.

Kollios, G., Gunopulos, D., Koudas, N., & Berchtold, S. (2003). Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactionson Knowlede and Data Enginerting, 15(5), pp. 1170-1187.

Kleinberg, J. (2002). Bursty and hierarchical structure in streams. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 91-101.

Kuramochi, M., & Karypis, G. (2004). An efficient algorithm for discovering frequent subgraphs. IEEE Transactionson on Knowlede and Data Enginerting, 16(9), pp. 1038-1051.

Lee, C., Lee, G. G., & J, M. (2007). Dependency structure language model for topic detection and tracking. Information Processing & Management, 43, pp. 1249-1259.

Lee, Z., Gosain, S., & Im, I. (1997). Topics of interest in IS: evolution of themes and differences between research and practice. Information & Management, 36, pp. 233-246.

Liu, Y., Niculescu-Mizil, A., & Gryc, W. (2009). Topic-link LDA: joint models of topic and author community, In :Proceedings of the 26th Annual International Conference on Machine Learning, pp. 665-672.

Malone, J., McGarry, K., & Bowerman, C. (2006). Automated trend analysis of proteomics data using an intelligent data mining architecture, Expert Systems with Applications, 30, pp. 24-33.

Manmatha, R., Feng, A., & Allan, J. (2002). A critical examination of TDT’s cost function. In: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 403-404.

Makkonen, J., Ahonen-Myka, H., & Salmenkivi, M. (2004). Simple semantics in topic detection and tracking. Information Retrieval, 7, pp. 347-368.

Morinaga, S., & Yamanishi, K. (2004). Tracking dynamics of topic trends using a finite mixture model. In: Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 811-816.

Moulinier, I., Raskinis, G., & Ganascia, J. (1996). Text categorization: A symbolic approach. In: Annual Symposium on Document Analysis and information retrieval (SDAIR).

Nallapati, R., Ahmed, A., Xing, E. P., & Cohen, W. W. (2008). Joint latent topic models for text and citations. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 542-550.

Ontrup, J., Ritter, H., Scholz, S. W., & Wagner, R. (2008). Detecting, assessing and monitoring relevant topics in virtual information environments. IEEE Transactions on Knowledge and Data Engineering, 20(7).

Ozmutlu, H. C., & Cavdur, F. (2005). Application of automatic topic identification on Excite web search engine data logs. Information Processing & Management, 41, pp. 1243-1262.

Ozmutlu, S. (2006). Automatic new topic identification using multiple linear regression. Information Processing & Management, 42, pp. 934-950.

Porter, M. (1980). An algorithm for suffix stripping. Program (Automated Library and Information Systems), 14(3), pp. 130-137.

Rosen-Zvi, M., Chemudugunta, C., Griffiths, T., Smyth, P., & Steyvers, M. (2010). Learning author-topic models from text corpora, Transactions on Information Systems, 28 (1).

Salton, G. (1989). Automatic text processing: The transformation, analysis and retrieval of information by computer, Addison-Wesley, Reading, MA.

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), pp. 613-620.

Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), pp. 513-523.

Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw Hill Publishing Company.

Schultz, J. M., & Liberman, M. (1999). Topic detection and tracking using idf-weighted cosine coefficient. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop.

Schutze, H., Hull, D., & Pedersen, J. (1995). A comparison of classifiers and document representations for the routing problem. In: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 229-237.

Steyvers, M., Smyth, P., & Griffiths, T. (2004). Probabilistic author topic models for information discovery. In: Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 306-315.

Stokes, N., & Carthy, J. (2001). Combining semantic and syntactic document classifiers to improve first story detection. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 424-425.

Swan, R., & Allan, J. (2000). Automatic generation of overview timelines. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 49-56.

Tu, Y. N., & Seng, J. L. (2009). Research Intelligence Involving Information Retrieval – An example of Conferences and Journals, Expert Systems with Applications, 47(6).

Tu, Y. N., & Seng, J. L. (2010). Indices of Novelty for Emerging Topic Detection. (working paper).

Tan, P. N., Steinbach, M. & Kumar, V. (2006). Introduction to data mining. Addison-Wesley, pp. 69-84.

Thelwall, M. (2005). Scientific web intelligence: Finding relationships in university webs, Communications of the ACM, 48(7), pp. 93-96.

Thelwall, M., & Harries, G. (2004). Do better scholars’ Web publications have significantly higher online impact? Journal of the American Society for Information Science and Technology, 55(2), pp. 149-159.

Thelwall, M., Vaughan, L., Cothey, V., Li, X., & Smith, A. (2003). Which academic subjects have most online impact? A pilot study and a new classification process, Online Information Review, 27(5), pp. 333-343.

Tho, Q. T., Hui, S. C., & Fong, A. C. M. (2007). A citation-based document retrieval system for finding research expertise, Information Processing and Management, 43(1), pp. 248-264.

Walls, F., Jin, H., Sista, S., & Schwartz, R. (1999). Topic detection in broadcast news. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop.

Wang, X., Zhai, C., Hu, X., & Sproat, R. (2007). Mining correlated bursty topic patterns from coordinated text streams. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 784-793.

Wu, K., Chen, M., & Sun, Y. (2004). Automatic topics discovery from hyperlinked documents, Information Processing & Management, 40, pp. 239-255.

Yang, H. C., & Lee, C. H. (2004). A text mining approach on automatic generation of web directories and hierarchies, Expert Systems with Applications, 27, pp. 645-663.

Yang, H. C., & Lee, C. H. (2005). A text mining approach for automatic construction of hypertexts, Expert Systems with Applications, 29, pp. 723-734.

Yang, Y., Ault, T., Pierce, T., & Lattimer, C. W. (2000). Improving text categorization methods for event tracking. In: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 65-72.

Yang, Y., & Pedersen, J. (1997). A comparative study on feature selection in text categorization. In: International Conference on Machine Learning.

Yang, Y. & Wilbur, J. (1996). Using corpus statistics to remove redundant words in text categorization, Journal of the American Society for Information Science, 47(5), pp. 357-369.

Yang, Y., Zhang, J., Carbonell, J., & Jin, C. (2002). Topic-conditioned novelty detection. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 688-693.

Yang, Y., Yoo, S., Zhang, J., & Kisiel, B. (2005). Robustness of adaptive filtering methods in a cross-benchmark evaluation. In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 98-105.

Zhang, Y., Callan, J., & Minka, T. (2002). Novelty and redundancy detection in adaptive filtering. In: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 81-88.

Zhang, Y., Surendran, A. C., Platt, J. C., & Narasimhan, M. (2008). Learning from multi-topic web documents for contextual advertisement. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1051-1059.