Title (Chinese) 主題分析方法在經濟文獻學上的應用:隱含狄利克雷分配與代理人基計算經濟學
Title (English) Topic Analysis in the Automatic Organization of Economic Literature: The Case of Agent-Based Computational Economics with the Use of Latent Dirichlet Allocation
Author 胡瑞軒
Hu, Ruei-Xsuan
Contributors 陳樹衡 (advisor)
Chen, Shu-Heng
胡瑞軒
Hu, Ruei-Xsuan
Keywords Agent-Based Modeling
Unsupervised Learning
TF-IDF (Term Frequency-Inverse Document Frequency)
Word cloud
NLP (Natural Language Processing)
Topic coherence
Topic similarity
Date 2022
Uploaded 2-Sep-2022 15:26:59 (UTC+8)
Abstract This thesis classifies Agent-Based Modeling (ABM) papers from multiple journals using Latent Dirichlet Allocation (LDA), a topic model. Term Frequency-Inverse Document Frequency (TF-IDF) is then used to recover words that are related to a topic but were filtered out in the first pass, and word clouds are used to identify words shared by different topics. The thesis also breaks each topic down by the journals its papers come from and analyzes how the topics change over time. Finally, topic similarity, topic ranking, and topic coherence analyses show that the topics overlap little and that the topic explanation ratios and coherence scores are high. Unlike previous studies, this thesis covers multiple journals and evaluates the classification after it is produced. These evaluations indicate that LDA can classify documents effectively and quantitatively, at a lower time cost and with less data complexity than manual classification.
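The abstract uses TF-IDF to surface topic-related terms that the first LDA pass filtered out. This record does not give the exact weighting the thesis used, so the sketch below assumes the textbook form, term frequency normalized by document length times log(N / df), in plain Python. The toy corpus and the helper names `tf_idf` and `top_terms` are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (count of term in doc / length of doc) * log(N / df), where
    N is the number of documents and df the number containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=2):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized abstracts of three papers assigned to one LDA topic.
docs = [
    ["agent", "market", "auction", "agent", "learning"],
    ["agent", "market", "order", "book", "trading"],
    ["agent", "network", "contagion", "bank", "market"],
]
weights = tf_idf(docs)
print(top_terms(weights[0]))  # ['auction', 'learning']
```

Terms that occur in every document, such as `agent` and `market` here, get an IDF of log(1) = 0, which is why TF-IDF demotes generic vocabulary and promotes terms specific to a subset of papers.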
References [1] Ambrosino, A., Cedrini, M., Davis, J. B., Fiori, S., Guerzoni, M., & Nuccio, M. (2018). What topic modeling could reveal about the evolution of economics. Journal of Economic Methodology, 25(4), 329-348.
[2] Alexakis, C., Dowling, M., Eleftheriou, K., & Polemis, M. (2020). Textual Machine Learning: An Application to Computational Economics Research. Computational Economics, 57(1), 369-385.
[3] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.
[4] Boyd-Graber, J., Hu, Y., & Mimno, D. (2017). Applications of topic models. Foundations and Trends in Information Retrieval, 11(2-3), 143-296.
[5] Hannigan, T. R., Haans, R. F., Vakili, K., Tchalian, H., Glaser, V. L., Wang, M. S., et al. (2019). Topic modeling in management research: rendering new theory from textual data. Academy of Management Annals, 13(2), 586-632.
[6] Hofmann, T. (1999). Probabilistic Latent Semantic Analysis. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI-99), Stockholm, 289-296.
[7] Huang, A. H., Lehavy, R., Zang, A. Y., & Zheng, R. (2018). Analyst information discovery and interpretation roles: a topic modeling approach. Management Science, 64(6), 2833-2855.
[8] Kao, Y. F., & Venkatachalam, R. (2018). Human and Machine Learning. Computational Economics, 57(4), 889-909.
[9] Kumar, A., & Paul, A. (2016). Mastering Text Mining with R. UK: Packt Publishing Ltd.
[10] Mimno, D., Leenders, M., McCallum, A., Talley, E., & Wallach, H. M. (2011). Optimizing Semantic Coherence in Topic Models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 262-272.
[11] Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010). Automatic Evaluation of Topic Coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, 100-108.
[12] Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (1999). Latent Semantic Indexing: A Probabilistic Analysis. Journal of Computer and System Sciences, 61(2), 217-235.
[13] Polyakov, M., Chalak, M., Iftekhar, M. S., Pandit, R., Tapsuwan, S., Zhang, F., & Ma, C. (2017). Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991-2015. Environmental and Resource Economics, 71(1), 217-239.
[14] Piepenbrink, A., & Nurmammadov, E. (2015). Topics in the literature of transition economies and emerging markets. Scientometrics, 102(3), 2107-2130.
[15] Tesfatsion, L. (2021). Agent-Based Computational Economics: Overview and Brief History. Working Paper 21004, Department of Economics, Iowa State University.
[16] Tesfatsion, L. (2022, January 1). Agent-Based Computational Economics (ACE). Intro Materials and Research Area Sites. http://www2.econ.iastate.edu/tesfatsi/aapplic.htm
Description Master's thesis
National Chengchi University
Department of Economics
109258032
Source http://thesis.lib.nccu.edu.tw/record/#G0109258032
Type thesis
dc.date.accessioned 2-Sep-2022 15:26:59 (UTC+8)
dc.date.available 2-Sep-2022 15:26:59 (UTC+8)
dc.identifier (Other Identifiers) G0109258032
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/141743
dc.description.tableofcontents Abstract (Chinese)
Abstract (English)
1 Introduction
2 Research Process
3 Theoretical Framework
3.1 Topic Modeling
3.2 Basic Concepts of Latent Dirichlet Allocation
3.3 Classification Methods and Adopted Theories
3.3.1 TF-IDF Algorithm
3.3.2 Word Clouds
3.3.3 Cosine Distance
3.3.4 Topic Coherence
4 Data and Data Analysis Methods
4.1 Data
4.2 Data Analysis Methods
5 Results
5.1 Interpreting the Topics
5.2 Analyzing Topic Categories
5.3 Topic-Related Terms
5.4 Differences from Conventional Document Classification
5.5 TF-IDF Algorithm
5.6 Changes in Topics over Time
5.7 Using Word Clouds to Identify Topics
5.8 Topic Similarity
5.9 Topic Ranking
5.10 Topic Coherence
6 Conclusions and Recommendations
6.1 Conclusions
6.2 Recommendations
7 References
dc.format.extent 20186703 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202201265
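The abstract reports topic similarity and topic coherence results, and the thesis outline lists cosine distance and topic coherence (sections 3.3.3 and 3.3.4) as evaluation tools. A minimal sketch of one common form of each follows, assuming topics are given as word-probability vectors over a shared vocabulary and that coherence is the UMass measure of Mimno et al. (2011), reference [10]. This record does not state which coherence variant or implementation the thesis used, so the function names and toy data are hypothetical.

```python
import math


def cosine_distance(p, q):
    """Return 1 - cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def umass_coherence(top_words, docs):
    """UMass coherence (Mimno et al., 2011) of one topic.

    top_words: the topic's top words, most probable first.
    docs: tokenized documents. Every top word must occur in at least one
    document, otherwise the denominator D(v_j) would be zero.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    # Sum log((D(v_m, v_j) + 1) / D(v_j)) over all pairs with j ranked above m.
    for m in range(1, len(top_words)):
        for j in range(m):
            joint = doc_freq(top_words[m], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score


# Hypothetical topic-word vectors over the same four-word vocabulary.
topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.0, 0.1, 0.3, 0.6]
print(round(cosine_distance(topic_a, topic_b), 3))  # 0.785

# Hypothetical tokenized documents and one topic's top three words.
docs = [["agent", "market", "learning"], ["agent", "market"], ["agent", "network"]]
print(round(umass_coherence(["agent", "market", "learning"], docs), 3))  # -0.405
```

A cosine distance near 0 means two topics put their weight on the same words, and a value near 1 means they barely overlap. UMass scores are sums of log ratios and are usually negative; values closer to 0 indicate that a topic's top words co-occur more often across documents.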