Exploring Digital Media Management Strategies Using Topic Modeling Techniques


Title	應用主題建模技術探討數位媒體經營策略 (Exploring digital media management strategies using topic modeling techniques)
Author	賴冠州 (Lai, Kuan-Chou)
Contributors	鄭宇庭 (Cheng, Yu-Ting, advisor); 賴冠州 (Lai, Kuan-Chou)
Keywords	Digital media; Natural language processing; Document clustering; Topic modeling; Dimensionality reduction
Date	2023
Upload time	6-Jul-2023 15:19:12 (UTC+8)
Abstract	As modern technology has advanced and spread, more and more people rely on the internet to obtain the information they need, which has changed how people acquire information. In this era of abundant information, understanding the structure, content, and thematic components of information has become very important. This study applies the LDA topic model to approximately 563,000 digital media articles published from 2018 to 2022, in order to gain insight into the thematic composition of the articles and the distribution of each topic, and to explore the applications and implications of topic modeling for business operations. The study found that vocabulary size directly affects model performance: the larger the vocabulary, the worse the model performed, and a vocabulary of 1,000 words performed best. Experiments also showed that the choice of the number of topics is critical, with the best number lying between 20 and 30. In sum, a vocabulary size of 1,000 with 20 topics performs the topic modeling task effectively. By contrast, the articles' original categories carry limited information and do not support effective analysis of article performance. The LDA model captures finer-grained thematic components, and this topic information more faithfully reflects shifts in business strategy and social trends. In terms of business strategy, digital media can use the information provided by the LDA model to make more informed decisions and thereby improve readers' reading experience.
The results show that the three topics with the highest average page views per article are entertainment, family, and Taiwan's international relations, commercial insights that were previously unavailable. These findings provide a valuable basis for decision-making in digital media business strategy. Finally, the LDA model opens up many application scenarios, including further-reading recommendation and article retrieval systems, and it can be combined with visitor browsing data for audience topic preference analysis, lookalike audience search, personalized recommendation, and targeted advertising, improving the operational efficiency of digital media.
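The ranking of topics by average page views mentioned in the abstract could be computed along the following lines. This is a minimal sketch, not the thesis's actual procedure: it assumes each article is counted toward its single highest-weight LDA topic (the record does not say how articles were assigned), and the function name `rank_topics_by_avg_views`, the toy topic proportions, the page-view counts, and the "finance" topic label are all invented for illustration.

```python
from collections import defaultdict


def rank_topics_by_avg_views(doc_topic, page_views, topic_names):
    """Return (topic name, mean page views, article count), best topic first.

    doc_topic   -- one list of topic proportions per article (LDA output)
    page_views  -- page-view count per article, in the same order
    topic_names -- human-readable label for each topic index
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for weights, views in zip(doc_topic, page_views):
        # Assumption: an article belongs to its single highest-weight topic.
        dominant = max(range(len(weights)), key=weights.__getitem__)
        totals[dominant] += views
        counts[dominant] += 1
    ranked = [(topic_names[k], totals[k] / counts[k], counts[k]) for k in counts]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Toy data (hypothetical): four articles, three topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
page_views = [1200, 300, 800, 500]
topic_names = ["entertainment", "finance", "family"]

ranking = rank_topics_by_avg_views(doc_topic, page_views, topic_names)
for name, mean_views, n_articles in ranking:
    print(f"{name}: {mean_views:.1f} average views over {n_articles} article(s)")
```

An alternative design would weight each article's views by its full topic proportions instead of the dominant topic alone; which variant fits better depends on how mixed the articles are.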
Description	Master's thesis, National Chengchi University, Graduate Institute of Business Administration (MBA Program), 106363079
Source	http://thesis.lib.nccu.edu.tw/record/#G0106363079
Type	thesis

dc.contributor.advisor	鄭宇庭	zh_TW
dc.contributor.advisor	Cheng, Yu-Ting	en_US
dc.contributor.author (Authors)	賴冠州	zh_TW
dc.contributor.author (Authors)	Lai, Kuan-Chou	en_US
dc.creator (Author)	賴冠州	zh_TW
dc.creator (Author)	Lai, Kuan-Chou	en_US
dc.date (Date)	2023	en_US
dc.date.accessioned	6-Jul-2023 15:19:12 (UTC+8)	-
dc.date.available	6-Jul-2023 15:19:12 (UTC+8)	-
dc.date.issued (Upload time)	6-Jul-2023 15:19:12 (UTC+8)	-
dc.identifier (Other Identifiers)	G0106363079	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/145717	-
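dc.description (Description)	Master's thesis	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Graduate Institute of Business Administration (MBA Program)	zh_TW
dc.description (Description)	106363079	zh_TW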
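dc.description.tableofcontents
Chapter 1 Introduction
  Section 1 Research Background and Motivation
  Section 2 Research Objectives and Questions
  Section 3 Research Process
Chapter 2 Literature Review
  Section 1 Topic Models
    1. LDA
    2. Bayesian Inference
    3. Practical Applications
  Section 2 Recurrent Neural Networks
    1. RNN
    2. LSTM
    3. Other Improved Methods
  Section 3 Dimensionality Reduction
    1. t-SNE
    2. UMAP
    3. Comparing t-SNE and UMAP
Chapter 3 Research Methods
  Section 1 Research Data
  Section 2 Research Framework
  Section 3 Analysis Tools
Chapter 4 Analysis
  Section 1 Text Preprocessing
    1. Word Segmentation
    2. Vocabulary Construction
  Section 2 Model Training
  Section 3 Article Topic Exploration
  Section 4 Management Strategy Discussion
    1. From the "Category" Perspective
    2. From the "Topic" Perspective
    3. Overall Comparison
Chapter 5 Conclusion
  Section 1 Research Findings
  Section 2 Research Contributions
  Section 3 Research Limitations
  Section 4 Research Recommendations
Chapter 6 References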
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0106363079	en_US
dc.subject (Keywords)	Digital media	en_US
dc.subject (Keywords)	Natural language processing	en_US
dc.subject (Keywords)	Document clustering	en_US
dc.subject (Keywords)	Topic modeling	en_US
dc.subject (Keywords)	Dimensionality reduction	en_US
dc.title (Title)	應用主題建模技術探討數位媒體經營策略	zh_TW
dc.title (Title)	Exploring digital media management strategies using topic modeling techniques	en_US
dc.type (Type)	thesis	en_US
