Title Exploring digital media management strategies using topic modeling techniques
(應用主題建模技術探討數位媒體經營策略)
Author Lai, Kuan-Chou (賴冠州)
Contributors Cheng, Yu-Ting (鄭宇庭), advisor
Lai, Kuan-Chou (賴冠州), author
Keywords Digital media
Natural language processing
Document clustering
Topic modeling
Dimensionality reduction
Date 2023
Uploaded 6-Jul-2023 15:19:12 (UTC+8)
Abstract With the advancement and spread of modern technology, more and more people rely on the internet to obtain the information they need, which has also changed how people acquire information. In this era of abundant information, it has become very important to understand the structure, content, and thematic components of information. This study applies the LDA topic model to approximately 563,000 articles published by a digital media outlet from 2018 to 2022, in order to gain insight into the thematic composition of the articles and the distribution of each topic, and to explore the applications and implications of topic modeling for business operations.

The study found that vocabulary size directly affects the performance of the LDA topic model: the larger the vocabulary, the worse the model performed, and the best vocabulary size was 1,000. The experiments also showed that the choice of the number of topics is critical, with the best number of topics falling between 20 and 30. In summary, a vocabulary size of 1,000 and 20 topics allow the topic modeling task to be performed effectively.

The original article categories, by contrast, provide limited information and do not support an effective analysis of article performance. The LDA model captures the thematic composition of articles at a finer grain, and this topic information reflects changes in business strategy and social trends more faithfully. In terms of business strategy, digital media can use the information provided by the LDA model to make more informed decisions and improve the reading experience. Notably, the three topics with the highest average page views per article are entertainment, family, and Taiwan's international relations, a business insight that was not previously obtainable. These findings provide a valuable basis for decisions about digital media business strategy.

Finally, the LDA model opens up many application scenarios, including related-reading recommendation and article retrieval systems. It can also be combined with visitor browsing data to support audience topic-preference analysis, look-alike audience search, personalized recommendation, and targeted advertising, improving the operational efficiency of digital media.
Description Master's thesis
National Chengchi University
Graduate Institute of Business Administration (MBA Degree Program)
106363079
Source http://thesis.lib.nccu.edu.tw/record/#G0106363079
Type thesis
Identifier G0106363079
URI http://nccur.lib.nccu.edu.tw/handle/140.119/145717
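Table of contents
Chapter 1 Introduction
Section 1 Research background and motivation
Section 2 Research objectives and questions
Section 3 Research process
Chapter 2 Literature review
Section 1 Topic models
1. LDA
2. Bayesian inference
3. Practical applications
Section 2 Recurrent neural networks
1. RNN
2. LSTM
3. Other improved variants
Section 3 Dimensionality reduction
1. t-SNE
2. UMAP
3. Comparing t-SNE and UMAP
Chapter 3 Research method
Section 1 Research data
Section 2 Research framework
Section 3 Analysis tools
Chapter 4 Analysis
Section 1 Text preprocessing
1. Word segmentation
2. Vocabulary construction
Section 2 Model training
Section 3 Discussion of article topics
Section 4 Discussion of business strategy
1. From the "category" perspective
2. From the "topic" perspective
3. Overall comparison
Chapter 5 Conclusion
Section 1 Findings
Section 2 Contributions
Section 3 Limitations
Section 4 Recommendations
Chapter 6 References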