Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/137282
DC Field | Value | Language
dc.contributor.advisor洪為璽zh_TW
dc.contributor.advisorHung, Wei-Hsien_US
dc.contributor.author洪御哲zh_TW
dc.contributor.authorHung, Yu-Jheen_US
dc.creator洪御哲zh_TW
dc.creatorHung, Yu-Jheen_US
dc.date2021en_US
dc.date.accessioned2021-10-01T02:02:40Z-
dc.date.available2021-10-01T02:02:40Z-
dc.date.issued2021-10-01T02:02:40Z-
dc.identifierG0108356021en_US
dc.identifier.urihttp://nccur.lib.nccu.edu.tw/handle/140.119/137282-
dc.description碩士zh_TW
dc.description國立政治大學zh_TW
dc.description資訊管理學系zh_TW
dc.description108356021zh_TW
dc.description.abstract業配文是在廣告媒體內容中有目的地整合品牌或品牌說服性訊息,以換取贊助商的報酬。在網際網路與行動裝置的普及下,社群媒體快速成長,捧紅了許多「網紅」高影響力者,看上此高度個人化與可控制內容的特性,使廠商將資源投入在這些人身上,以獲取商品的曝光與銷售。但是業配文常常會有假分享真業配的問題,讓消費者認為是自己的真實體驗分享,而非商業贊助,可能誤導消費者進行消費,故本研究目的在於能否建立一個模型找出背後可能是未揭露的業配文章。首先,先搜集痞客邦百大部落客的資料,建立會揭露業配之部落客名冊,再搜集該部落客發表過的所有文章,藉由揭露文字標注業配文與非業配文。然後透過機器學習方法SVM、CNN與Google所開發的深度語言模型BERT進行訓練與比較,最後以CNN平均得出最高的準確度83.625%,同時,在我們標注的未揭露業配文章資料中,CNN能夠偵測業配文的準確度為90.69%。最後,應用逐層相關傳播LRP解釋CNN模型,觀察哪些常出現業配文文字最可能被預測為業配文,比較模型與人為觀點,並藉此找出業配文的特徵,以提供給消費者進行判斷。zh_TW
dc.description.abstractSponsored content purposefully integrates a brand or persuasive brand message into editorial media content in exchange for compensation from a sponsor. With the spread of the Internet and mobile devices, social media has grown rapidly and given rise to many "influencers," or key opinion leaders (KOLs), who wield substantial influence within their social networks. Because such content is highly personalized and fully under the creator's control, firms invest resources in KOLs to gain product exposure and drive sales. However, sponsored content is often published without disclosure, leading consumers to read it as an authentic personal experience rather than a paid endorsement, which may mislead their purchase decisions. This research therefore aims to build a model that identifies undisclosed sponsored articles. We first compile a roster of Pixnet's top-100 bloggers who disclose sponsorship in their articles, collect every article they have published, and label each article as sponsored or non-sponsored according to its disclosure sentences. The labeled dataset is then used to train and compare a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and BERT (Bidirectional Encoder Representations from Transformers), the deep language model developed by Google. The CNN attains the highest average accuracy, 83.625%, and detects sponsored content with 90.69% accuracy on the undisclosed sponsored articles we labeled. Finally, Layer-wise Relevance Propagation (LRP) is applied to interpret the CNN and to observe which frequently occurring words most strongly drive a sponsored prediction. By comparing the model's view with human judgment, we identify the characteristics of sponsored content and offer them to consumers as cues for purchase decisions.en_US
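The pipeline described in the abstract (keyword-based disclosure labeling followed by a supervised text classifier) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the file name articles.csv, the example disclosure keywords, and the use of character n-gram TF-IDF with a linear SVM in place of the thesis's Chinese word segmentation system and tuned models.

# Minimal sketch of the disclosure-labeling + SVM-baseline pipeline outlined in the abstract.
# Assumptions (not from the thesis): articles.csv with a "text" column, the example
# disclosure keywords, and character n-gram TF-IDF instead of Chinese word segmentation.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical disclosure phrases; the thesis derives labels from bloggers' own disclosure sentences.
DISCLOSURE_KEYWORDS = ["合作文", "邀約", "廠商提供", "本文為業配"]

def label_by_disclosure(text: str) -> int:
    """Label an article as sponsored (1) if any disclosure phrase appears, else 0."""
    return int(any(kw in text for kw in DISCLOSURE_KEYWORDS))

df = pd.read_csv("articles.csv")          # one blog post per row, text in a "text" column
df["label"] = df["text"].apply(label_by_disclosure)

# Character bigrams sidestep the need for a Chinese word segmenter in this sketch.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2), max_features=50000)
X = vectorizer.fit_transform(df["text"])

X_train, X_test, y_train, y_test = train_test_split(
    X, df["label"], test_size=0.2, random_state=42, stratify=df["label"])

clf = LinearSVC()                          # linear-kernel SVM baseline
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

In the same spirit, a CNN or BERT classifier would take the place of LinearSVC on the labeled data, and an attribution method such as LRP would then be run on the trained network to surface the words that most strongly drive sponsored predictions.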
dc.description.tableofcontents第壹章、 緒論 8\n第一節、 研究動機與背景 8\n第二節、 研究目的 9\n第三節、 研究架構 10\n第貳章、 文獻探討 11\n第一節、 業配文 11\n第二節、 中文斷詞系統 13\n第三節、 語言模型 (LANGUAGE MODEL) 15\n第四節、 文本分類 (TEXT CLASSIFICATION) 19\n第五節、 逐層相關傳播 22\n第參章、 研究方法 25\n第一節、 研究架構 25\n第二節、 資料搜集 26\n第三節、 資料預處理 27\n第四節、 模型建立 30\n第肆章、 實驗設計與分析 32\n第一節、 實驗資料 32\n第二節、 實驗結果 35\n第三節、 小結 42\n第伍章、 結論 44\n參考文獻 46zh_TW
dc.format.extent5754170 bytes-
dc.format.mimetypeapplication/pdf-
dc.source.urihttp://thesis.lib.nccu.edu.tw/record/#G0108356021en_US
dc.subject業配文zh_TW
dc.subject內容行銷zh_TW
dc.subject文字探勘zh_TW
dc.subject機器學習zh_TW
dc.subject自然語言處理zh_TW
dc.subjectSponsored Contenten_US
dc.subjectContent Marketingen_US
dc.subjectText Miningen_US
dc.subjectMachine Learningen_US
dc.subjectNatural Language Processingen_US
dc.title應用文字探勘於業配文揭露偵測zh_TW
dc.titleSponsored Content Detection with Text Mining Approachen_US
dc.typethesisen_US
dc.relation.reference財團法人臺灣網路資訊中心(2019)。2019 臺灣網路報告。2019 年 12 月 22 日,資料引自 https://report.twnic.tw/2019/。\n公平交易委員會(2017)。公平交易委員會對於薦證廣告之規範說明。https://www.ftc.gov.tw/internet/main/doc/docDetail.aspx?uid=165&docid=13021\n王毓莉(2014)。台灣新聞記者對「業配新聞」的馴服與抗拒。新聞學研究(119), 45-79。http://ir.lib.pccu.edu.tw/handle/987654321/38722\nActivate. (2018). Exploring the Brand and Influencer Relationship in Influencer Marketing. Retrieved from: https://try.activate.social/2018-state-of-influencer-study\nAggarwal, C. C., & Zhai, C. (2012). A Survey of Text Classification Algorithms. In C. C. Aggarwal & C. Zhai (Eds.), Mining Text Data (pp. 163-222). Boston, MA: Springer US.\nArras, L., Horn, F., Montavon, G., Müller, K.-R., & Samek, W. (2017). "What is relevant in a text document?": An interpretable machine learning approach. PloS one, 12(8), e0181142. doi:10.1371/journal.pone.0181142\nBach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PloS one, 10(7), e0130140. doi:10.1371/journal.pone.0130140\nBahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473. Retrieved from https://ui.adsabs.harvard.edu/abs/2014arXiv1409.0473B\nBecker-Olsen, K. L. (2003). And Now, A Word from Our Sponsor--A Look at the Effects of Sponsored Content and Banner Advertising. Journal of Advertising, 32(2), 17-32. doi:10.1080/00913367.2003.10639130\nBengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.\nBhatnagar, N., Aksoy, L., & Malkoc, S. A. (2003). Embedding Brands Within Media Content: The Impact of Message, Media, and Consumer Characteristics on Placement Efficacy. In The psychology of entertainment media (pp. 110-127): Erlbaum Psych Press.\nBivins, T. (2017). Mixed media: Moral distinctions in advertising, public relations, and journalism: Routledge. Journalism and Mass Communication Quarterly, 81(1), 187-188.\nCommission, F. T. (2017). The FTC’s endorsement guides: What people are asking. Retrieved from https://www.ftc.gov/tips-advice/business-center/guidance/ftcs-endorsement-guides-what-people-are-asking\nCortes, C., & Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20(3), 273-297. doi:10.1007/BF00994018\nDevlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805. Retrieved from https://ui.adsabs.harvard.edu/abs/2018arXiv181004805D\nDurbhakula, V. V. K., & Kim, D. J. (2011). E-business for Nations: A Study of National Level E-business Adoption Factors Using Country Characteristics-Business-Technology-Government Framework. Journal of Theoretical and Applied Electronic Commerce Research, 6(3), 1-12. Retrieved from https://search.proquest.com/scholarly-journals/e-business-nations-study-national-level-adoption/docview/915869254/se-2?accountid=13877\nGeyser, W. (2021). The State of Influencer Marketing 2020: Benchmark Report. Retrieved from https://influencermarketinghub.com/influencer-marketing-benchmark-report-2020/\nHartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluisio, S. (2017). Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. arXiv:1708.06025. Retrieved from https://ui.adsabs.harvard.edu/abs/2017arXiv170806025H\nHulse, J. V., Khoshgoftaar, T. M., & Napolitano, A. 
(2007). Experimental perspectives on learning from imbalanced data. Paper presented at the Proceedings of the 24th international conference on Machine learning, Corvalis, Oregon, USA. https://doi.org/10.1145/1273496.1273614\nIkonen, P., Luoma-aho, V., & Bowen, S. A. (2017). Transparency for Sponsored Content: Analysing Codes of Ethics in Public Relations, Marketing, Advertising and Journalism. International Journal of Strategic Communication, 11(2), 165-178. doi:10.1080/1553118X.2016.1252917\nJapkowicz, N. (2000). Learning from imbalanced data sets: a comparison of various strategies. Paper presented at the AAAI workshop on learning from imbalanced data sets.\nJiang, M., Liang, Y., Feng, X., Fan, X., Pei, Z., Xue, Y., & Guan, R. (2018). Text classification based on deep belief network and softmax regression. Neural Computing and Applications, 29(1), 61-70. doi:10.1007/s00521-016-2401-x\nJohnson, R., & Zhang, T. (2014). Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. arXiv:1412.1058. Retrieved from https://ui.adsabs.harvard.edu/abs/2014arXiv1412.1058J\nJu-Pak, G.-H., Kim, B.-H., & Cameron, G. (1995). Trends in the use and abuse of advertorials in magazines. Mass Communication Review, 22, 112-128.\nKapitan, S., & Silvera, D. H. (2016). From digital media influencers to celebrity endorsers: attributions drive endorser effectiveness. Marketing Letters, 27(3), 553-567. doi:10.1007/s11002-015-9363-0\nKim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882. Retrieved from https://ui.adsabs.harvard.edu/abs/2014arXiv1408.5882K\nKowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., & Brown, D. (2019). Text Classification Algorithms: A Survey. Information, 10(4), 150.\nLecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. doi:10.1109/5.726791\nLi, L., Xiao, L., Wang, N., Yang, G., & Zhang, J. (2017, 13-16 Dec. 2017). Text classification method based on convolution neural network. Paper presented at the 2017 3rd IEEE International Conference on Computer and Communications (ICCC).\nLiao, H.-L., Liu, S.-H., & Chou, C.-H. (2015). An exploratory study of product placement in social media. Internet Research, 25(2), 300-316. doi:10.1108/IntR-12-2013-0267\nLiu, T., Fang, S., Zhao, Y., Wang, P., & Zhang, J. (2015). Implementation of Training Convolutional Neural Networks. arXiv:1506.01195. Retrieved from https://ui.adsabs.harvard.edu/abs/2015arXiv150601195L\nMcHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia medica, 22(3), 276-282.\nMikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781. Retrieved from https://ui.adsabs.harvard.edu/abs/2013arXiv1301.3781M\nMontavon, G., Binder, A., Lapuschkin, S., Samek, W., & Müller, K.-R. (2019). Layer-Wise Relevance Propagation: An Overview. In W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, & K.-R. Müller (Eds.), Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (pp. 193-209). Cham: Springer International Publishing.\nPeters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2017). Deep contextualized word representations. arXiv:1802.05365. Retrieved from https://ui.adsabs.harvard.edu/abs/2018arXiv180205365P\nPrusa, J., Khoshgoftaar, T. M., Dittman, D. J., & Napolitano, A. (2015). 
Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data. Paper presented at the 2015 IEEE International Conference on Information Reuse and Integration.\nSingla, Z., Randhawa, S., & Jain, S. (2017). Sentiment analysis of customer product reviews using machine learning. Paper presented at the 2017 International Conference on Intelligent Computing and Control (I2C2).\nvan Reijmersdal, E., Neijens, P., & Smit, E. G. (2009). A New Branch of Advertising. Journal of Advertising Research, 49(4), 429.\nVaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention Is All You Need. arXiv:1706.03762. Retrieved from https://ui.adsabs.harvard.edu/abs/2017arXiv170603762V\nWang, Z., Sun, X., Zhang, D., & Li, X. (2006, 13-16 Aug. 2006). An Optimal SVM-Based Text Classification Algorithm. Paper presented at the 2006 International Conference on Machine Learning and Cybernetics.\nWojdynski, B. W., Evans, N. J., & Hoy, M. G. (2018). Measuring Sponsorship Transparency in the Age of Native Advertising. Journal of Consumer Affairs, 52(1), 115-137. doi:https://doi.org/10.1111/joca.12144\nWu, H., Li, D., & Cheng, M. (2019). Chinese text classification based on character-level CNN and SVM. International Journal of Intelligent Information and Database Systems, 12(3), 212-228.\nYang, Y., Tresp, V., Wunderle, M., & Fasching, P. A. (2018, 4-7 June 2018). Explaining Therapy Predictions with Layer-Wise Relevance Propagation in Neural Networks. Paper presented at the 2018 IEEE International Conference on Healthcare Informatics (ICHI).\nYang, X., Kim, S., & Sun, Y. (2019). How do influencers mention brands in social media? sponsorship prediction of Instagram posts. Paper presented at the Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Vancouver, British Columbia, Canada. https://doi.org/10.1145/3341161.3342925\nZarei, K., Ibosiola, D., Farahbakhsh, R., Gilani, Z., Garimella, K., Crespi, N., & Tyson, G. (2020). Characterising and Detecting Sponsored Influencer Posts on Instagram. arXiv:2011.05757. Retrieved from https://ui.adsabs.harvard.edu/abs/2020arXiv201105757Zzh_TW
dc.identifier.doi10.6814/NCCU202101593en_US
item.grantfulltextrestricted-
item.openairecristypehttp://purl.org/coar/resource_type/c_46ec-
item.openairetypethesis-
item.cerifentitytypePublications-
item.fulltextWith Fulltext-
Appears in Collections:學位論文
Files in This Item:
File        Description    Size       Format
602101.pdf                 5.62 MB    Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.