Citation Information

Title 系統化評估矩陣分解於社群媒體之假帳號偵測
Systematic Evaluation of Matrix Factorization for Social Media Fake Profile Detection
Author 陳靖淮
Chen, Ching-Huai
Contributors 沈錳坤
Shan, Man-Kwan
陳靖淮
Chen, Ching-Huai
Keywords 假帳號偵測
社群媒體
矩陣分解
Fake Profile Detection
Social Media
Matrix Factorization
Date 2021
Upload time 2-Sep-2021 16:50:42 (UTC+8)
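Abstract The flourishing of the Internet has driven the rapid development of social media. Web 2.0 lets people share information on the Internet, bringing a large amount of information and many information sources.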
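However, the enormous volume of information and information sources makes it much harder to tell truth from falsehood, so more false information appears and causes social harm; for example, misinformation during the pandemic may lead people to make wrong decisions about it. Most false information is spread by malicious accounts.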
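Every account on social media has its own profile. Malicious accounts are mostly controlled automatically by computers, so the profiles they produce are mostly forged, that is, fake profiles. Common social media profiles include demographic data and psychographic data. Research indicates that predictions of a profile made by computers from the preferences in psychographic data are more accurate than those made by the person's friends.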
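Although existing research uses matrix factorization to detect anomalies in bipartite graphs, fake profiles pursue a different goal from these anomalies: a fake profile aims to carry out specific malicious behavior, such as steering public opinion, while disguising its identity.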
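This thesis focuses on the preferences in profiles and systematically evaluates the effectiveness of matrix factorization for detecting fake profiles. We first synthesize five different types of fakes and merge them with profiles we crawled from Facebook; we then evaluate experimentally how well matrix factorization algorithms detect each type of fake profile.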
The rapid growth of the Internet has driven the rise of social media, where people share information. However, much fake information also spreads over social media; one example is the misinformation during the pandemic that misled people into making wrong decisions.
Most fake information is spread by fake accounts, most of which are operated by social bots that create fake profiles automatically. On social media, each account has its own profile, which generally includes demographic data and psychographic data. One important kind of psychographic data is preference information; on Facebook, for example, the pages a user likes indicate that user's preferences.
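The like-based preference data described above can be held as the biadjacency matrix of the user-page bipartite graph: one row per user, one column per page, and a 1 wherever the user liked the page. The sketch below shows one way to build it. It assumes like data are available as a mapping from a user ID to the set of page IDs that user liked; the function name `build_like_matrix` and the sample data are hypothetical, not taken from the thesis.

```python
from typing import Dict, List, Set, Tuple


def build_like_matrix(
    likes: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn a (hypothetical) user -> liked-pages mapping into a binary matrix.

    Row i corresponds to users[i], column j to pages[j];
    entry [i][j] is 1 if that user liked that page, otherwise 0.
    """
    users = sorted(likes)
    pages = sorted({page for liked in likes.values() for page in liked})
    column_of = {page: j for j, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in users]
    for i, user in enumerate(users):
        for page in likes[user]:
            matrix[i][column_of[page]] = 1
    return users, pages, matrix


# Hypothetical sample data.
likes = {
    "alice": {"NBA", "Python"},
    "bob": {"Python"},
    "carol": {"NBA", "Cooking"},
}
users, pages, A = build_like_matrix(likes)
print(users)  # ['alice', 'bob', 'carol']
print(pages)  # ['Cooking', 'NBA', 'Python']
print(A)      # [[0, 1, 1], [0, 0, 1], [1, 1, 0]]
```

Sorting users and pages only fixes a deterministic row and column order; any stable indexing would do.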
The relationships between users and the pages they like can be represented as a bipartite graph. Existing research applies matrix factorization to anomaly detection in bipartite graphs. However, a fake profile aims to disguise its identity so that it can manipulate public opinion without being detected, an objective that differs from that of ordinary anomalies in bipartite graphs.
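To illustrate how matrix factorization can flag anomalies in such a user-page matrix, here is a minimal sketch assuming plain nonnegative matrix factorization (NMF) with the Lee-Seung multiplicative updates for minimizing ||A - WH||_F^2, where each user's anomaly score is the L2 norm of that user's row in the residual matrix R = A - WH. This is an assumed, generic variant: it is not necessarily the matrix factorization type or updating rule evaluated in the thesis, and it is not the non-negative residual matrix factorization of Tong and Lin [22]. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def nmf(A: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Approximate nonnegative A (m x n) by W (m x rank) @ H (rank x n)
    with Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)]
             for r in range(rank)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(m)]
    return W, H


def residual_scores(A: Matrix, rank: int, iters: int = 200, seed: int = 0) -> List[float]:
    """Score each user (row) by the L2 norm of its row in R = A - W H."""
    W, H = nmf(A, rank, iters=iters, seed=seed)
    approx = matmul(W, H)
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(row, approx_row)))
            for row, approx_row in zip(A, approx)]


# Toy data: two communities of users, plus one user (last row)
# who likes one page from each community.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
]
print([round(s, 3) for s in residual_scores(A, rank=2)])
```

On this toy matrix one would expect the last row to receive the largest residual, since a rank-2 factorization mostly captures the two community patterns. NMF is non-convex, though, so the scores depend on the rank and the random initialization; treat the output as illustrative.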
This thesis focuses on preference profiles and systematically evaluates the performance of matrix factorization for fake profile detection. We propose five types of fakes, inject the synthesized fake profiles into real profiles collected with a Facebook crawler, and perform experiments to evaluate the effectiveness of the matrix factorization algorithm.
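The evaluation protocol described above (inject synthesized fakes into real profiles, then check whether the detector ranks the fakes highly) can be sketched as follows. This minimal sketch assumes the simplest filler selection, in which each fake likes a fixed number of pages drawn uniformly at random, in the spirit of the "Random" fake type listed in the table of contents, and it measures precision@k over any per-user anomaly score. The function names, the fixed filler size, and the use of precision@k are assumptions for illustration; the thesis's own filler size distributions, selection methods, and evaluation method may differ.

```python
import random
from typing import List, Sequence, Tuple


def inject_random_fakes(A: List[List[int]], n_fakes: int, filler_size: int,
                        seed: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Append n_fakes synthetic users to a binary user x page matrix.

    Each hypothetical fake likes `filler_size` distinct pages drawn
    uniformly at random. Returns the augmented matrix and the row
    indices of the injected fakes.
    """
    rng = random.Random(seed)
    n_pages = len(A[0])
    augmented = [row[:] for row in A]  # copy so the real data stay unchanged
    fake_ids: List[int] = []
    for _ in range(n_fakes):
        row = [0] * n_pages
        for j in rng.sample(range(n_pages), filler_size):
            row[j] = 1
        fake_ids.append(len(augmented))
        augmented.append(row)
    return augmented, fake_ids


def precision_at_k(scores: Sequence[float], fake_ids: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring users that are injected fakes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fakes = set(fake_ids)
    return sum(1 for i in ranked[:k] if i in fakes) / k


real = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
A, fakes = inject_random_fakes(real, n_fakes=2, filler_size=3, seed=42)
print(fakes)  # [3, 4]
# Hypothetical anomaly scores for the 5 rows, e.g. from an MF residual.
scores = [0.1, 0.8, 0.1, 0.9, 0.3]
print(precision_at_k(scores, fakes, k=2))  # 0.5
```

Keeping injection and scoring separate lets the same precision@k routine compare different fake types or factorization variants on identical injected data.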
References [1] S. Adali and J. Golbeck, Predicting Personality with Social Behavior, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012.
[2] F. Ahmed and M. Abulaish, An MCL-Based Approach for Spam Profile Detection in Online Social Networks, IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012.
[3] L. Akoglu, H. Tong and D. Koutra, Graph Based Anomaly Detection and Description: A Survey, Data Mining and Knowledge Discovery, vol. 29, no. 3, 2015.
[4] R. Albright, J. Cox, D. Duling, A. N. Langville and C. Meyer, Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization, 2006.
[5] M. Bilal, A. Gani, M. I. U. Lali, M. Marjani and N. Malik, Social Profiling: A Review, Taxonomy, and Challenges, Cyberpsychology, Behavior, and Social Networking, vol. 22, no. 7, 2019.
[6] A. ElAzab, Fake Accounts Detection in Twitter Based on Minimum Weighted Feature, World 2016.
[7] G. Farnadi, S. Zoghbi, M.-F. Moens and M. De Cock, Recognising Personality Traits Using Facebook Status Updates, Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[8] J. Golbeck, C. Robles, M. Edmondson and K. Turner, Predicting Personality from Twitter, IEEE Third International Conference on Privacy, Security, Risk and Trust and IEEE Third International Conference on Social Computing, 2011.
[9] I. Gunes, C. Kaleli, A. Bilge and H. Polat, Shilling Attacks against Recommender Systems: A Comprehensive Survey, Artificial Intelligence Review, vol. 42, no. 4, 2014.
[10] I. Guy, N. Zwerdling, I. Ronen, D. Carmel and E. Uziel, Social Media Recommendation Based on People and Tags, Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010.
[11] R. A. Hanneman and M. Riddle, Introduction to Social Network Methods, University of California Riverside, 2005.
[12] S. Joshi, H. G. Nagariya, N. Dhanotiya and S. Jain, Identifying Fake Profile in Online Social Network: An Overview and Survey, International Conference on Machine Learning, Image Processing, Network Security and Data Sciences, 2020.
[13] R. Kareem and W. Bhaya, Fake Profiles Types of Online Social Networks: A Survey, International Journal of Engineering & Technology, vol. 7, no. 4.19, 2018.
[14] G. F. Khan, B. Swar and S. K. Lee, Social Media Risks and Benefits: A Public Sector Perspective, Social Science Computer Review, vol. 32, no. 5, 2014.
[15] Y. Koren, R. Bell and C. Volinsky, Matrix Factorization Techniques for Recommender Systems, Computer, vol. 42, no. 8, 2009.
[16] L. Liu, D. Preotiuc-Pietro, Z. R. Samani, M. E. Moghaddam and L. Ungar, Analyzing Personality through Social Media Profile Picture Choice, Tenth International AAAI Conference on Web and Social Media, 2016.
[17] D. Markovikj, S. Gievska, M. Kosinski and D. J. Stillwell, Mining Facebook Data for Predictive Personality Modeling, Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[18] J. W. Pennebaker, M. E. Francis and R. J. Booth, Linguistic Inquiry and Word Count: LIWC 2001, Mahwah: Lawrence Erlbaum Associates, vol. 71, no. 2001, 2001.
[19] P. Savyan and S. M. S. Bhanu, Behaviour Profiling of Reactions in Facebook Posts for Anomaly Detection, 2017 Ninth International Conference on Advanced Computing (ICoAC), 2017.
[20] K. Shu, A. Sliva, S. Wang, J. Tang and H. Liu, Fake News Detection on Social Media: A Data Mining Perspective, ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, 2017.
[21] T. Stein, E. Chen and K. Mangla, Facebook Immune System, Proceedings of the 4th Workshop on Social Network Systems, 2011.
[22] H. Tong and C.-Y. Lin, Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection, Proceedings of the 2011 SIAM International Conference on Data Mining, 2011.
[23] Y. Wu, M. Kosinski and D. Stillwell, Computer-Based Personality Judgments Are More Accurate Than Those Made by Humans, Proceedings of the National Academy of Sciences, vol. 112, no. 4, 2015.
[24] Z. Yang, Q. Sun and B. Zhang, Evaluating Prediction Error for Anomaly Detection by Exploiting Matrix Factorization in Rating Systems, IEEE Access, vol. 6, 2018.
[25] Z. Zhang, T. Li, C. Ding and X. Zhang, Binary Matrix Factorization with Applications, Seventh IEEE International Conference on Data Mining, 2007.
[26] W. X. Zhao, S. Li, Y. He, L. Wang, J.-R. Wen and X. Li, Exploring Demographic Information in Social Media for Product Recommendation, Knowledge and Information Systems, vol. 49, no. 1, 2016.
Description Master's thesis
National Chengchi University
Department of Computer Science
105753034
Source http://thesis.lib.nccu.edu.tw/record/#G0105753034
Type thesis
dc.contributor.advisor 沈錳坤
dc.contributor.advisor Shan, Man-Kwan
dc.date.accessioned 2-Sep-2021 16:50:42 (UTC+8)
dc.date.available 2-Sep-2021 16:50:42 (UTC+8)
dc.identifier (Other Identifiers) G0105753034
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/136957
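dc.description.tableofcontents Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
Chapter 2 Related Work 4
2.1 Profile Analysis and Applications 4
2.1.1 Personality Prediction on Social Media 4
2.1.2 Recommendation on Social Media 5
2.2 Anomaly Detection 5
2.2.1 Types of Anomalies 5
2.2.2 Anomaly Detection Approaches 6
2.2.3 Anomaly Detection with Matrix Factorization 6
Chapter 3 Methodology 8
3.1 Research Framework 8
3.2 Data Types 8
3.3 Preprocessing 10
3.4 Synthesis of Fake Profiles 11
3.4.1 Filler Size Distribution 11
3.4.2 Filler Selection Method 13
3.4.3 Fake Size 17
3.5 Detection Algorithm and Evaluation 17
3.5.1 Updating Rules 18
3.5.2 Types of Matrix Factorization 19
3.5.3 Residual Matrix 19
Chapter 4 Experimental Design 20
4.1 Data Source 20
4.2 Experimental Design and Evaluation Method 21
4.2.1 Data Preprocessing and Analysis 21
4.2.2 Evaluation Method 22
4.3 Experimental Results 23
4.3.1 Random 23
4.3.2 Popularity 28
4.3.3 Polarity 29
4.3.4 Strange Connection 33
4.3.5 Bipartite Core 37
Chapter 5 Conclusion and Future Work 42
References 43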
dc.format.extent 2934527 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202101306