A Novel Scoring System with Genetic Algorithm for Consensus Reaching in Malware Labels

Title	A Novel Scoring System with Genetic Algorithm for Consensus Reaching in Malware Labels (基於基因演算法的惡意軟體標籤共識評分系統)
Author	Wang, Shih-Yu (王詩渝)
Contributors	Hsiao, Shun-Wen (蕭舜文); Wang, Shih-Yu (王詩渝)
Keywords	Malware clustering; Genetic algorithm; Pairwise comparison; Consensus reaching system
Date	2023
Uploaded	1-Sep-2023 14:55:23 (UTC+8)
Abstract	Identifying malware families is crucial for cybersecurity researchers. Antivirus vendors usually provide malware labels, called AV labels, that categorize malware samples by their behavior. However, because each vendor has its own viewpoint and analysis methods, these labels often have inconsistent formats and names. This inconsistency creates clutter and reduces their trustworthiness. Some previous approaches to this issue relied on vendor-selection weightings that are not necessarily meaningful, or on majority voting that can be biased. To solve this problem, we propose a novel scoring system called the Pairwise Consensus Score (PCS). Rather than judging the quality of a result by its label names, PCS uses naming logic to determine whether a cluster is consistent with the other vendors' opinions. Our consensus-reaching process combines PCS with a genetic algorithm to cluster malware samples according to the agreement among antivirus vendors and to find the labels that best group the malware. Experimental results show that our method outperforms existing methods, providing more consistent and trustworthy AV labels for malware samples.
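The abstract does not give the PCS formula or the genetic-algorithm encoding, so the Python sketch below is only an illustration under our own assumptions, not the thesis's method. (1) As a stand-in for PCS, a candidate clustering is scored by the fraction of sample pairs on which it agrees with each vendor about "same family" versus "different family" (a Rand-index-style measure), averaged over vendors. (2) Each chromosome picks, for every sample, one of the family names the vendors proposed for it. (3) Tournament selection, uniform crossover, per-gene mutation, and two-chromosome elitism are generic GA choices. The function names (`pairwise_agreement`, `consensus_score`, `candidate_labels`, `genetic_consensus`), parameters, and the vendor and family names in the example are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence

# Assumed input format (hypothetical): vendor name -> one family label per sample,
# with None where the vendor gave no usable label. All lists have the same length.
VendorLabels = Dict[str, List[Optional[str]]]


def pairwise_agreement(clustering: Sequence[str], vendor: Sequence[Optional[str]]) -> float:
    """Share of sample pairs (both labeled by this vendor) on which the candidate
    clustering and the vendor agree on 'same family' vs. 'different family'.
    Rand-index-style stand-in for PCS; not the thesis's definition."""
    agree = total = 0
    for i, j in itertools.combinations(range(len(clustering)), 2):
        if vendor[i] is None or vendor[j] is None:
            continue  # skip pairs this vendor has no opinion on
        total += 1
        agree += (clustering[i] == clustering[j]) == (vendor[i] == vendor[j])
    return agree / total if total else 0.0


def consensus_score(clustering: Sequence[str], labels: VendorLabels) -> float:
    """Mean pairwise agreement between the candidate clustering and every vendor."""
    scores = [pairwise_agreement(clustering, v) for v in labels.values()]
    return sum(scores) / len(scores)


def candidate_labels(labels: VendorLabels) -> List[List[str]]:
    """For each sample, the distinct family names any vendor proposed (the gene alphabet)."""
    n_samples = len(next(iter(labels.values())))
    options = []
    for i in range(n_samples):
        names = sorted({v[i] for v in labels.values() if v[i] is not None})
        options.append(names or ["unknown"])
    return options


def genetic_consensus(labels: VendorLabels, pop_size: int = 30, generations: int = 50,
                      mutation_rate: float = 0.1, seed: int = 0) -> List[str]:
    """Search for the per-sample label choice that maximizes consensus_score."""
    rng = random.Random(seed)
    options = candidate_labels(labels)
    cache: Dict[tuple, float] = {}

    def decode(genes: List[int]) -> List[str]:
        return [options[i][g] for i, g in enumerate(genes)]

    def fitness(genes: List[int]) -> float:
        key = tuple(genes)
        if key not in cache:  # memoize: the same chromosome recurs often
            cache[key] = consensus_score(decode(genes), labels)
        return cache[key]

    def tournament(pool: List[List[int]]) -> List[int]:
        return max(rng.sample(pool, 3), key=fitness)

    # Each gene is an index into that sample's candidate label list.
    population = [[rng.randrange(len(o)) for o in options] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best chromosomes over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(ranked), tournament(ranked)
            # Uniform crossover: each gene comes from either parent with equal odds.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: occasionally resample a gene from that sample's candidates.
            child = [rng.randrange(len(options[i])) if rng.random() < mutation_rate else g
                     for i, g in enumerate(child)]
            next_gen.append(child)
        population = next_gen
    return decode(max(population, key=fitness))


if __name__ == "__main__":
    # Toy example with hypothetical vendors and families.
    labels = {
        "VendorA": ["dowgin", "dowgin", "airpush", "airpush", None],
        "VendorB": ["dowgin", "adware", "airpush", "airpush", "kuguo"],
        "VendorC": ["dowgin", "dowgin", "airpush", "kuguo", "kuguo"],
    }
    best = genetic_consensus(labels)
    print(best)  # -> ['dowgin', 'dowgin', 'airpush', 'airpush', 'kuguo']
    print(round(consensus_score(best, labels), 3))  # -> 0.9
```

On this toy input the sketch settles on three groups (dowgin, airpush, kuguo), agreeing with VendorA on every pair it labeled and with VendorB and VendorC on 9 and 8 of their 10 pairs. Because the score compares pairs rather than label strings, a vendor that consistently uses a different name for the same group still counts as agreeing, which is in the spirit of the abstract's aim of not judging results by label names.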
Description	Master's thesis, Department of Management Information Systems, National Chengchi University, 110356044
Source	http://thesis.lib.nccu.edu.tw/record/#G0110356044
Type	thesis

dc.contributor.advisor	蕭舜文	zh_TW
dc.contributor.advisor	Hsiao, Shun-Wen	en_US
dc.identifier (Other Identifiers)	G0110356044	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/146896	-
dc.description.tableofcontents	Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 2 Related work: 2.1 Existing efforts on malware labeling, 2.2 Genetic Algorithm; 3 Methodology: 3.1 Preprocessing for malware family extraction, 3.2 Pairwise consensus score, 3.3 Genetic algorithm for consensus reaching; 4 Evaluation: 4.1 Dataset Description, 4.2 Experiments on different configuration, 4.3 Pairwise consensus score evaluation, 4.4 Evaluation with ground truth from dynamic analysis; 5 Conclusions; Reference	zh_TW
dc.format.extent	1733942 bytes	-
dc.format.mimetype	application/pdf	-