Title: A Novel Scoring System with Genetic Algorithm for Consensus Reaching in Malware Labels (基於基因演算法的惡意軟體標籤共識評分系統)
Author: Wang, Shih-Yu (王詩渝)
Contributors: Hsiao, Shun-Wen (蕭舜文, advisor); Wang, Shih-Yu (王詩渝)
Keywords: Malware clustering; Genetic algorithm; Pairwise comparison; Consensus reaching system
Date: 2023
Uploaded: 1-Sep-2023 14:55:23 (UTC+8)
Abstract:
Identifying malware families is crucial for cybersecurity researchers. Antivirus vendors usually provide malware labels, called AV labels, that categorize malware samples by their behavior. However, because each vendor has its own viewpoint and analysis method, the labels often have inconsistent formats and names. This inconsistency makes labels confusing to cross-reference and reduces their trustworthiness. Some previous approaches to this issue select vendors using weightings that are not necessarily meaningful, or rely on majority voting that can be biased. To solve this problem, we propose a novel scoring system called Pairwise Consensus Score (PCS). The scoring method is based on naming logic: it determines whether a clustering is similar to the other opinions, rather than using label names to judge the quality of the results. Our consensus reaching process combines PCS with a Genetic Algorithm to cluster malware samples based on agreement among different antivirus vendors and to find the best labels for clustering and labeling the malware. Experimental results show that our method outperforms existing methods, providing more consistent and trustworthy AV labels for malware samples.
Description: Master's thesis
National Chengchi University
Department of Management Information Systems
110356044
Source: http://thesis.lib.nccu.edu.tw/record/#G0110356044
Type: thesis
Identifier: G0110356044
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/146896
Format: application/pdf, 1733942 bytes
Table of contents:
摘要 (Chinese abstract)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related work
2.1 Existing efforts on malware labeling
2.2 Genetic Algorithm
3 Methodology
3.1 Preprocessing for malware family extraction
3.2 Pairwise consensus score
3.3 Genetic algorithm for consensus reaching
4 Evaluation
4.1 Dataset Description
4.2 Experiments on different configuration
4.3 Pairwise consensus score evaluation
4.4 Evaluation with ground truth from dynamic analysis
5 Conclusions
Reference
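The abstract describes scoring a clustering by how well it agrees with other vendors' groupings rather than by the label names themselves, and then searching for a good clustering with a genetic algorithm. The Python sketch below only illustrates that general idea. It is not the thesis's PCS definition or its genetic algorithm, neither of which this record gives. As a stand-in it uses plain pair counting (the Rand index) as the agreement measure and a simple elitist genetic algorithm with one-point crossover; all function names, parameters, and the sample labels are hypothetical.

```python
import random
from itertools import combinations


def pair_agreement(a, b):
    """Fraction of sample pairs that two labelings treat the same way
    (both 'same cluster' or both 'different cluster'). This is the Rand
    index, used here as an assumed stand-in for a pairwise consensus score."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def consensus_score(candidate, vendor_labelings):
    """Mean pair agreement between a candidate clustering and every vendor."""
    total = sum(pair_agreement(candidate, v) for v in vendor_labelings)
    return total / len(vendor_labelings)


def evolve(vendor_labelings, n_clusters, generations=50, pop_size=30,
           mutation_rate=0.1, seed=0):
    """Elitist genetic algorithm over cluster assignments (hypothetical setup).
    A chromosome is a list giving one cluster id per malware sample."""
    rng = random.Random(seed)
    n = len(vendor_labelings[0])

    def fitness(chromosome):
        return consensus_score(chromosome, vendor_labelings)

    population = [[rng.randrange(n_clusters) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill the rest with offspring.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_clusters) if rng.random() < mutation_rate
                     else gene for gene in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


# Made-up labels: the two vendors use different names but group alike,
# except that vendor_b splits the last two samples.
vendor_a = ["zeus", "zeus", "locky", "locky"]
vendor_b = ["zbot", "zbot", "ransom.x", "ransom.y"]
print(pair_agreement(vendor_a, vendor_b))  # 5 of 6 pairs agree, about 0.83

vendors = [vendor_a, vendor_b, ["fam1", "fam1", "fam2", "fam2"]]
best = evolve(vendors, n_clusters=2)
print(best, consensus_score(best, vendors))
```

Because the score compares only co-membership of sample pairs, "zeus" and "zbot" count as agreeing whenever they group the same samples, which is the property the abstract attributes to a score based on naming logic rather than label names. One-point crossover is a simplification: cluster ids are interchangeable, so two parents encoding the same grouping with swapped ids can produce a poor child, and a practical system would likely need a more careful encoding.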