Title 基於基因演算法的惡意軟體標籤共識評分系統
A Novel Scoring System with Genetic Algorithm for Consensus Reaching in Malware Labels
Author 王詩渝
Wang, Shih-Yu
Contributors 蕭舜文
Hsiao, Shun-Wen
王詩渝
Wang, Shih-Yu
Keywords Malware clustering
Genetic algorithm
Pairwise comparison
Consensus reaching system
Date 2023
Upload time 1-Sep-2023 14:55:23 (UTC+8)
Abstract Identifying malware families is crucial for cybersecurity researchers. Antivirus vendors usually provide malware labels, called AV labels, that categorize malware samples according to their behavior. However, because each vendor has its own viewpoint and analysis method, these labels often have inconsistent formats and names. This inconsistency makes the labels confusing to use as a reference and reduces their trustworthiness. Some previous approaches to this issue relied on weightings that are not necessarily meaningful to filter vendors, or on majority voting that can be biased. To solve this problem, we propose a novel scoring system called the Pairwise Consensus Score (PCS). The scoring method is based on naming logic: it determines whether a clustering is similar to the other opinions, instead of using label names to judge the quality of the results. Our consensus-reaching process combines PCS with a Genetic Algorithm to cluster malware samples based on the agreement among different antivirus vendors and to find the best labels for clustering and labeling the malware well. Experimental results show that our method outperforms existing methods, providing more consistent and trustworthy AV labels for malware samples.
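As a rough illustration of judging a clustering by agreement rather than by label names, the sketch below computes a standard pair-counting agreement (the Rand index) between two labelings of the same samples. It is not the PCS defined in the thesis, whose formula is not given in this record. The function names and the sample labels are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Rand index: fraction of sample pairs on which two labelings agree
    about whether the two samples belong to the same group.

    Only co-membership is compared, so the label names themselves
    (e.g. "zbot" vs "Zeus") never influence the score.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two samples: nothing to disagree about
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_agreement(candidate, vendor_labelings):
    """Average agreement of one candidate clustering with every vendor."""
    scores = [pairwise_agreement(candidate, v) for v in vendor_labelings]
    return sum(scores) / len(scores)


# Hypothetical labels from two vendors for the same four samples.
vendor_x = ["zbot", "zbot", "wannacry", "wannacry"]
vendor_y = ["Zeus", "Zeus", "WCry", "Zeus"]

print(pairwise_agreement(vendor_x, vendor_y))                    # 0.5
print(mean_agreement(["a", "a", "b", "b"], [vendor_x, vendor_y]))  # 0.75
```

A search procedure such as a genetic algorithm could use a score of this kind as its fitness function, preferring candidate clusterings that agree with more vendors. How the thesis actually encodes candidates and defines fitness is not stated in this record.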
Description Master's
National Chengchi University
Department of Management Information Systems
110356044
Source http://thesis.lib.nccu.edu.tw/record/#G0110356044
Type thesis
dc.contributor.advisor 蕭舜文 zh_TW
dc.contributor.advisor Hsiao, Shun-Wen en_US
dc.contributor.author (Authors) 王詩渝 zh_TW
dc.contributor.author (Authors) Wang, Shih-Yu en_US
dc.creator (Author) 王詩渝 zh_TW
dc.creator (Author) Wang, Shih-Yu en_US
dc.date (Date) 2023 en_US
dc.date.accessioned 1-Sep-2023 14:55:23 (UTC+8) -
dc.date.available 1-Sep-2023 14:55:23 (UTC+8) -
dc.date.issued (Upload time) 1-Sep-2023 14:55:23 (UTC+8) -
dc.identifier (Other Identifiers) G0110356044 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/146896 -
dc.description (Description) Master's zh_TW
dc.description (Description) National Chengchi University zh_TW
dc.description (Description) Department of Management Information Systems zh_TW
dc.description (Description) 110356044 zh_TW
dc.description.tableofcontents Abstract (in Chinese) i
Abstract ii
Contents iv
List of Figures v
List of Tables vi
1 Introduction 1
2 Related work 6
2.1 Existing efforts on malware labeling 6
2.2 Genetic Algorithm 8
3 Methodology 10
3.1 Preprocessing for malware family extraction 10
3.2 Pairwise consensus score 13
3.3 Genetic algorithm for consensus reaching 16
4 Evaluation 21
4.1 Dataset Description 21
4.2 Experiments on different configuration 23
4.3 Pairwise consensus score evaluation 25
4.4 Evaluation with ground truth from dynamic analysis 29
5 Conclusions 32
Reference 34
zh_TW
dc.format.extent 1733942 bytes -
dc.format.mimetype application/pdf -
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0110356044 en_US
dc.subject (Keywords) Malware clustering en_US
dc.subject (Keywords) Genetic algorithm en_US
dc.subject (Keywords) Pairwise comparison en_US
dc.subject (Keywords) Consensus reaching system en_US
dc.title (Title) 基於基因演算法的惡意軟體標籤共識評分系統 zh_TW
dc.title (Title) A Novel Scoring System with Genetic Algorithm for Consensus Reaching in Malware Labels en_US
dc.type (Type) thesis en_US