Title: 使用兩層語言模型的自監督日誌異常檢測 (Self-Supervised Log Anomaly Detection Using Two-Layer Language Model)
Author: 陳羿丞 (Chen, Yi-Cheng)
Contributors: 蕭舜文 (Hsiao, Shun-Wen); 陳羿丞 (Chen, Yi-Cheng)
Keywords: System logs; Language models; Self-supervised; One-class classification; Anonymization
Date: 2024
Uploaded: 4 September 2024, 14:06:20 (UTC+8)
Abstract:
The immense and ever-growing volume of machine-generated data, including security logs and monitoring information, together with increasing system complexity and potential exploitation by attackers, makes early anomaly detection necessary. Language models face several challenges in log anomaly detection: detecting anomalies at different granularities; handling parsing errors and the loss of semantic information introduced by log parsers; the lack of labeled data, which calls for unsupervised anomaly detection approaches; the need for a denoising mechanism; and the need for anonymization to protect privacy when the analysis is outsourced. To address these challenges, we propose a self-supervised two-layer language model that uses BERT and a Transformer encoder to consider anomalies at different levels. The anonymization preprocessing technique eliminates reliance on log parsers and protects privacy. We also integrate the two-layer language model with a denoising mechanism and one-class classification. Experimental results on multiple datasets demonstrate the effectiveness of our approach, which achieves high precision and recall in detecting anomalies. The proposed method offers a robust solution for log anomaly detection.
Description: Master's
National Chengchi University
Department of Management Information Systems
111356045
Source: http://thesis.lib.nccu.edu.tw/record/#G0111356045
Type: thesis
Advisor: 蕭舜文 (Hsiao, Shun-Wen)
Identifier: G0111356045
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/153163
Format: application/pdf, 1943832 bytes
Table of contents:
1. Introduction
2. Related Work
2.1 Language Representation Model
2.2 Anonymized System Logs
2.3 LM Log Analysis
2.3.1 Log Parsers
2.3.2 Anomaly Detection
2.3.3 Misuse Detection
2.3.4 Discussion
3. Methodology
3.1 Overview
3.2 Anonymization Preprocessing
3.3 Pre-Training Tasks
3.3.1 Masked Language Modeling
3.3.2 Shuffled Token Detection
3.4 Two-Layer Language Model
4. Evaluation
4.1 Data Set
4.2 Implementation
4.3 Experiments
4.3.1 Single-layer versus two-layer
4.3.2 Ablation Test
4.3.3 Model Evaluation
4.3.4 Denoising Mechanism
5. Conclusion
Reference