Title HTTP 攻擊封包酬載嵌入:透過歷時性語言模型
HTTP Attack Payload Embedding by Diachronic Language Model
Author 郭宇萍
Kuo, Yu-Ping
Contributors 蕭舜文
Hsiao, Shun-Wen
郭宇萍
Kuo, Yu-Ping
Keywords HTTP attack packet payload
Diachronic language model
Model drift
Model updating
Date 2024
Upload time 4-Sep-2024 14:04:20 (UTC+8)
Abstract In recent years, the landscape of cyber threats has become increasingly dynamic and complex, with attackers continuously developing new attack vectors to achieve their goals. With advances in artificial intelligence, AI models have become important tools for detecting and predicting cyber threats. However, because the cybersecurity landscape is complex and volatile, these models often suffer from deteriorating predictive performance and model drift, problems that are particularly critical to address in practical applications.
In this study, we propose a framework for investigating the lifecycle of cybersecurity models that divides the lifecycle into five stages: model initialization, training, inference, drift assessment, and updating. We implement this framework using HTTP attack payloads, language models, and a MITRE ATT&CK tactic classification task, and demonstrate its effectiveness.
Our findings show that further pre-training of language models can significantly improve downstream classification performance, particularly for long-term inference. Fine-tuning the entire classification model not only mitigates the decline in predictive capability over time but also significantly improves the stability of model performance. In addition, the design of the downstream task classifier has a major impact on the performance of the entire classification model. Experimental results show that model deterioration and model drift are recurrent issues, but that using just 20% of new data can significantly restore model performance. We therefore recommend updating the model promptly when these issues arise. Adopting a "diachronic" cybersecurity model is crucial for defending effectively against cyber threats and for ensuring timely and appropriate responses to attacks.
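As a rough illustration of the drift-assessment and updating stages described in the abstract, the Python sketch below compares the token-frequency distributions of an older and a newer batch of payload text using the Jensen-Shannon distance, which the thesis's table of contents lists among its drift indicators. When the distance exceeds a threshold, it samples 20% of the new batch for updating. This is not the thesis's implementation: the whitespace tokenization, the function names, and the 0.3 threshold are assumptions made only for this example.

```python
import math
import random
from collections import Counter


def token_distribution(payloads, vocab):
    """Relative frequency of each vocabulary token over whitespace-split payloads."""
    counts = Counter(tok for p in payloads for tok in p.split())
    total = sum(counts[t] for t in vocab)
    if total == 0:
        # No tokens at all: fall back to a uniform distribution.
        return [1.0 / len(vocab)] * len(vocab)
    return [counts[t] / total for t in vocab]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logarithms, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def assess_and_select(reference, incoming, threshold=0.3, update_fraction=0.2, seed=0):
    """Return (distance, update_samples).

    reference: payload strings the model was trained on (hypothetical input).
    incoming: payload strings from a later period (hypothetical input).
    threshold: assumed cut-off above which an update is triggered.
    update_fraction: share of incoming data used for updating (0.2 mirrors
        the 20% figure reported in the abstract).
    """
    vocab = sorted({tok for p in reference + incoming for tok in p.split()})
    if not vocab:
        return 0.0, []
    distance = js_distance(
        token_distribution(reference, vocab),
        token_distribution(incoming, vocab),
    )
    if distance <= threshold:
        return distance, []  # no update needed under the assumed threshold
    k = max(1, round(update_fraction * len(incoming)))
    return distance, random.Random(seed).sample(incoming, k)
```

Calling `assess_and_select(old_payloads, new_payloads)` returns the measured distance together with the subset of new payloads that would be passed to the updating stage. In practice the indicator would more likely be computed on model embeddings or output distributions than on raw token counts.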
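Description Master's (碩士)
National Chengchi University (國立政治大學)
Department of Management Information Systems (資訊管理學系)
111356026
Source http://thesis.lib.nccu.edu.tw/record/#G0111356026
Type thesis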
dc.contributor.advisor 蕭舜文 (zh_TW)
dc.contributor.advisor Hsiao, Shun-Wen (en_US)
dc.contributor.author (Authors) 郭宇萍 (zh_TW)
dc.contributor.author (Authors) Kuo, Yu-Ping (en_US)
dc.creator (Author) 郭宇萍 (zh_TW)
dc.creator (Author) Kuo, Yu-Ping (en_US)
dc.date (Date) 2024 (en_US)
dc.date.accessioned 4-Sep-2024 14:04:20 (UTC+8)
dc.date.available 4-Sep-2024 14:04:20 (UTC+8)
dc.date.issued (Upload time) 4-Sep-2024 14:04:20 (UTC+8)
dc.identifier (Other Identifiers) G0111356026 (en_US)
dc.identifier.uri (URI) https://nccur.lib.nccu.edu.tw/handle/140.119/153153
dc.description (Description) Master's (碩士) (zh_TW)
dc.description (Description) National Chengchi University (國立政治大學) (zh_TW)
dc.description (Description) Department of Management Information Systems (資訊管理學系) (zh_TW)
dc.description (Description) 111356026 (zh_TW)
dc.description.tableofcontents (zh_TW)
摘要 (Chinese abstract)
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 MITRE ATT&CK
  2.2 HTTP Attack Detection
  2.3 Model Drift and Diachronic Language Model
3 Methodology
  3.1 Overview
  3.2 Data Collection and Pre-processing
    3.2.1 Data Collection
    3.2.2 PCAP Parsing
    3.2.3 Packet Text Pre-processing
    3.2.4 Session Grouping
    3.2.5 Session Text Pre-processing
    3.2.6 Data Labeling
    3.2.7 Dataset Definition
  3.3 Language Model
    3.3.1 FastText
    3.3.2 T5
    3.3.3 DistilBERT
    3.3.4 LoRA
  3.4 Model Strategy
  3.5 Evaluation Metric
    3.5.1 Accuracy
    3.5.2 F1 macro
  3.6 Indicator of Drift Assessment
    3.6.1 Perplexity
    3.6.2 Jensen–Shannon Distance
    3.6.3 Maximum Mean Discrepancy
    3.6.4 ROC AUC
4 Experiment & Result
  4.1 Dataset
  4.2 Model Architecture & Hyperparameter
    4.2.1 Classifier of Ft-based Scheme
    4.2.2 FastText
    4.2.3 T5
    4.2.4 DistilBERT
  4.3 Exp. 1 - Initialization, Training, and One-time Inference
    4.3.1 Result
    4.3.2 Discussion
  4.4 Exp. 2 - Long-term Inference
    4.4.1 Result
    4.4.2 Discussion
  4.5 Exp. 3 - Drift Assessment
    4.5.1 Result
    4.5.2 Discussion
  4.6 Exp. 4 - Model Updating
    4.6.1 Result
    4.6.2 Discussion
  4.7 Exp. 5 - Exploring the Trade-offs in Model Updating
    4.7.1 Result
    4.7.2 Discussion
5 Discussion
6 Conclusion
A Appendix
  A.1 Exp. 2 Detailed Result Table
    A.1.1 Ft-based
    A.1.2 E2E-based
  A.2 Exp. 3 Detailed Result Table
  A.3 Exp. 4 Detailed Result Table
    A.3.1 Long-term inference
    A.3.2 Value of Drift Indicator
    A.3.3 Monthly updating
  A.4 Best-fitting Distribution
Reference
dc.format.extent 3423477 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0111356026 (en_US)
dc.subject (Keywords) HTTP attack packet payload (en_US)
dc.subject (Keywords) Diachronic language model (en_US)
dc.subject (Keywords) Model drift (en_US)
dc.subject (Keywords) Model updating (en_US)
dc.title (Title) HTTP 攻擊封包酬載嵌入:透過歷時性語言模型 (zh_TW)
dc.title (Title) HTTP Attack Payload Embedding by Diachronic Language Model (en_US)
dc.type (Type) thesis (en_US)