Title 異常偵測方法比較分析 (Comparative Analysis of Anomaly Detection Methods)
Author 林映孝 (Lin, Ying-Hsiao)
Advisors 周珮婷 (Chou, Pei-Ting); 陳怡如 (Chen, Yi-Ju)
Keywords Anomaly Detection
Empirical Experiment
Performance Evaluation
Model Comparison
Ensemble Voting
Date 2023
Uploaded 2-Aug-2023 13:05:05 (UTC+8)
Abstract Anomaly detection is one of the major challenges in machine learning and data analysis, with practical applications spanning domains such as fraud detection, cybersecurity, and fault diagnosis.
This study first examines the operating principles, strengths, and weaknesses of several anomaly detection methods. For instance, the One-Class SVM handles high-dimensional data well but requires careful selection of the kernel function and its parameters, while the Gaussian Mixture Model can fit complex data distributions but requires estimating a large number of parameters (see the sketch below).
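As an illustration of those two trade-offs, here is a minimal scikit-learn sketch; the RBF kernel, the hyperparameter values, the 5% contamination quantile, and the synthetic data are illustrative assumptions, not the thesis's settings:

```python
# Sketch of the two trade-offs: One-Class SVM needs its kernel and
# parameters chosen; a GMM scores points by log-likelihood and needs
# the number of mixture components (hence many parameters) fixed.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))                     # "normal" training data
X_test = np.vstack([rng.normal(size=(95, 10)),
                    rng.normal(loc=5.0, size=(5, 10))])  # 5 injected anomalies

# One-Class SVM: the RBF kernel plus gamma and nu must suit the data.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
svm_pred = ocsvm.predict(X_test)                         # +1 inlier, -1 outlier

# GMM: fit a mixture, then flag the lowest-likelihood points.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)
log_lik = gmm.score_samples(X_test)                      # per-sample log-likelihood
gmm_pred = np.where(log_lik < np.quantile(log_lik, 0.05), -1, 1)
```

Note that the GMM has no built-in decision rule, so thresholding the log-likelihood at an assumed contamination quantile is one common way to turn its scores into labels.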
The study then compares six anomaly detection techniques: One-Class SVM, Gaussian Mixture Model, Autoencoder, Isolation Forest, Local Outlier Factor, and an Ensemble Voting model built from the first five. The six models are tested empirically on five datasets, with performance on each dataset evaluated using the F1-score and Balanced Accuracy (defined below).
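Both metrics have standard definitions; stated in the usual terms, with anomalies as the positive class and TP, FN, TN, FP denoting true/false positives and negatives:

```latex
\mathrm{F1} = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}},
\qquad
\mathrm{Balanced\ Accuracy} = \frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)
```

Both are better suited to heavily imbalanced anomaly labels than plain accuracy, since neither is dominated by the majority (normal) class.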
Finally, the results show that although Isolation Forest performs strongly on certain datasets, the Ensemble Voting model performs excellently on every dataset.
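The thesis's table of contents (Section 3.2 below) identifies the ensemble scheme as hard voting. A minimal sketch of that idea, assuming each base detector emits the conventional +1 (normal) / -1 (anomaly) labels; the label matrix here is hypothetical:

```python
# Hard (majority) voting over the five detectors' label outputs.
import numpy as np

def hard_vote(pred_matrix):
    """pred_matrix: shape (n_detectors, n_samples) of +1/-1 labels.
    Returns the majority label per sample; with five detectors the
    vote sum is always odd, so no tie-breaking rule is needed."""
    return np.where(pred_matrix.sum(axis=0) < 0, -1, 1)

# Hypothetical label outputs from the five detectors for four samples:
preds = np.array([
    [ 1,  1, -1,  1],   # One-Class SVM
    [ 1, -1, -1,  1],   # Gaussian Mixture Model
    [ 1, -1, -1,  1],   # Autoencoder
    [ 1,  1, -1, -1],   # Isolation Forest
    [ 1, -1,  1,  1],   # Local Outlier Factor
])
print(hard_vote(preds))  # -> [ 1 -1 -1  1]
```

A sample is flagged as anomalous only when at least three of the five detectors flag it, which is what lets the ensemble smooth over any single detector's dataset-specific weaknesses.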
References Berk, R. A. (2006). An introduction to ensemble methods for data analysis. Sociological Methods & Research, 34(3):263–295.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24:123–140.
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. (2000). LOF: Identifying density-based local outliers. SIGMOD Record, 29(2):93–104.
Chalapathy, R. and Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407.
Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):1–58.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–38.
Gandhi, I. and Pandey, M. (2015). Hybrid ensemble of classifiers using voting. In 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pages 399–404. IEEE.
Ghahramani, Z. (2004). Unsupervised learning. In Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2–14, 2003, Tübingen, Germany, August 4–16, 2003, Revised Lectures, pages 72–112.
Han, S., Hu, X., Huang, H., Jiang, M., and Zhao, Y. (2022). ADBench: Anomaly detection benchmark.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Khan, S. and Madden, M. (2014). One-class classification: Taxonomy of study and review of techniques. The Knowledge Engineering Review, 29(3):345–374.
Laorden, C., Ugarte-Pedrero, X., Santos, I., Sanz, B., Nieves, J., and Bringas, P. G. (2014). Study on the effectiveness of anomaly detection for spam filtering. Information Sciences, 277:421–444.
Learned-Miller, E. G. (2014). Introduction to supervised learning. Department of Computer Science, University of Massachusetts.
Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2008). Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 413–422.
Markou, M. and Singh, S. (2003). Novelty detection: A review—part 1: Statistical approaches. Signal Processing, 83:2481–2497.
Rushe, E. and Namee, B. M. (2019). Anomaly detection in raw audio using deep autoregressive networks. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3597–3601.
Schapire, R. E. (1999). A brief introduction to boosting. In IJCAI, volume 99, pages 1401–1406.
Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471.
Scrucca, L. (2023). Entropy-based anomaly detection for Gaussian mixture modeling. Algorithms, 16(4):195.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605.
Vareldzhan, G., Yurkov, K., and Ushenin, K. (2021). Anomaly detection in image datasets using convolutional neural networks, center loss, and Mahalanobis distance. In 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), pages 387–390.
Description Master's thesis
National Chengchi University
Department of Statistics
110354025
Source http://thesis.lib.nccu.edu.tw/record/#G0110354025
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/146310
Table of Contents Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Literature Review
Section 1: Machine Learning
1.1 Supervised Learning
1.2 Reinforcement Learning
1.3 Unsupervised Learning
Section 2: Anomaly Detection
Chapter 3: Methodology
Section 1: Anomaly Detection Models
Section 2: Evaluation Metrics
Section 3: Proposed Method
3.1 Ensemble Learning
3.2 Ensemble Voting: Hard Voting
Chapter 4: Experimental Results
Section 1: Data Description
Section 2: Data Visualization
Section 3: Data Preprocessing
Section 4: Experimental Setup
Section 5: Results and Analysis
Chapter 5: Conclusions and Suggestions
References
Format application/pdf, 2,348,743 bytes