Anomaly Detection Using Regulated Random Walk and Similarity Degree (使用正規隨機漫步及相似度進行異常偵測)

Title	Anomaly Detection Using Regulated Random Walk and Similarity Degree (使用正規隨機漫步及相似度進行異常偵測)
Author	Chen, Po-Lung (陳柏龍)
Contributors	Chou, Pei-Ting (周珮婷), advisor; Chen, Po-Lung (陳柏龍), author
Keywords	Anomaly detection; Similarity; Regulated random walk; Multi-scale; Self-tuning
Date	2019
Upload time	5-Sep-2019 15:42:18 (UTC+8)
Abstract	The data cloud geometry tree is a clustering algorithm that captures the structure of a data set by means of a regulated random walk. Building on this concept, the thesis proposes two anomaly detection methods. The first scores each sample by the sum of its similarities to the other samples. The second explores the data with a regulated random walk and uses the time at which a sample is first explored as its anomaly score, so samples explored later are treated as more abnormal. On multi-scale simulated data the performance of both algorithms is unstable, so a self-tuning strategy is applied to improve them and to overcome the problem of detecting anomalies in multi-scale data. Finally, the proposed methods are compared with the classical method, LOF, on real data sets.
References
Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: identifying density-based local outliers. Paper presented at the ACM SIGMOD Record.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 15.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Paper presented at the KDD.
Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Phys Rev E Stat Nonlin Soft Matter Phys, 82(6 Pt 1), 061110. doi:10.1103/PhysRevE.82.061110
Goldstein, M. (2012). FastLOF: An expectation-maximization based local outlier detection algorithm. Paper presented at the Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).
Grubbs, F. E. (1950). Sample criteria for testing outlying observations. The Annals of Mathematical Statistics, 21(1), 27-58.
He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10), 1641-1650.
Kriegel, H.-P., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. Paper presented at the Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Lazarevic, A., & Kumar, V. (2005). Feature bagging for outlier detection. Paper presented at the Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining.
Lee, Y.-J., Yeh, Y.-R., & Wang, Y.-C. F. (2012). Anomaly detection via online oversampling principal component analysis. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1460-1470.
Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation forest. Paper presented at the 2008 Eighth IEEE International Conference on Data Mining.
Maaten, L. v. d., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605.
Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Paper presented at the Advances in Neural Information Processing Systems.
Pokrajac, D., Lazarevic, A., & Latecki, L. J. (2007). Incremental local outlier detection for data streams. Paper presented at the 2007 IEEE Symposium on Computational Intelligence and Data Mining.
Sakurada, M., & Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. Paper presented at the Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis.
Zelnik-Manor, L., & Perona, P. (2005). Self-tuning spectral clustering. Paper presented at the Advances in Neural Information Processing Systems.
Zenati, H., Foo, C. S., Lecouat, B., Manek, G., & Chandrasekhar, V. R. (2018). Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222.
Description	Master's; National Chengchi University (國立政治大學); Department of Statistics; 106354026
Source	http://thesis.lib.nccu.edu.tw/record/#G0106354026
Type	thesis
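
The first method in the abstract (scoring each sample by its summed similarity to the other samples) and the self-tuning idea can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the Gaussian similarity, the local-scale rule (borrowed from the self-tuning spectral clustering of Zelnik-Manor & Perona, 2005), the choice to treat a low similarity sum as anomalous, and all names (`similarity_sum_scores`, `temperature`, `k`) and the toy data are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def similarity_sum_scores(X, temperature=1.0, k=None):
    """Hypothetical similarity-sum outlier score; larger means more anomalous.

    k=None : one global scale, similarity = exp(-d^2 / temperature).
    k=int  : self-tuning, similarity = exp(-d^2 / (sigma_i * sigma_j)),
             where sigma_i is the distance from sample i to its k-th
             nearest neighbour (assumes no duplicate points, so sigma > 0).
    """
    D = pairwise_distances(X)
    if k is None:
        S = np.exp(-D ** 2 / temperature)
    else:
        # Column 0 of each sorted row is the self-distance 0,
        # so column k holds the k-th nearest-neighbour distance.
        sigma = np.sort(D, axis=1)[:, k]
        S = np.exp(-D ** 2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(S, 0.0)   # ignore self-similarity
    return -S.sum(axis=1)      # low total similarity -> high score

# Toy multi-scale data: a tight cluster, a loose cluster, one isolated point.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))
sparse = rng.normal(10.0, 2.0, size=(50, 2))
outlier = np.array([[5.0, -5.0]])
X = np.vstack([dense, sparse, outlier])

scores = similarity_sum_scores(X, k=7)
print("most anomalous index:", int(np.argmax(scores)))
```

With local scales, points in the loose cluster still look similar to their own neighbours, while the isolated point has almost no similarity to anything and receives the highest score. Passing `k=None` instead uses a single `temperature` for every sample, which is the setting where a single scale has to serve clusters of very different spread.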

dc.contributor.advisor	周珮婷	zh_TW
dc.contributor.advisor	Chou, Pei-Ting	en_US
dc.contributor.author (Authors)	陳柏龍	zh_TW
dc.contributor.author (Authors)	Chen, Po-Lung	en_US
dc.creator (Author)	陳柏龍	zh_TW
dc.creator (Author)	Chen, Po-Lung	en_US
dc.date (Date)	2019	en_US
dc.date.accessioned	5-Sep-2019 15:42:18 (UTC+8)	-
dc.date.available	5-Sep-2019 15:42:18 (UTC+8)	-
dc.date.issued (Upload time)	5-Sep-2019 15:42:18 (UTC+8)	-
dc.identifier (Other Identifiers)	G0106354026	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/125518	-
dc.description (Description)	碩士 (Master's)	zh_TW
dc.description (Description)	國立政治大學 (National Chengchi University)	zh_TW
dc.description (Description)	統計學系 (Department of Statistics)	zh_TW
dc.description (Description)	106354026	zh_TW
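dc.description.tableofcontents
Abstract (Chinese)
Abstract (English)
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
  Section 1 Statistics-based methods
  Section 2 Methods based on distance to neighboring points
  Section 3 Density-based methods
  Section 4 Clustering-based methods
  Section 6 Difficulties of anomaly detection
  Section 7 Summary
Chapter 3 Methodology
  Section 1 Data Cloud Geometry Tree (DCGT)
    1. Defining the similarity between samples
    2. The random walk process
    3. Building the same-cluster probability matrix
    4. Determining the number of clusters
    5. Clustering with hierarchical clustering
  Section 2 Regulated Random Walk Outlier Factor (RRWOF)
  Section 3 Similarity Degree Outlier Factor (SDOF)
  Section 5 Experiments on simulated data
    1. Temperature scale (T) = 1
    2. Temperature scale (T) = 10
    3. Temperature scale (T) = 100
    4. Summary
  Section 5 Temperature self-tuning
    1. k = 20
    2. k = 100
    3. k = 500
    4. Summary
Chapter 4 Research Process
  Section 1 Evaluation criteria
  Section 2 Data sets
    1. APS Failure at Scania Trucks Data Set (APS Failure)
    2. Credit Card Fraud Detection data set (Credit Card)
    3. Epileptic Seizure Recognition Data Set (Epileptic)
  Section 3 Experimental procedure
Chapter 5 Experimental Results and Conclusions
  Section 1 APS Failure
  Section 2 Credit Card
  Section 3 Epileptic
  Section 4 Summary
Chapter 6 Conclusions and Future Work
References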
dc.format.extent	3882655 bytes	-
dc.format.mimetype	application/pdf	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0106354026	en_US
dc.subject (Keywords)	Anomaly detection	en_US
dc.subject (Keywords)	Similarity	en_US
dc.subject (Keywords)	Regulated random walk	en_US
dc.subject (Keywords)	Multi-scale	en_US
dc.subject (Keywords)	Self-tuning	en_US
dc.title (Title)	使用正規隨機漫步及相似度進行異常偵測	zh_TW
dc.title (Title)	Anomaly Detection Using Regulated Random Walk and Similarity Degree	en_US
dc.type (Type)	thesis	en_US
dc.identifier.doi (DOI)	10.6814/NCCU201900895	en_US
