Title 適用於動態環境中偵測離群值之決策支援機制
A Decision Support Mechanism for Outlier Detection in the Concept Drifting Environment
Author 林哲緯
Contributors 蔡瑞煌 (Tsaih, Rua Huan), advisor
林哲緯
Keywords outlier detection
concept drifting
moving window
neural networks
decision support
Date 2015
Uploaded 3-Aug-2015 13:19:50 (UTC+8)
Abstract
Outliers are observations that lie far from the fitting function deduced from the bulk of the given observations, and detecting them has recently become an important and challenging research issue. Because data today often come from dynamic, unstable sources and therefore exhibit concept drift, outlier detection has become harder still. To address this, the study develops a decision support mechanism (DSM) for outlier detection in a concept-drifting environment. Specifically, the study aims to derive a DSM that identifies potentially anomalous or intrusive behavior in the network security domain. The proposed DSM has three features: (1) it implements the resistant learning concept via adaptive single-hidden layer feed-forward neural networks (SLFN); (2) it implements the incremental learning concept via the moving window technique; and (3) it is both efficient and effective, in that the decision maker has to review far fewer samples while obtaining better outlier detection accuracy. An experiment is designed to validate the proposed DSM, and the results show that its performance is very promising.
Description Master's thesis
National Chengchi University (國立政治大學)
Department of Management Information Systems (資訊管理研究所)
102356002
Source http://thesis.lib.nccu.edu.tw/record/#G0102356002
Type thesis
dc.identifier (Other Identifiers) G0102356002
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/77172
dc.description.tableofcontents
FIGURE INDEX
TABLE INDEX
CHAPTER 1 INTRODUCTION
1.1 Background and Motivation
1.2 Research Question
1.3 Research Method
1.4 Purpose and Contribution
1.5 Content Organization
CHAPTER 2 LITERATURE REVIEW
2.1 Concept Drifting
2.2 Outlier Detection
2.3 Envelope Module
2.4 Moving Window
2.5 Zero-Day Attack
CHAPTER 3 THE PROPOSED DECISION SUPPORT MECHANISM
CHAPTER 4 EXPERIMENT DESIGN AND RESULTS
4.1 Experiment Design
4.2 Performance Evaluation
CHAPTER 5 CONCLUSION & FUTURE WORK
REFERENCE
dc.format.extent 5435784 bytes
dc.format.mimetype application/pdf
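Illustrative sketch (not the thesis's implementation): the abstract defines outliers by their distance from a fitting function deduced from the bulk of the observations, and uses a moving window to cope with concept drift. The Python sketch below shows that general idea only. It swaps the thesis's adaptive SLFN for a plain least-squares line with trimming, which is a much simpler technique. The function names (`fit_line`, `resistant_fit`, `detect_outliers`) and the parameters (`window`, `keep`, `threshold`) are assumptions made for this example and do not come from the thesis.

```python
from collections import deque
from statistics import median


def fit_line(points):
    """Ordinary least-squares line through (t, y) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return 0.0, mean_y
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / var_t
    return slope, mean_y - slope * mean_t


def resistant_fit(points, keep=0.8):
    """Fit once, keep the `keep` fraction with the smallest residuals (the bulk), refit.

    This trimming step is a stand-in for resistant learning (an assumption).
    """
    slope, intercept = fit_line(points)
    ranked = sorted(points, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
    bulk = ranked[: max(2, int(keep * len(points)))]
    return fit_line(bulk), bulk


def detect_outliers(stream, window=20, threshold=4.0):
    """Return the indices of stream values flagged as potential outliers."""
    recent = deque(maxlen=window)  # moving window: the oldest observation falls out
    flagged = []
    for t, y in enumerate(stream):
        if len(recent) == window:
            (slope, intercept), bulk = resistant_fit(list(recent))
            residuals = [abs(v - (slope * u + intercept)) for u, v in bulk]
            scale = max(median(residuals), 1e-9)  # typical deviation of the bulk
            if abs(y - (slope * t + intercept)) > threshold * scale:
                flagged.append(t)
        recent.append((t, y))
    return flagged


if __name__ == "__main__":
    # Hypothetical data: a noisy linear trend with one injected spike at index 40.
    data = [0.5 * t + 0.1 * ((2 * t) % 5 - 2) for t in range(60)]
    data[40] += 10.0
    print(detect_outliers(data))
```

Because the fit is recomputed from only the most recent `window` observations, the fitting function follows gradual changes in the data, and only the flagged indices need to be shown to the decision maker for review. The trimming fraction and threshold are arbitrary here and would need tuning on real data.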