Title: A Decision Support Mechanism for Outlier Detection in the Concept Drifting Environment (適用於動態環境中偵測離群值之決策支援機制)
Author: 林哲緯
Contributors: 蔡瑞煌 (Tsaih, Rua Huan)
林哲緯
Keywords: outlier detection
concept drifting
moving window
neural networks
decision support
Date: 2015
Upload time: 3-Aug-2015 13:19:50 (UTC+8)
Abstract: Outliers are observations that lie far from the fitting function deduced from the bulk of the given observations. Detecting them has recently become an important and challenging research issue. The problem is harder still today: data sources are largely dynamic and unstable, so the data exhibit concept drift. To address this, the study develops a decision support mechanism (DSM) that helps decision makers detect outliers in data from a dynamic, concept-drifting environment. Specifically, the study aims to derive a DSM for identifying potentially anomalous or intrusive behavior in network security.
The proposed DSM has the following features:
(1) it implements the resistant learning concept via adaptive single-hidden layer feed-forward neural networks (SLFN);
(2) it implements the incremental learning strategy via the moving window technique;
(3) it is both efficient and effective: it achieves better outlier detection accuracy while leaving the decision maker far fewer samples to review.
An experiment is designed to validate the proposed DSM, and the experimental results show that its performance is very promising.
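The two ideas the abstract combines, a fit that resists outliers and a moving window that forgets old data, can be illustrated with a minimal Python sketch. This is not the thesis's DSM: a trimmed least-squares line stands in for the adaptive SLFN and its resistant learning procedure, and the names `fit_line`, `resistant_fit`, `detect_outliers` and the parameters `window_size`, `keep`, `threshold`, `min_scale` are assumptions made only for illustration.

```python
from collections import deque
from statistics import median


def fit_line(points):
    """Ordinary least-squares line y = a*x + b through (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # all x equal: fall back to a constant fit
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def resistant_fit(points, keep=0.8):
    """Fit once, keep the `keep` fraction with the smallest residuals
    (the bulk), refit on the bulk, and return slope, intercept, and the
    median absolute residual of the bulk as a scale estimate."""
    a, b = fit_line(points)
    ranked = sorted(points, key=lambda p: abs(p[1] - (a * p[0] + b)))
    bulk = ranked[: max(2, int(len(ranked) * keep))]
    a, b = fit_line(bulk)
    scale = median(abs(y - (a * x + b)) for x, y in bulk)
    return a, b, scale


def detect_outliers(stream, window_size=20, threshold=6.0, min_scale=1e-6):
    """Yield indices of observations far from the fit on the current window."""
    window = deque(maxlen=window_size)
    for i, (x, y) in enumerate(stream):
        if len(window) == window_size:
            a, b, scale = resistant_fit(list(window))
            if abs(y - (a * x + b)) > threshold * max(scale, min_scale):
                yield i
        window.append((x, y))  # the oldest observation drops out automatically


# Hypothetical data: a noisy line with one injected spike at index 45.
stream = [(i, 2 * i + ((2 * i) % 5 - 2) * 0.1) for i in range(60)]
stream[45] = (45, stream[45][1] + 10.0)
flagged = list(detect_outliers(stream))
```

On this synthetic stream the spike at index 45 is among the flagged indices. Because the fit is recomputed on only the most recent `window_size` observations, a gradual change in the underlying relationship is absorbed as old points leave the window, which is the incremental-learning role the abstract assigns to the moving window.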
References: Babcock, B., Datar, M., & Motwani, R. (2002). Sampling from a moving window over streaming data. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, 633-634.
Babu, S., & Widom, J. (2001). Continuous queries over data streams. ACM Sigmod Record, 30(3), 109-120.
Banerjee, A. (2012). Density-based evolutionary outlier detection. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference companion, 651-652.
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (Vol. 3), Wiley, New York.
Basu, S., & Meckesheimer, M. (2007). Automatic outlier detection for time series: an application to sensor data. Knowledge and Information Systems, 11(2), 137-154.
Bezdek, J. C. (1994). What is computational intelligence? Computational Intelligence: Imitating Life, 1-12.
Bifet, A., Gama, J., Pechenizkiy, M., & Zliobaite, I. (2011). Handling concept drift: Importance, challenges and solutions. PAKDD-2011 Tutorial, Shenzhen, China.
Bilge, L., & Dumitras, T. (2012). Before we knew it: an empirical study of zero-day attacks in the real world. In Proceedings of the 2012 ACM conference on Computer and communications security, 833-844.
Buschermöhle, A., Schoenke, J., & Brockmann, W. (2012). Uncertainty and Trust Estimation in Incrementally Learning Function Approximation. In Advances on Computational Intelligence (pp. 32-41). Heidelberg: Springer Berlin.
Castelo-Fernández, C., De Rezende, P. J., Falcão, A. X., & Papa, J. P. (2010). Improving the accuracy of the optimum-path forest supervised classifier for large datasets. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 467-475). Heidelberg: Springer Berlin.
Chen, C., & Liu, L. M. (1993). Forecasting time series with outliers. Journal of Forecasting, 12(1), 13-35.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman and Hall.
Crawford, K. D., & Wainwright, R. L. (1995). Applying Genetic Algorithms to Outlier Detection. In ICGA, 546-550.
Elwell, R., & Polikar, R. (2011). Incremental learning of concept drift in nonstationary environments. Neural Networks, IEEE Transactions on, 22(10), 1517-1531.
Ferdousi, Z., & Maeda, A. (2006). Unsupervised outlier detection in time series data. In Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on IEEE, x121-x121.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4), 44.
Hawkins, D. M. (1980). Identification of outliers (Vol. 11), London: Chapman and Hall.
Hawkins, S., He, H., Williams, G., & Baxter, R. (2002). Outlier detection using replicator neural networks. In Data Warehousing and Knowledge Discovery (pp. 170-180). Berlin Heidelberg: Springer.
He, H. (2011). Self-adaptive systems for machine intelligence. John Wiley & Sons.
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), 85-126.
Huang, S. Y., Yu, F., Tsaih, R. H., & Huang, Y. (2014). Resistant learning on the envelope bulk for identifying anomalous patterns. In Neural Networks (IJCNN), 2014 International Joint Conference on, 3303-3310.
Joo, D., Hong, T., & Han, I. (2003). The neural network models for IDS based on the asymmetric costs of false negative errors and false positive errors. Expert Systems with Applications, 25(1), 69-75.
Krawczyk, B., & Woźniak, M. (2014). One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft Computing, 1-14.
Lanquillon, C., & Renz, I. (1999). Adaptive information filtering: Detecting changes in text streams. In Proceedings of the eighth international conference on Information and knowledge management, 538-544.
Lin, H. C. (2013). An Application of Streaming Data Analysis on TAIEX Futures. Unpublished master's dissertation, National Chengchi University, Taipei, TW.
Maggi, F., Robertson, W., Kruegel, C., & Vigna, G. (2009). Protecting a moving target: Addressing web application concept drift. In Recent Advances in Intrusion Detection (pp. 21-40). Springer Berlin Heidelberg.
Masud, M. M., Chen, Q., Khan, L., Aggarwal, C., Gao, J., Han, J., & Thuraisingham, B. (2010). Addressing concept-evolution in concept-drifting data streams. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, 929-934.
Masud, M. M., Gao, J., Khan, L., Han, J., & Thuraisingham, B. (2011). Classification and novel class detection in concept-drifting data streams under time constraints. Knowledge and Data Engineering, IEEE Transactions on, 23(6), 859-874.
Navvab Kashani, M., Aminian, J., Shahhosseini, S., & Farrokhi, M. (2012). Dynamic crude oil fouling prediction in industrial preheaters using optimized ANN based moving window technique. Chemical Engineering Research and Design, 90(7), 938-949.
Olson, D. L., & Shi, Y. (2007). Introduction to business data mining. Englewood Cliffs: McGraw-Hill/Irwin.
Rousseeuw, P. J., & Van Driessen, K. (2006). Computing LTS regression for large data sets. Data mining and knowledge discovery, 12(1), 29-45.
Sendhoff, B., Körner, E., Sporns, O., Ritter, H., & Doya, K. (Eds.). (2009). Creating Brain-Like Intelligence: from basic principles to complex intelligent systems (Vol. 5436). Springer Science & Business Media.
Song, J., Takakura, H., & Kwon, Y. (2008). A generalized feature extraction scheme to detect 0-day attacks via IDS alerts. In Applications and the Internet, 2008. SAINT 2008. International Symposium on (pp. 55-61). IEEE.
Srinoy, S. (2007). Intrusion detection model based on particle swarm optimization and support vector machine. In Computational Intelligence in Security and Defense Applications, 2007. CISDA 2007. IEEE Symposium on, 186-192.
Stanley, K. O. (2003). Learning concept drift with a committee of decision trees. Technical report UT-AI-TR-03-302, Department of Computer Sciences, University of Texas at Austin, USA.
Storkey, A. (2009). When training and test sets are different: characterizing learning transfer. Dataset shift in machine learning, 3-28.
Sykacek, P. (1997). Equivalent error bars for neural network classifiers trained by Bayesian inference. In ESANN.
Tolvi, J. (2002). Outliers and Predictability in Monthly Stock Market Index Returns. Liiketaloudellinen aikakauskirja, 369-380.
Tsaih, R. H., & Cheng, T. C. (2009). A resistant learning procedure for coping with outliers. Annals of Mathematics and Artificial Intelligence, 57(2), 161-180.
Tsay, R. S. (2014). An Introduction to Analysis of Financial Data with R. Wiley.
Tsymbal, A. (2004). The problem of concept drift: definitions and related work. Computer Science Department, Trinity College Dublin.
Wang, H., Fan, W., Yu, P. S., & Han, J. (2003). Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 226-235.
Sarle, W. S. (1983). Cubic Clustering Criterion. SAS Technical Report A-108, SAS Institute Inc.
Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine learning, 23(1), 69-101.
Windham, M. P. (1995). Robustifying model fitting. Journal of the Royal Statistical Society. Series B (Methodological), 599-609.
Wrótniak, K., & Woźniak, M. (2013). Combined Bayesian Classifiers Applied to Spam Filtering Problem. In International Joint Conference CISIS’12-ICEUTE´ 12-SOCO´ 12 Special Sessions (pp. 253-260). Springer Berlin Heidelberg.
Yoon, K. A., Kwon, O. S., & Bae, D. H. (2007). An approach to outlier detection of software measurement data using the k-means clustering method. In Empirical Software Engineering and Measurement, 2007. ESEM 2007. First International Symposium on, 443-445.
Zimek, A., Campello, R. J., & Sander, J. (2014). Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM SIGKDD Explorations Newsletter, 15(1), 11-22.
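Description: Master's degree
National Chengchi University (國立政治大學)
Graduate Institute of Management Information Systems (資訊管理研究所)
102356002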
Source: http://thesis.lib.nccu.edu.tw/record/#G0102356002
Type: thesis
Advisor: 蔡瑞煌 (Tsaih, Rua Huan)
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/77172
Table of contents: FIGURE INDEX
TABLE INDEX
CHAPTER 1 INTRODUCTION
1.1 Background and Motivation
1.2 Research Question
1.3 Research Method
1.4 Purpose and Contribution
1.5 Content Organization
CHAPTER 2 LITERATURE REVIEW
2.1 Concept Drifting
2.2 Outlier Detection
2.3 Envelope Module
2.4 Moving Window
2.5 Zero-Day Attack
CHAPTER 3 THE PROPOSED DECISION SUPPORT MECHANISM
CHAPTER 4 EXPERIMENT DESIGN AND RESULTS
4.1 Experiment Design
4.2 Performance Evaluation
CHAPTER 5 CONCLUSION & FUTURE WORK
REFERENCE
Format extent: 5435784 bytes
Format MIME type: application/pdf