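Title: 串流資料分析在台灣股市指數期貨之應用
An Application of Streaming Data Analysis on TAIEX Futures
Author: Lin, Hong Che (林宏哲)
Contributors: Hsu, Kuo Wei (徐國偉); Lin, Hong Che (林宏哲)
Keywords: data stream mining; concept drift; TAIEX Futures (台灣股市期貨)
Date: 2012
Uploaded: 2-Sep-2013 16:48:39 (UTC+8)
Abstract (translated from the Chinese): Data stream mining is an important research field because much important real-world data is generated or collected as a stream. Financial market data is often such a stream, and data of this kind is typically highly volatile by nature. In this thesis we apply data stream mining techniques to predict the rise and fall of TAIEX Futures. For a machine, predicting a data stream such as futures is not easy, and the difficulty depends on the type, degree, and frequency of concept drift. Concept drift means that the underlying distribution of the data changes, which causes prediction accuracy to drop sharply, so we focus on how to handle concept drift. First, based on experimental results, we infer that TAIEX Futures may exhibit high-frequency concept drift. The experimental results also indicate that using a concept drift detection algorithm can substantially improve prediction accuracy, with significant gains even for algorithms that originally performed poorly. We also survey the algorithms designed to handle each type of concept drift. In addition, we propose a multi-classifier (ensemble) algorithm that helps detect concept drift of the "reoccurring" type. Compared with the algorithm it improves on, its most distinctive feature is that the user does not need to set the number of samples for each sub-classifier, a number that is one of the key factors affecting such algorithms.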
Abstract (English): Data stream mining is an important research field because, in many real-world cases, data is generated and collected in the form of a stream. Financial market data is one such example: it is intrinsically dynamic and usually generated sequentially. In this thesis, we apply data stream mining techniques to the prediction of Taiwan Stock Exchange Capitalization Weighted Stock Index Futures (TAIEX Futures). Our goal is to predict whether the futures will rise or fall. The prediction is difficult, and the difficulty is associated with concept drift, which indicates changes in the underlying data distribution. We therefore focus on concept drift handling. We first show, by referring to the results of an empirical study, that concept drift occurs frequently in the TAIEX Futures data. The results also indicate that a concept drift detection method can improve prediction accuracy even when it is used with a data stream mining algorithm that does not perform well. Next, we explore methods that can help identify the types of concept drift. The experimental results indicate that both sudden and reoccurring concept drift exist in the TAIEX Futures data. Moreover, we propose an ensemble-based algorithm for reoccurring concept drift. Its most characteristic feature is that it adaptively determines the chunk size, which is an important parameter for other concept drift handling algorithms.
Description: Master's thesis
National Chengchi University (國立政治大學)
Department of Computer Science (資訊科學學系)
100753020
101
Source: http://thesis.lib.nccu.edu.tw/record/#G0100753020
Type: thesis
Advisor: Hsu, Kuo Wei (徐國偉)
Other identifier: G0100753020
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/59440
Format: application/pdf, 5948789 bytes
Language: en_US
Table of contents:
CHAPTER 1 INTRODUCTION
1.1 TAIEX Futures Markets
1.2 Problem Description
1.3 Contributions
1.4 Thesis Organization
CHAPTER 2 PRELIMINARY
2.1 Non-Data Mining Techniques for Financial Data Analysis
2.2 Non-Streaming Data Mining Techniques for Financial Data Analysis
2.3 Data Stream Mining Techniques for Financial Data Analysis
2.3.1 Concept Drift Analysis
2.3.2 Data Stream Mining Techniques
CHAPTER 3 DATA STREAM MINING
3.1 Introduction to Data Stream Mining
3.2 Concept Drift
3.3 MOA: A Data Stream Mining Tool
3.4 Data Stream Mining Algorithms
3.4.1 Naïve Bayes
3.4.2 Hoeffding Tree
3.4.3 Hoeffding Adaptive Tree
3.4.4 Drift Detection Method
3.4.5 Early Drift Detection Method
3.4.6 Accuracy Weighted Ensemble
3.4.7 Accuracy Update Ensemble
CHAPTER 4 ADAPTIVE DRIFT ENSEMBLE
CHAPTER 5 EXPERIMENTS
5.1 Setup
5.2 Results
5.2.1 Baseline
5.2.2 Drift Detection Method
5.2.3 Early Drift Detection Method
5.2.4 Accuracy Weighted Ensemble
5.2.5 Accuracy Update Ensemble
5.2.6 Adaptive Drift Ensemble
CHAPTER 6 DISCUSSIONS
6.1 Impact of Concept Drift
6.1.1 Existence
6.1.2 Time Frame Granularity
6.2 Types of Concept Drift
6.2.1 Sudden vs. Gradual
6.2.2 Reoccurring
6.3 Characteristics of Adaptive Drift Ensemble
6.3.1 Comparison of Ensemble Methods
6.3.2 Comparison of Handlers
CHAPTER 7 CONCLUSIONS AND FUTURE WORK
7.1 Conclusions
7.2 Future Work
REFERENCES
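Illustrative sketch: the abstract reports that a concept drift detection method improves prediction accuracy, and the table of contents lists a Drift Detection Method among the algorithms studied. The Python sketch below shows one common way such a detector is described: it tracks the running error rate p of a classifier and its standard deviation s = sqrt(p(1-p)/n), remembers the lowest p + s seen, and signals a warning or a drift when the current p + s exceeds that minimum by 2 or 3 standard deviations. This is not the thesis's code (the record lists MOA as the tool discussed). The class name `DriftDetector`, the helpers `make_error_stream` and `scan`, the synthetic error stream, and the default thresholds are all assumptions made for illustration.

```python
import math


class DriftDetector:
    """DDM-style detector over a stream of 0/1 prediction errors (illustrative)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        # Assumed defaults: 30 warm-up instances, 2 and 3 standard deviations.
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0                  # instances seen since the last reset
        self.p = 0.0                # running error rate
        self.p_min = float("inf")   # error rate at the best point so far
        self.s_min = float("inf")   # standard deviation at that point

    def update(self, error):
        """error is 1 if the classifier was wrong, else 0.

        Returns "stable", "warning", or "drift".
        """
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s       # new best point
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                             # start over after a drift
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def make_error_stream():
    """Hypothetical error stream: 20% errors for 500 steps, then 60%."""
    before = [1 if i % 5 == 0 else 0 for i in range(500)]
    after = [1 if i % 5 < 3 else 0 for i in range(500)]
    return before + after


def scan(errors):
    """Return (index, status) for every non-stable signal in the stream."""
    detector = DriftDetector()
    events = []
    for i, e in enumerate(errors):
        status = detector.update(e)
        if status != "stable":
            events.append((i, status))
    return events


if __name__ == "__main__":
    for index, status in scan(make_error_stream()):
        if status == "drift":
            print("drift signalled at instance", index)
```

In a streaming setup, the base classifier would typically be retrained or replaced when `update` returns "drift", with instances seen during the "warning" phase kept as training data for the replacement. The synthetic stream here simulates a sudden drift, in which the error rate jumps at instance 500.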