Title An Application of Streaming Data Analysis on TAIEX Futures (串流資料分析在台灣股市指數期貨之應用)
Author Lin, Hong Che (林宏哲)
Contributors Hsu, Kuo Wei (徐國偉), advisor
Lin, Hong Che (林宏哲), author
Keywords data stream mining (資料串流探勘)
concept drift (概念飄移)
TAIEX Futures (台灣股市期貨)
Date 2012
Upload time 2-Sep-2013 16:48:39 (UTC+8)
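Abstract (translated from the Chinese original) Data stream mining is an important research field because much important real-world data is generated or collected in the form of streams. Financial market data is often such a stream, and data of this kind is typically highly volatile by nature. In this thesis we apply data stream mining techniques to predict the rise and fall of TAIEX Futures. Predicting a data stream such as futures is not easy for a machine, and the difficulty depends on the type, degree, and frequency of concept drift. Concept drift means that the underlying distribution of the data changes, which causes prediction accuracy to drop sharply, so we focus on how to handle concept drift. First, based on experimental results, we infer that TAIEX Futures data may exhibit high-frequency concept drift. The results also indicate that using a concept drift detection algorithm can substantially improve prediction accuracy, even for algorithms that originally performed poorly. We also survey algorithms designed to handle each type of concept drift. In addition, we propose a multi-classifier algorithm that helps detect concept drift of the "reoccurring" type. Compared with the algorithm it improves upon, its most distinctive feature is that the user does not need to set the number of samples for each sub-classifier, a number that is one of the key factors affecting the algorithm.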
Data stream mining is an important research field because, in many real-world settings, data is generated and collected in the form of a stream. Financial market data is one such example: it is intrinsically dynamic and usually generated sequentially. In this thesis, we apply data stream mining techniques to the prediction of Taiwan Stock Exchange Capitalization Weighted Stock Index Futures (TAIEX Futures). Our goal is to predict whether the futures will rise or fall. The prediction is difficult, and the difficulty is associated with concept drift, that is, changes in the underlying data distribution. Therefore, we focus on concept drift handling. We first show, based on the results of an empirical study, that concept drift occurs frequently in the TAIEX Futures data. The results also indicate that a concept drift detection method can improve prediction accuracy even when it is used with a data stream mining algorithm that does not perform well. Next, we explore methods that help identify the types of concept drift. The experimental results indicate that both sudden and reoccurring concept drift exist in the TAIEX Futures data. Moreover, we propose an ensemble-based algorithm for reoccurring concept drift. Its most characteristic feature is that it adaptively determines the chunk size, which is an important parameter for other concept drift handling algorithms.
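To make the idea of concept drift detection more concrete, here is a minimal Python sketch in the spirit of the Drift Detection Method named in the thesis's table of contents. It is not the thesis's code. The class name `DriftDetector`, the helper `detect_drifts`, the 30-instance warm-up, and the 2 and 3 standard deviation thresholds are assumptions taken from the commonly described formulation of that method, and the error stream is synthetic rather than TAIEX Futures data.

```python
import math


class DriftDetector:
    """Monitors a classifier's running error rate and flags drift (illustrative sketch)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # warm-up before any decision (assumed value)
        self.warning_level = warning_level  # warning threshold in standard deviations (assumed)
        self.drift_level = drift_level      # drift threshold in standard deviations (assumed)
        self.reset()

    def reset(self):
        # Forget all statistics; called at start-up and after each detected drift.
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed one outcome (True if the prediction was wrong); return the detector state."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n             # running error rate
        s = math.sqrt(p * (1 - p) / self.n)  # its binomial standard deviation
        if self.n < self.min_instances:
            return "stable"
        if p + s < self.p_min + self.s_min:
            # Remember the best (lowest) error level seen so far.
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def detect_drifts(errors):
    """Return the stream positions (0-based) at which drift is signalled."""
    detector = DriftDetector()
    drift_points = []
    for index, error in enumerate(errors):
        if detector.update(error) == "drift":
            drift_points.append(index)
    return drift_points


# Synthetic, deterministic error stream: 20% errors for the first 300
# predictions, then 50% errors, imitating a sudden change in the data.
stream = [i % 5 == 0 for i in range(300)] + [i % 2 == 0 for i in range(300, 600)]
drifts = detect_drifts(stream)
print(drifts)
```

In a streaming setting, a "drift" signal would typically trigger retraining or replacing the underlying classifier, which is one way a detector can help even a weak base learner recover after the data distribution changes.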
Description Master's
National Chengchi University
Department of Computer Science
100753020
101
Source http://thesis.lib.nccu.edu.tw/record/#G0100753020
Type thesis
URI http://nccur.lib.nccu.edu.tw/handle/140.119/59440
Table of contents CHAPTER 1 INTRODUCTION
1.1 TAIEX Futures Markets
1.2 Problem Description
1.3 Contributions
1.4 Thesis Organization
CHAPTER 2 PRELIMINARY
2.1 Non-Data Mining Techniques for Financial Data Analysis
2.2 Non-Streaming Data Mining Techniques for Financial Data Analysis
2.3 Data Streaming Mining Techniques for Financial Data Analysis
2.3.1 Concept Drift Analysis
2.3.2 Data Stream Mining Techniques
CHAPTER 3 DATA STREAM MINING
3.1 Introduction to Data Stream Mining
3.2 Concept Drift
3.3 MOA: A Data Stream Mining Tool
3.4 Data Stream Mining Algorithms
3.4.1 Naïve Bayes
3.4.2 Hoeffding Tree
3.4.3 Hoeffding Adaptive Tree
3.4.4 Drift Detection Method
3.4.5 Early Drift Detection Method
3.4.6 Accuracy Weighted Ensemble
3.4.7 Accuracy Update Ensemble
CHAPTER 4 ADAPTIVE DRIFT ENSEMBLE
CHAPTER 5 EXPERIMENTS
5.1 Setup
5.2 Results
5.2.1 Baseline
5.2.2 Drift Detection Method
5.2.3 Early Drift Detection Method
5.2.4 Accuracy Weighted Ensemble
5.2.5 Accuracy Update Ensemble
5.2.6 Adaptive Drift Ensemble
CHAPTER 6 DISCUSSIONS
6.1 Impact of Concept Drift
6.1.1 Existence
6.1.2 Time Frame Granularity
6.2 Types of Concept Drift
6.2.1 Sudden vs. Gradual
6.2.2 Reoccurring
6.3 Characteristics of Adaptive Drift Ensemble
6.3.1 Comparison of Ensemble Methods
6.3.2 Comparison of Handlers
CHAPTER 7 CONCLUSIONS AND FUTURE WORK
7.1 Conclusions
7.2 Future Work
REFERENCES
Format extent 5948789 bytes
MIME type application/pdf
Language en_US