Title 透過時間序列的波動特徵分群協助資料分類 -以公司危機事件為例
Achieve Efficient Data Classification by Time Series Wave Decomposition Pattern Clustering: Financial Distress as an Example
Author 陳郁婷
Chen, Yu Ting
Contributor 胡毓忠
Hu, Yuh Jong
陳郁婷
Chen, Yu Ting
Keywords R
Time Series
Spark
financial distress
Date 2016
Upload time 2-Sep-2016 01:32:04 (UTC+8)
Abstract This study decomposes time series of stock return rates, extracts the trend component, and clusters companies by the shape of that trend. The resulting cluster assignment is then used as a feature for further data classification. Time-series wave features are commonly used to forecast the future trend of the series itself; here the trend waveform instead serves as a clustering feature that assists classification. The case study is companies in financial distress: the task is to distinguish substantive from non-substantive financial distress, combining the trend feature with other financial and non-financial factors. Trend waveforms are extracted and clustered with time series decomposition tools in R. Processing runs in parallel on Spark in standalone cluster mode, relying on its node scalability, cluster fault tolerance, and Resilient Distributed Datasets (RDDs) to improve computing performance. A random forest ensemble learning algorithm is used in the experiments to predict each company's distress type.
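The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it is written in plain Python rather than R and Spark, it substitutes a centered moving average for the decomposition step and a small k-means for the clustering step, and the company names, return series, and other feature values are made-up placeholders. It only shows the idea of turning a trend-cluster label into one more classification feature.

```python
from statistics import mean

def moving_average_trend(series, window=3):
    """Centered moving average; the window shrinks at both edges."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(mean(series[lo:hi]))
    return trend

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Small k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

def add_trend_cluster_feature(returns_by_company, other_features, k=2):
    """Append each company's trend-cluster label to its feature row."""
    names = sorted(returns_by_company)
    trends = [moving_average_trend(returns_by_company[n]) for n in names]
    labels = kmeans(trends, k)
    return {n: other_features[n] + [lab] for n, lab in zip(names, labels)}

# Hypothetical data: A and B drift upward, C and D drift downward.
returns = {
    "A": [-0.02, -0.01, 0.00, 0.01, 0.02, 0.03],
    "B": [-0.03, -0.01, 0.00, 0.02, 0.02, 0.04],
    "C": [0.03, 0.02, 0.00, -0.01, -0.02, -0.03],
    "D": [0.04, 0.02, 0.01, -0.01, -0.03, -0.04],
}
features = {"A": [1.2, 0.3], "B": [0.9, 0.4], "C": [0.5, 0.8], "D": [0.4, 0.9]}
rows = add_trend_cluster_feature(returns, features, k=2)
for name in sorted(rows):
    print(name, rows[name])
```

The augmented rows would then be passed to a classifier; the abstract names a random forest on Spark for that step, which this sketch leaves out.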
Description Master's thesis
National Chengchi University
In-service Master's Program, Department of Computer Science
98971006
Source http://thesis.lib.nccu.edu.tw/record/#G0098971006
Type thesis
dc.contributor.advisor 胡毓忠 (zh_TW)
dc.contributor.advisor Hu, Yuh Jong (en_US)
dc.contributor.author (Authors) 陳郁婷 (zh_TW)
dc.contributor.author (Authors) Chen, Yu Ting (en_US)
dc.creator (Author) 陳郁婷 (zh_TW)
dc.creator (Author) Chen, Yu Ting (en_US)
dc.date (Date) 2016 (en_US)
dc.date.accessioned 2-Sep-2016 01:32:04 (UTC+8)
dc.date.available 2-Sep-2016 01:32:04 (UTC+8)
dc.date.issued (Upload time) 2-Sep-2016 01:32:04 (UTC+8)
dc.identifier (Other Identifiers) G0098971006 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/101249
dc.description (Description) Master's thesis (zh_TW)
dc.description (Description) National Chengchi University (zh_TW)
dc.description (Description) In-service Master's Program, Department of Computer Science (zh_TW)
dc.description (Description) 98971006 (zh_TW)
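dc.description.tableofcontents Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Related Work
Chapter 3 Research Framework Design and System Description
3.1 Research Framework
3.2 Data Scope and Definitions
3.3 Time Series Trend Decomposition
3.4 Similarity Measures
3.5 Clustering and Evaluation
3.6 Data Classification and Prediction on the Spark Platform
Chapter 4 Experimental Results
4.1 Effect of Decomposed versus Undecomposed Time Series on Time Series Clustering
4.2 Comparison of Different Clustering Methods under the Same Similarity Measure
4.3 Comparison of Clustering Results across Wave Dissimilarity Measures under the Same Clustering Method
4.4 Comparison of Clusters with Actual Distress Categories
4.5 Effect of Including Wave Features on Classification
Chapter 5 Conclusions and Future Work (zh_TW)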
dc.format.extent 3154315 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0098971006 (en_US)
dc.subject (Keywords) R (en_US)
dc.subject (Keywords) Time Series (en_US)
dc.subject (Keywords) Spark (en_US)
dc.subject (Keywords) financial distress (en_US)
dc.title (Title) 透過時間序列的波動特徵分群協助資料分類 -以公司危機事件為例 (zh_TW)
dc.title (Title) Achieve Efficient Data Classification by Time Series Wave Decomposition Pattern Clustering: Financial Distress as an Example (en_US)
dc.type (Type) thesis (en_US)