Title: "Spaghetti" 主成份分析應用於時間相關區間型資料的研究---以台灣股價為例
A study of Spaghetti PCA for time dependent interval data applied to stock prices in Taiwan
Author: 邱大倞 (Chiu, Ta Ching)
Contributors: 劉惠美 (Liu, Hui Mei); 鄭宗記 (Cheng, Tsung Chi); 邱大倞 (Chiu, Ta Ching)
Keywords: Principal component analysis; Interval data; Time dependent; Oriented intervals
Date: 2009
Upload time: 9 May 2016 11:37:47 (UTC+8)
Abstract:
Interval data are generally defined by the upper and lower values that a unit assumes for a continuous variable. In this study, we introduce a special type of interval description that depends on time. The original idea (Irpino, 2006, Pattern Recognition Letters, 27, 504-513) is that each observation is characterized by an oriented interval with a starting and a closing value for each period of observation: for example, the opening and closing prices of a stock in a week. Several factorial methods have been developed to treat interval data, but not oriented intervals. Irpino presented an extension of principal component analysis to time dependent interval data or, more generally, to oriented intervals. From a geometrical point of view, the proposed approach can be considered an analysis of oriented segments (nicely called “spaghetti”) defined in a multidimensional space identified by periods. In this paper, we extend the approach further by using not only the opening and closing values but also the highest and lowest values in a week, to recover information that the original idea cannot capture. We also replace the uniform distribution used by Irpino with a beta distribution to see whether this yields a clear improvement over the original method. These extensions require many complicated computations, such as the mean, variance, and covariance, to obtain the correlation matrix for PCA. With regard to the value of information, the extended idea with the highest and lowest prices is the better choice.

References:
Cazes, P., Chouakria, A., Diday, E. & Schektman, Y. (1997) “Extension de l'analyse en composantes principales à des données de type intervalle”, Revue de Statistique Appliquée, XIV, 3, 5-24.
Chen, P.D. (2009) “An extension of Spaghetti PCA for time dependent interval data”, master's thesis, National Chengchi University, Taipei, Taiwan, R.O.C.
Diday, E. (1987) “Introduction à l'approche symbolique en Analyse des Données”, Premières Journées Symbolique-Numérique, Université de Paris IX Dauphine.
Diday, E. (2002) “An Introduction to Symbolic Data Analysis and the Sodas Software”, Journal of Symbolic Data Analysis, 0, ISSN 1723-5081.
Gioia, F. & Lauro, C.N. (2005) “Basic statistical methods for interval data”, Statistica Applicata [Italian Journal of Applied Statistics], 17, 1, 75-104.
Gioia, F. & Lauro, C.N. (2006) “Principal component analysis on interval data”, Computational Statistics, 21, 2, 343-363.
Goupil, F., Touati, M., Diday, E. & Van Der Veen, H. (2000) “Symbolic Analysis of Financial Data”.
Irpino, A. (2006) “Spaghetti PCA analysis: An extension of principal components analysis to time dependent interval data”, Pattern Recognition Letters, 27, 504-513.
Lauro, C.N. & Palumbo, F. (1998) “New approaches to principal component analysis to interval data”, International Seminar on New Techniques & Technologies for Statistics, NTTS'98, 4-6 Nov. 1998, Sorrento, Italy.
Lauro, C.N. & Palumbo, F. (2000) “Principal Component Analysis of Interval Data: A Symbolic Data Analysis Approach”, Computational Statistics, 15, 1, 73-87.
Lauro, C.N. & Palumbo, F. (2003) “Some results and new perspectives in Principal Component Analysis for interval data”, Atti del Convegno CLADAG'03, Gruppo di Classificazione della Società Italiana di Statistica, 237-244.
Palumbo, F. & Lauro, C.N. (2003) “A PCA for interval valued data based on midpoints and radii”, New developments in Psychometrics, Yanai H. et al. eds., Psychometric Society, Springer-Verlag, Tokyo.
Zuccolotto, P. (2007) “Principal component of sample estimates: an approach through symbolic data analysis”, Applied & Metallurgical Statistics, 16, 173-192.

Description:
Master's
National Chengchi University
Department of Statistics
96354016
Source: http://thesis.lib.nccu.edu.tw/record/#G0096354016
Type: thesis
Identifier: G0096354016
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/94721

Table of contents:
1. Introduction
2. Literature Review
3. Time Dependent Interval Data
4. “Spaghetti” PCA Idea
4.1 Common Oriented Segments with Different Ideas
4.1.1 Original “Spaghetti” PCA Idea
4.1.2 Original “Spaghetti” PCA Idea with Beta Distribution
4.1.3 Extended “Spaghetti” PCA Idea
4.2 Standardized Oriented Segments
4.3 Applied vectors coordinates on the factorial axes
4.4 Correlation between principal components and variables
5. Application
5.1 Data Collection
5.2 Result of Thirty Semiconductor Industry
5.2.1 Original “Spaghetti” PCA Idea
5.2.2 Original “Spaghetti” PCA Idea with Beta Distribution
5.2.3 Extended “Spaghetti” PCA Idea
5.3 Result of Forty-Seven Stocks from TSEC Taiwan 50 index
5.3.1 Original “Spaghetti” PCA Idea
5.3.2 Original “Spaghetti” PCA Idea with Beta Distribution
5.3.3 Extended “Spaghetti” PCA Idea
6. Conclusion and Suggestion
6.1 Conclusion
6.2 Suggestion
Reference
Appendix A
Appendix B
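
The abstract says that the mean, variance, and covariance of the oriented segments feed a correlation matrix for PCA. The following is a minimal sketch of that pipeline, assuming each observation is a segment from an opening to a closing value and that points are uniformly distributed along it (the distribution the abstract attributes to Irpino's original idea). The function names, array layout, and sample prices are hypothetical, and the moment formulas are derived here from that assumption rather than taken from the thesis. The beta-distribution and high/low extensions are not covered.

```python
import numpy as np

def segment_moments(start, end):
    """Mean vector and covariance matrix of points spread uniformly
    along n oriented segments in p dimensions (assumed model).

    start, end: array-likes of shape (n, p); row i is one segment.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    d = end - start                      # direction of each segment
    n = start.shape[0]
    # The mean of a uniform point on a segment is its midpoint.
    mean = ((start + end) / 2).mean(axis=0)
    # For x = s + t*d with t ~ U(0, 1):
    # E[x x^T] = s s^T + (s d^T + d s^T) / 2 + d d^T / 3
    second = (start.T @ start
              + (start.T @ d + d.T @ start) / 2
              + d.T @ d / 3) / n
    cov = second - np.outer(mean, mean)
    return mean, cov

def spaghetti_pca(start, end):
    """PCA on the correlation matrix built from the segment moments."""
    _, cov = segment_moments(start, end)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    eigvals, eigvecs = np.linalg.eigh(corr)   # symmetric eigensolver
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    return corr, eigvals[order], eigvecs[:, order]

# Hypothetical opening/closing prices: 3 stocks (rows) over 2 weeks (columns).
opening = [[10.0, 11.0], [20.0, 19.5], [15.0, 15.5]]
closing = [[11.0, 10.5], [19.0, 20.5], [15.5, 16.0]]
corr, eigvals, eigvecs = spaghetti_pca(opening, closing)
print(corr)
print(eigvals)
```

Working from the correlation matrix rather than the covariance matrix keeps periods with larger price swings from dominating the components, which matches the abstract's statement that the correlation matrix is used for PCA.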