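Title: An extension of Spaghetti PCA for time dependent interval data
Original title: "Spaghetti "主成份分析之延伸-應用於時間相關之區間型台灣股價資料
Author: Chen, Pin-Da (陳品達)
Advisors: Liu, Huimei (劉惠美); Cheng, Tsung-Chi (鄭宗記)
Keywords: Principal component analysis (主成份分析); Interval data (區間型資料); Time dependent (時間相關)
Date: 2009
Uploaded: 9 May 2016 11:37:52 (UTC+8)

Abstract (translated from the Chinese original)
Principal component analysis for interval data, developed in recent years, is not yet mature for data in some fields. Stock prices are one example: they are closely tied to time, which motivated the analysis of time dependent interval data (Irpino, 2006. Pattern Recognition Letters 27, 504-513). This thesis extends that analysis and studies time dependent interval data of Taiwanese stock prices. The method of Irpino (2006) considers only the weekly opening and closing prices. To obtain more information, we propose three methods. The first adds the weekly highest (or lowest) price, turning the two-point analysis into a three-point analysis. The second considers the highest and lowest prices together, giving a four-point analysis. Both methods yield information that the original method cannot provide, namely the stability of a company, and the second is the more accurate of the two. The third follows a suggestion in Irpino (2006) and changes the distribution over the interval; its results differ little from those of the original method.
For the empirical analysis we collected stock prices of thirty semiconductor companies and of forty-seven companies in the Taiwan 50 index on the Taiwan financial market, over the seventeen weeks from September 1 to December 26, 2008. Taking the Taiwan 50 index as an example, the analysis indicates a favorable outlook for Delta Electronics (No. 17) and Foxconn (Hon Hai Technology Group, No. 24), and an unfavorable outlook for ITE (Integrated Technology Express, No. 10) and President Chain Store Corporation (No. 35). The price movements of these four companies over the three days from January 5 to January 7, 2009 were consistent with this. The results also show that companies in the financial sector are more stable than companies in the electronics sector.
Keywords: principal component analysis, interval data, time dependent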
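The two-point case summarized in the abstract treats each week's opening and closing price as the endpoints of an oriented segment. The following is a minimal sketch of how a mean vector and covariance matrix could be computed from such segments, assuming a point moves along each company's segment as `start + t * (end - start)` with a single `t` uniform on [0, 1] shared by all weeks. That parameterization, the function name `segment_moments`, and the toy prices are assumptions made for illustration; they are not taken from the thesis, whose own derivations are in its appendices.

```python
def segment_moments(start, end):
    """Mean vector and covariance matrix of points on oriented segments.

    start, end: lists of rows, one row per company, one column per week.
    Row i holds the opening (start) and closing (end) prices of company i.
    Assumption of this sketch: company i is the segment
    start[i] + t * (end[i] - start[i]) with t uniform on [0, 1].
    """
    n_units = len(start)
    n_periods = len(start[0])

    # Midpoint and oriented length of every segment in every week.
    mid = [[(s + e) / 2.0 for s, e in zip(srow, erow)]
           for srow, erow in zip(start, end)]
    diff = [[e - s for s, e in zip(srow, erow)]
            for srow, erow in zip(start, end)]

    # Overall mean: average of the segment midpoints.
    mean = [sum(row[j] for row in mid) / n_units for j in range(n_periods)]

    # For one segment, E[x_j x_k] = mid_j * mid_k + diff_j * diff_k / 12,
    # because Var(t) = 1/12 when t is uniform on [0, 1].
    cov = [[0.0] * n_periods for _ in range(n_periods)]
    for j in range(n_periods):
        for k in range(n_periods):
            second = sum(m[j] * m[k] + d[j] * d[k] / 12.0
                         for m, d in zip(mid, diff)) / n_units
            cov[j][k] = second - mean[j] * mean[k]
    return mean, cov


if __name__ == "__main__":
    # Made-up opening and closing prices: 3 hypothetical companies, 2 weeks.
    opening = [[10.0, 11.0], [20.0, 19.0], [15.0, 15.5]]
    closing = [[11.0, 10.5], [19.0, 18.0], [15.5, 16.0]]
    mean, cov = segment_moments(opening, closing)
    print("mean per week:", mean)
    print("covariance between weeks:", cov)
```

The principal axes would then come from an eigendecomposition of the resulting matrix, as in ordinary PCA. Adding the weekly high and low, or replacing the uniform distribution with a beta distribution, changes the moment formulas and is not covered by this sketch.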
ABSTRACT
Methods for principal component analysis of interval data are not yet mature in some areas. One example is stock price data, which are closely tied to time, and the analysis of time dependent interval data was proposed for this reason (Irpino, 2006. Pattern Recognition Letters 27, 504-513). In this paper we apply that approach to stock price data from Taiwan. The original "Spaghetti" PCA in Irpino (2006) considers only the opening and closing prices of each week. To obtain more information, we propose three methods. Method 1 adds the highest (or lowest) price of each week, so the analysis moves from two points to three. Method 2 uses all of the information, giving an analysis with four points. Both methods provide more information than the original one, for example the degree of stability of a company. Method 3 follows the suggestion in Irpino (2006) to change the distribution over the intervals from uniform to beta; its result, however, is similar to the original one.
We collected stock prices of 37 semiconductor companies and of 47 companies in the TSEC Taiwan 50 index on the Taiwan financial market over the 17 weeks from September 1 to December 26, 2008. For the TSEC Taiwan 50 index, the analysis suggests an optimistic outlook for Delta (Delta Electronics Incorporation), numbered 17, and Foxconn (Foxconn Electronics Incorporation), numbered 24, and a poor outlook for ITE (Integrated Technology Express), numbered 10, and 7-ELEVEn (President Chain Store Corporation), numbered 35. The actual trends of these four companies from January 5 to 7 agreed with these results. Moreover, the financial companies are steadier than the companies in the electronics industry.
Keywords: Principal component analysis; Interval data; Time dependent

References
[1] Cazes, P., 1997. Extension de l’analyse en composantes principales à des données de type intervalle. Revue de Statistique Appliquée, XLV(3), 5–24.
[2] Chiu, T.C., 2009. A study of Spaghetti PCA for time dependent interval data.
[3] Diday, E., Lechevallier, Y. & Opitz, O. (eds.), 1996. Ordinal and Symbolic Data Analysis. Springer. ISBN 3-540-61081-2.
[4] Diday, E. & Esposito, F., 2003. An introduction to Symbolic Data Analysis and the SODAS software. Intelligent Data Analysis, 7(6), 583–602. IOS Press.
[5] D’Urso, P. & Giordani, P., 2004. A least squares approach to principal component analysis for interval valued data. Chemometrics and Intelligent Laboratory Systems, 70, 179–192.
[6] Goupil, F., Touati, M., Diday, E. & Van Der Veen, H., 2000. Symbolic Analysis of Financial Data. LISE-CEREMADE, Université Paris IX Dauphine, CNRS UMR 7534, Paris; ING ITC ITR, Amsterdam.
[7] Gioia, F. & Lauro, N.C., 2005. Basic statistical methods for interval data. Statistica Applicata [Italian Journal of Applied Statistics], 17(1), 75–104.
[8] Gioia, F. & Lauro, N.C., 2006. Principal component analysis on interval data. Computational Statistics, 21(2), 343–363.
[9] Irpino, A., 2006. “Spaghetti” PCA analysis: An extension of principal components analysis to time dependent interval data. Pattern Recognition Letters, 27, 504–513.
[10] Lauro, N.C. & Palumbo, F., 2000. Principal components analysis of interval data: A symbolic data analysis approach. Computational Statistics, 15(1), 79–87.
[11] Lauro, N.C. & Palumbo, F., 2003. A PCA for interval-valued data based on midpoints and radii. In: Yanai, H., Okada, A., Shigemasu, K., Kano, Y. & Meulman, J. (eds.), New Developments in Psychometrics. Psychometric Society, Springer, Tokyo, pp. 641–648.
[12] Lauro, C.N. & Palumbo, F., 2003. Some results and new perspectives in Principal Component Analysis for interval data. Cladag Book Short Papers, 237–244.
[13] Zuccolotto, P., 2007. Principal components of sample estimates: an approach through symbolic data analysis. Statistical Methods & Applications, 16, 173–192.

Description:
Master's degree
National Chengchi University
Department of Statistics
96354017
Source: http://thesis.lib.nccu.edu.tw/record/#G0096354017
Type: thesis
Identifier: G0096354017
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/94722
Contents
1 Introduction
2 Literature Review
3 Interval data related to time
4 Extension of “Spaghetti” PCA
4.1 Method 1: only considered the highest point
4.1.1 Factorial plane
4.2 Method 2: considered all information
4.2.1 Factorial plane
4.3 Method 3: using Beta distribution to original “Spaghetti” PCA
4.3.1 Factorial plane
4.4 Principal Component Loading
5 Real Data Analysis: two stock prices data from Taiwan
5.1 Data collection
5.2 Application in real data
5.2.1 Data of Semiconductor 97
5.2.2 Data of TSEC Taiwan 50 index 97
6 Conclusion
7 Appendix
Appendix A: Method 1
Appendix A.1: Mean of the jth period for Method 1
Appendix A.2: Variance of the jth period for Method 1
Appendix A.3: Covariance of the jth and the kth period for Method 1
Appendix B: Method 2
Appendix B.1: Mean of jth period for Method 2
Appendix B.2: Variance of the jth period for Method 2
Appendix B.3: Covariance of the jth and the kth period for Method 2
Appendix C: Real Data Analysis
Appendix C.1: Proportion of explained for these two data
Appendix C.2: Eigenvectors by using Method 1, Method 2 and Method 3
Appendix C.3: The principal component loading for Data Semiconductor
Appendix C.4: The principal component loading for Data TSEC Taiwan 50 index
Appendix C.5: First factorial plane of Method 2 and Method 3 for these two data
Appendix C.6: Raw data of some stocks
Appendix C.7: The rank of company's steady

List of Tables
Table 3.1 Stock prices of X1 and X2 at time t1, t2 and t3
Table 3.2 Interval data
Table 3.3 Time dependent interval data
Table 5.1 Eigenvalues and explained inertia for Semiconductor 97 by using Method 1 and Method 2
Table 5.2 The rank of amplitude from large to small of Semiconductor
Table 5.3 The rank of the vibration rate from large to small of Semiconductor
Table 5.4 Eigenvalues and explained inertia for TSEC Taiwan 50 index 97 by using Method 1 and Method 2
Table 5.5 The rank of amplitude from large to small of TSEC Taiwan 50 index
Table 5.6 The rank of the vibration rate from large to small of TSEC Taiwan 50 index
Table 7.1 Eigenvalues and explained inertia for these two data by using Method 3
Table 7.2 Eigenvectors of Semiconductor 97
Table 7.3 Eigenvectors of TSEC Taiwan 50 index 97
Table 7.4 Raw data of ITE and ALI
Table 7.5 Raw data of ITE, 7-ELEVEn, Delta and Foxconn
Table 7.6 The ranks of vibration rate of Semiconductor from small to large
Table 7.7 The ranks of vibration rate of TSEC Taiwan 50 index from small to large

List of Figures
Fig. 2.1 Transposition to the origin
Fig. 3.1 Representation of the data by time series
Fig. 3.2 Representation of the data by rectangles
Fig. 3.3 Representation of the data by diagonal
Fig. 4.1 Two types of these two groups
Fig. 4.2 Left graph represents [...], the other represents [...]
Fig. 4.3 Four groups of these cases
Fig. 4.4 The types of 11 kinds of cases of Group 1
Fig. 4.5 PDFs of Beta(1.7,10) and Beta(10,1.7)
Fig. 5.1 The time axis of 4 bargain days of a week
Fig. 5.2 The time axis of 5 bargain days of a week
Fig. 5.3 Representation of Semiconductor 97 on the first factorial plane by using Method 1
Fig. 5.4 Representation of small companies of Semiconductor 97 on the first factorial plane by using Method 1
Fig. 5.5 Representation of Semiconductor 97 on the first factorial plane by using Method 1 and Method 2
Fig. 5.6 Representation of TSEC Taiwan 50 index 97 on the first factorial plane by using Method 1
Fig. 7.1 The loadings of Method 1 for Semiconductor
Fig. 7.2 The loadings of Method 2 for Semiconductor
Fig. 7.3 The loadings of Method 3 for Semiconductor
Fig. 7.4 The loadings of Method 1 for TSEC Taiwan 50 index
Fig. 7.5 The loadings of Method 2 for TSEC Taiwan 50 index
Fig. 7.6 The loadings of Method 3 for TSEC Taiwan 50 index
Fig. 7.7 Representation of Semiconductor 97 on the first factorial plane by using Method 2 and Method 3
Fig. 7.8 Representation of TSEC Taiwan 50 index 97 on the first factorial plane by using Method 2 and Method 3