Title "Spaghetti" 主成份分析之延伸-應用於時間相關之區間型台灣股價資料 (An extension of "Spaghetti" principal component analysis: application to time-dependent interval-valued Taiwan stock price data)
An extension of Spaghetti PCA for time dependent interval data
Author 陳品達
Chen, Pin-Da
Contributors 劉惠美; 鄭宗記
Liu, Huimei; Cheng, Tsung-Chi
陳品達
Chen, Pin-Da
Keywords Principal component analysis
Interval data
Time dependent
Date 2009
Upload time 9-May-2016 11:37:52 (UTC+8)
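Abstract (Chinese version, translated)
     Principal component analysis for interval-type data, developed in recent years, is not yet mature for data in some fields, such as stock prices, which are closely tied to time. This led to the analysis of time-dependent interval data (Irpino, 2006. Pattern Recognition Letters 27, 504-513). This thesis continues that analysis and studies time-dependent interval-type Taiwan stock price data. The method of Irpino (2006) considers only each week's opening and closing prices. To obtain more information, we propose three methods. The first brings each week's highest (or lowest) price into the analysis, turning a two-point analysis into a three-point one. The second considers the highest and lowest prices together, giving a four-point analysis. Both methods yield information that the original method cannot, namely the stability of a company, and the second is the more accurate of the two. The third follows a suggestion in Irpino (2006) and changes the distribution over the interval; its results differ little from those of the original method.
     For the empirical analysis, we collected stock price data for thirty semiconductor companies in the Taiwan financial market and forty-seven companies in the Taiwan 50 index over the seventeen weeks from September 1 to December 26, 2008. Taking the Taiwan 50 index as an example, the results indicate a favorable outlook for company no. 17, Delta Electronics, and company no. 24, Foxconn Technology Group, and an unfavorable outlook for company no. 10, ITE (Integrated Technology Express), and company no. 35, President Chain Store Corporation. The price movements of these four companies over the three days from January 5 to January 7, 2009 were indeed consistent with these results. The results also show that companies in the financial sector are more stable than companies in the electronics sector.
     
     Keywords: principal component analysis, interval data, time dependent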
ABSTRACT
     Methods for principal component analysis (PCA) of interval data are not yet mature in some application areas, for example stock prices, which are closely tied to time. An analysis of time dependent interval data was therefore proposed (Irpino, 2006. Pattern Recognition Letters 27, 504-513). In this paper, we apply this approach to stock price data in Taiwan. The original “Spaghetti” PCA in Irpino (2006) considers only the starting and ending prices of each week. To extract more information, we propose three methods. Method 1 adds the highest (or lowest) price of each week, so the analysis moves from two points to three. Method 2 uses all of this information, so the analysis is based on four points. Both methods provide information that the original method cannot, for example the degree of stability of a company. Method 3 follows the suggestion in Irpino (2006) and changes the distribution over the intervals from uniform to beta; its results, however, are similar to those of the original method.
     We collect stock prices of 37 semiconductor companies and 47 companies in the TSEC Taiwan 50 index from the Taiwan financial market over the 17 weeks from September 1 to December 26, 2008. For the TSEC Taiwan 50 index, the analysis indicates that the outlook is favorable for Delta (Delta Electronics Incorporation), numbered 17, and Foxconn (Foxconn Electronics Incorporation), numbered 24, and unfavorable for ITE (Integrated Technology Express), numbered 10, and 7-ELEVEn (President Chain Store Corporation), numbered 35. The price trends of these four companies from January 5 to 7, 2009 were consistent with these results. In addition, the financial companies are more stable than the companies in the electronics industry.
     
     Keywords: Principal component analysis; Interval data; Time dependent
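The abstract describes the approach only in words. The following is a minimal sketch of one plausible reading of the two-point case, not the thesis's own derivation: each company's week is treated as a segment from the starting price to the ending price, and a point moves along all segments simultaneously with position t drawn from Beta(alpha, beta), where alpha = beta = 1 gives the uniform case. The function names `segment_moments` and `principal_axes`, the parameterization, and the toy prices are all assumptions made for illustration.

```python
import numpy as np


def segment_moments(start, end, alpha=1.0, beta=1.0):
    """Mean vector and covariance matrix for oriented segments (assumed model).

    start, end: arrays of shape (n_units, n_periods), e.g. weekly starting
    and ending prices per company. Each unit traces
    x(t) = start + t * (end - start) with t ~ Beta(alpha, beta).
    Units are weighted equally.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    n_units = start.shape[0]
    delta = end - start

    # First and second raw moments of the Beta(alpha, beta) distribution.
    m1 = alpha / (alpha + beta)
    m2 = alpha * (alpha + 1.0) / ((alpha + beta) * (alpha + beta + 1.0))

    # E[x_j], averaged over units.
    mean = (start + m1 * delta).mean(axis=0)

    # E[x_j x_k], averaged over units, expanded from the segment equation.
    second = (
        start.T @ start
        + m1 * (start.T @ delta + delta.T @ start)
        + m2 * (delta.T @ delta)
    ) / n_units

    cov = second - np.outer(mean, mean)
    return mean, cov


def principal_axes(cov):
    """Eigenvalues (descending) and matching eigenvectors of a covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    # Made-up prices for 4 companies over 3 weeks (illustration only).
    starts = np.array([[10.0, 11.0, 12.0],
                       [20.0, 19.0, 18.5],
                       [15.0, 15.5, 15.2],
                       [30.0, 28.0, 29.0]])
    ends = np.array([[11.0, 12.0, 11.5],
                     [19.0, 18.5, 18.0],
                     [15.5, 15.2, 15.4],
                     [28.0, 29.0, 27.0]])
    mean, cov = segment_moments(starts, ends)          # uniform case
    eigvals, eigvecs = principal_axes(cov)
    print("explained inertia:", eigvals / eigvals.sum())
```

When a segment collapses to a point (start equals end), the covariance reduces to the ordinary population covariance of the points, which is a convenient sanity check. Extending the sketch to three or four points per week (highest and lowest prices) would require the piecewise derivations that the thesis places in its appendices, which are not reproduced here.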
References
     [1] Cazes, P., 1997. Extension de l’analyse en composantes principales à des données de type intervalle. Revue de Statistique Appliquée, XLV (3), 5-24.
     [2] Chiu, T.C., 2009. A study of Spaghetti PCA for time dependent interval data.
     [3] Diday, E., Lechevallier, Y. & Opitz, O. (eds.), 1996. Ordinal and Symbolic Data Analysis. Springer, ISBN 3-540-61081-2, 372 pp.
     [4] Diday, E. & Esposito, F., 2003. An introduction to Symbolic Data Analysis and the SODAS software. Intelligent Data Analysis, 7(6), 583-602, IOS Press.
     [5] D’Urso, P. & Giordani, P., 2004. A least squares approach to principal component analysis for interval valued data. Chemometrics and Intelligent Laboratory Systems, 70, 179-192.
     [6] Goupil, F., Touati, M., Diday, E. & Van Der Veen, H., 2000. Symbolic Analysis of Financial Data. (1) LISE-CEREMADE, Université Paris IX Dauphine, CNRS UMR 7534, Place du Mal de Lattre de Tassigny, 75775 PARIS CEDEX 16. (2) ING ITC ITR, Postbus 1800, 1000 BV Amsterdam.
     [7] Gioia, F. & Lauro, N.C., 2005. Basic statistical methods for interval data. Statistica Applicata [Italian Journal of Applied Statistics], 17(1), 75-104.
     [8] Gioia, F. & Lauro, N.C., 2006. Principal component analysis on interval data. Computational Statistics, 21(2), 343-363.
     [9] Irpino, A., 2006. “Spaghetti” PCA analysis: An extension of principal components analysis to time dependent interval data. Pattern Recognition Letters, 27, 504-513.
     [10] Lauro, N.C. & Palumbo, F., 2000. Principal components analysis of interval data: A symbolic data analysis approach. Computational Statistics, 15(1), 79-87.
     [11] Lauro, N.C. & Palumbo, F., 2003. A PCA for interval-valued data based on midpoints and radii. In: Yanai, H., Okada, A., Shigemasu, K., Kano, Y. & Meulman, J. (eds.), New Developments in Psychometrics. Psychometric Society, Springer, Tokyo, pp. 641-648.
     [12] Lauro, C.N. & Palumbo, F., 2003. Some results and new perspectives in Principal Component Analysis for interval data. Cladag Book Short Papers, 237-244.
     [13] Zuccolotto, P., 2007. Principal components of sample estimates: an approach through symbolic data analysis. Statistical Methods & Applications, 16, 173-192.
Description Master's thesis
National Chengchi University
Department of Statistics
96354017
Source http://thesis.lib.nccu.edu.tw/record/#G0096354017
Type thesis
Other identifier G0096354017
URI http://nccur.lib.nccu.edu.tw/handle/140.119/94722
Table of contents
     
     1 Introduction
     2 Literature Review
     3 Interval data related to time
     4 Extension of “Spaghetti” PCA
      4.1 Method 1: only considered the highest point
       4.1.1 Factorial plane
      4.2 Method 2: considered all information
       4.2.1 Factorial plane
      4.3 Method 3: using Beta distribution to original “Spaghetti” PCA
       4.3.1 Factorial plane
      4.4 Principal Component Loading
     5 Real Data Analysis: two stock prices data from Taiwan
      5.1 Data collection
      5.2 Application in real data
       5.2.1 Data of Semiconductor 97
       5.2.2 Data of TSEC Taiwan 50 index 97
     6 Conclusion
     7 Appendix
      Appendix A: Method 1
       Appendix A.1: Mean of the jth period for Method 1
       Appendix A.2: Variance of the jth period for Method 1
       Appendix A.3: Covariance of the jth and the kth period for Method 1
      Appendix B: Method 2
       Appendix B.1: Mean of the jth period for Method 2
       Appendix B.2: Variance of the jth period for Method 2
       Appendix B.3: Covariance of the jth and the kth period for Method 2
      Appendix C: Real Data Analysis
       Appendix C.1: Proportion of explained inertia for the two data sets
       Appendix C.2: Eigenvectors by using Method 1, Method 2 and Method 3
       Appendix C.3: The principal component loading for Data Semiconductor
       Appendix C.4: The principal component loading for Data TSEC Taiwan 50 index
       Appendix C.5: First factorial plane of Method 2 and Method 3 for the two data sets
       Appendix C.6: Raw data of some stocks
       Appendix C.7: The rank of the companies' steadiness
     
     List of Tables
     
     Table 3.1 Stock prices of X1 and X2 at time t1, t2 and t3
     Table 3.2 Interval data
     Table 3.3 Time dependent interval data
     Table 5.1 Eigenvalues and explained inertia for Semiconductor 97 by using Method 1 and Method 2
     Table 5.2 The rank of amplitude from large to small of Semiconductor
     Table 5.3 The rank of the vibration rate from large to small of Semiconductor
     Table 5.4 Eigenvalues and explained inertia for TSEC Taiwan 50 index 97 by using Method 1 and Method 2
     Table 5.5 The rank of amplitude from large to small of TSEC Taiwan 50 index
     Table 5.6 The rank of the vibration rate from large to small of TSEC Taiwan 50 index
     Table 7.1 Eigenvalues and explained inertia for the two data sets by using Method 3
     Table 7.2 Eigenvectors of Semiconductor 97
     Table 7.3 Eigenvectors of TSEC Taiwan 50 index 97
     Table 7.4 Raw data of ITE and ALI
     Table 7.5 Raw data of ITE, 7-ELEVEn, Delta and Foxconn
     Table 7.6 The ranks of vibration rate of Semiconductor from small to large
     Table 7.7 The ranks of vibration rate of TSEC Taiwan 50 index from small to large
     
     List of Figures
     
     Fig. 2.1 Transposition to the origin
     Fig. 3.1 Representation of the data by time series
     Fig. 3.2 Representation of the data by rectangles
     Fig. 3.3 Representation of the data by diagonal
     Fig. 4.1 Two types of these two groups
     Fig. 4.2 Left graph represents , the other represents
     Fig. 4.3 Four groups of these cases
     Fig. 4.4 The types of 11 kinds of cases of Group 1
     Fig. 4.5 PDFs of Beta(1.7,10) and Beta(10,1.7)
     Fig. 5.1 The time axis of a week with 4 trading days
     Fig. 5.2 The time axis of a week with 5 trading days
     Fig. 5.3 Representation of Semiconductor 97 on the first factorial plane by using Method 1
     Fig. 5.4 Representation of small companies of Semiconductor 97 on the first factorial plane by using Method 1
     Fig. 5.5 Representation of Semiconductor 97 on the first factorial plane by using Method 1 and Method 2
     Fig. 5.6 Representation of TSEC Taiwan 50 index 97 on the first factorial plane by using Method 1
     Fig. 7.1 The loadings of Method 1 for Semiconductor
     Fig. 7.2 The loadings of Method 2 for Semiconductor
     Fig. 7.3 The loadings of Method 3 for Semiconductor
     Fig. 7.4 The loadings of Method 1 for TSEC Taiwan 50 index
     Fig. 7.5 The loadings of Method 2 for TSEC Taiwan 50 index
     Fig. 7.6 The loadings of Method 3 for TSEC Taiwan 50 index
     Fig. 7.7 Representation of Semiconductor 97 on the first factorial plane by using Method 2 and Method 3
     Fig. 7.8 Representation of TSEC Taiwan 50 index 97 on the first factorial plane by using Method 2 and Method 3