Title: dataSDA: Data Sets for Symbolic Data Analysis in R
(Chinese title: dataSDA: 用於象徵型資料分析的資料集之 R 套件)
Author: Chen, Po-Wei (陳柏維)
Contributors: Wu, Han-Ming (吳漢銘); Chen, Po-Wei (陳柏維)
Keywords: interval-valued data; histogram-valued data; R package; symbolic data analysis
Date: 2023
Uploaded: 1-Feb-2024 11:41:51 (UTC+8)
Abstract: Traditional datasets are typically restricted to observations in which each variable takes a single value. As data grow in volume and complexity, however, data collection has become larger and more diverse. To consolidate and manage such data while preserving the essential information they contain, variables are increasingly recorded not as single values but as multivalued descriptions such as intervals, histograms, and probability distributions. Data described in this way are called "symbolic data." This representation captures the distribution, characteristics, and variability of the data more fully, which supports further analysis and interpretation. This study introduces dataSDA, an R package with three primary aims: to collect symbolic datasets for a range of research topics; to read, write, and convert symbolic data between formats; and to compute descriptive statistics for symbolic variables. The package follows the data structures of the widely used symbolic data packages RSDA and HistDAWass and extends their functionality, for example by aggregating conventional data into symbolic data according to different conditions. Using the benchmark datasets included in dataSDA, we demonstrate and compare clustering, classification, and regression analyses in R. As a tool for collecting and processing symbolic data, dataSDA is intended to serve as an important source of symbolic datasets. It should help users engage with symbolic data analysis research and foster the development of new analytical methods for symbolic data. The dataSDA package is available on the Comprehensive R Archive Network (CRAN).
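The abstract mentions generating symbolic data by aggregating conventional data. The sketch below shows the simplest form of that idea. Classical records are grouped by a key, and each variable within a group is summarized by its range [min, max], which produces one interval-valued observation per group. It is written in Python and is not dataSDA's R interface. The function name `aggregate_to_intervals` and the min/max (range) rule are illustrative assumptions, and the package may support other aggregation conditions.

```python
from collections import defaultdict


def aggregate_to_intervals(rows, group_key, variables):
    """Hypothetical helper: turn classical rows into interval-valued data.

    Each group (identified by row[group_key]) becomes one symbolic
    observation, and each listed variable becomes the interval
    [min, max] of its classical values within that group.
    """
    # Collect the classical values of every variable, group by group.
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for var in variables:
            values[row[group_key]][var].append(row[var])
    # Summarize each (group, variable) pair by its range (an assumed rule).
    return {
        group: {var: (min(xs), max(xs)) for var, xs in per_var.items()}
        for group, per_var in values.items()
    }


if __name__ == "__main__":
    readings = [
        {"station": "A", "temp": 12.0, "rain": 0.0},
        {"station": "A", "temp": 18.5, "rain": 3.2},
        {"station": "B", "temp": 9.0, "rain": 1.1},
        {"station": "B", "temp": 15.0, "rain": 0.4},
    ]
    print(aggregate_to_intervals(readings, "station", ["temp", "rain"]))
    # {'A': {'temp': (12.0, 18.5), 'rain': (0.0, 3.2)}, 'B': {'temp': (9.0, 15.0), 'rain': (0.4, 1.1)}}
```

The range rule keeps every observed value inside its interval. Quantile-based bounds are a common alternative when outliers would otherwise stretch the intervals.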
Description: Master's thesis, Department of Statistics, National Chengchi University; 111354013
Source: http://thesis.lib.nccu.edu.tw/record/#G0111354013
Type: thesis
dc.contributor.advisor: Wu, Han-Ming (吳漢銘)
dc.identifier (Other Identifiers): G0111354013
dc.identifier.uri (URI): https://nccur.lib.nccu.edu.tw/handle/140.119/149650
dc.description.tableofcontents:
Chapter 1 Introduction
Chapter 2 Package design and symbolic data manipulation
  2.1 The dataSDA package design
  2.2 R functions for reading, writing, and converting symbolic data
  2.3 An example: the conversion of interval-valued datasets into the symbolic_tbl class
  2.4 Other functions in dataSDA
Chapter 3 Descriptive statistics for symbolic data
  3.1 Quantification approaches
  3.2 Distributional approaches
  3.3 Descriptive statistics for histogram-valued data
Chapter 4 A benchmarking study
  4.1 Cluster analysis
  4.2 Classification for interval-valued data
  4.3 Regression Analysis for interval-valued data
Chapter 5 Conclusion and future development
Reference
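Chapter 3 covers descriptive statistics for symbolic data. One classical set of definitions, due to Bertrand and Goupil (2000) for intervals and extended by Billard and Diday to histograms, assumes that values are spread uniformly within each interval or histogram bin. Under that assumption, the symbolic sample mean and variance are:

- Mean: the average of the bin midpoints (a + b) / 2, weighted by the bin weights p.
- Variance: the average of the within-bin second moments (a² + ab + b²) / 3, minus the squared mean.

The Python sketch below implements these definitions. The function names are hypothetical, and the code is not taken from dataSDA. Whether the package uses exactly these formulas is an assumption, and the distributional approaches listed in Section 3.2 are not covered.

```python
def histogram_mean_var(observations):
    """Hypothetical helper: symbolic sample mean and variance of a
    histogram-valued variable.

    Each observation is a list of (low, high, weight) bins whose weights
    sum to 1; values are assumed uniformly distributed inside each bin.
    Requires at least one observation.
    """
    n = len(observations)
    # Mean: average of the bin midpoints, weighted by bin weights.
    mean = sum(p * (a + b) / 2 for obs in observations for a, b, p in obs) / n
    # Second moment of a uniform bin [a, b] is (a^2 + ab + b^2) / 3.
    second = sum(
        p * (a * a + a * b + b * b) / 3 for obs in observations for a, b, p in obs
    ) / n
    return mean, second - mean ** 2


def interval_mean_var(intervals):
    """An interval [a, b] is a histogram with a single bin of weight 1."""
    return histogram_mean_var([[(a, b, 1.0)] for a, b in intervals])


if __name__ == "__main__":
    m, v = interval_mean_var([(0, 2), (2, 4)])
    print(f"mean={m:.4f}, var={v:.4f}")  # mean=2.0000, var=1.3333
```

Treating an interval as a one-bin histogram means the interval statistics are computed by the same routine as the histogram statistics. For example, the histogram [0, 1) with weight 0.5 and [1, 2] with weight 0.5 gives the same mean and variance as the interval [0, 2].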
dc.format.extent: 667116 bytes
dc.format.mimetype: application/pdf