Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis | NCCU Academic Hub

Title	Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis
Authors	Chou, Elizabeth P. (周珮婷); Hsieh, Fushing; Chen, Ting-Li
Contributor	Department of Statistics
Keywords	contingency-kD-lattice ; high order structural dependency ; heterogeneity ; mutual conditional entropy matrix ; principal component analysis (PCA)
Date	2021-05
Upload time	25-Jun-2021 10:17:21 (UTC+8)
Abstract	We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of the information content contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA builds on the categorical nature of every feature, obtained via histograms, and is guided by the features’ associative patterns (order-2 dependence) recorded in a mutual conditional entropy matrix. Higher-order structural dependency among k (≥3) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. As k grows, the resulting heatmap series captures the global and large scales of structural dependency that constitute the data matrix’s information content. When continuous features are involved, principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving structural dependency from global to fine scales. At every step of the mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. For reliability and robustness in the sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inference in Machine Learning (ML) and Statistics, it clarifies which covariate feature groups have major versus minor predictive power on response features, and at which scales. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.
Relation	Entropy, Vol. 23, No. 5, Article 594
Type	article
DOI	https://doi.org/10.3390/e23050594
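
The order-2 dependence that the abstract says guides CEDA can be illustrated with a small computation. What follows is a minimal sketch, not the authors' implementation: this record does not state the formula, so the sketch assumes the normalization MCE(X, Y) = [H(X|Y)/H(X) + H(Y|X)/H(Y)] / 2, and the function names (`entropy`, `conditional_entropy`, `mutual_conditional_entropy`, `mce_matrix`) and the toy data are hypothetical. Continuous features are assumed to have been binned into categories beforehand.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), estimated from paired observations."""
    return entropy(list(zip(x, y))) - entropy(y)


def mutual_conditional_entropy(x, y):
    """Assumed normalization: average of H(X|Y)/H(X) and H(Y|X)/H(Y).

    Close to 0 when each feature determines the other,
    close to 1 when the two features look independent.
    """
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        raise ValueError("a constant feature has zero entropy; MCE is undefined")
    return 0.5 * (conditional_entropy(x, y) / hx + conditional_entropy(y, x) / hy)


def mce_matrix(features):
    """Pairwise MCE for a dict mapping feature name -> list of category labels."""
    names = list(features)
    return [
        # The diagonal is 0 because H(X | X) = 0.
        [0.0 if a == b else mutual_conditional_entropy(features[a], features[b])
         for b in names]
        for a in names
    ]


# Hypothetical toy data: "size" and "weight" move together, "color" does not.
toy = {
    "size": ["s", "s", "l", "l"],
    "weight": ["lo", "lo", "hi", "hi"],
    "color": ["r", "g", "r", "g"],
}
matrix = mce_matrix(toy)
```

Under this assumed definition, small entries of `matrix` mark strongly associated feature pairs, which is the kind of pattern the abstract describes as guiding the grouping of features for the higher-order heatmaps.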

dc.contributor	Department of Statistics
dc.creator (Author)	周珮婷
dc.creator (Author)	Chou, Elizabeth P.
dc.creator (Author)	Hsieh, Fushing
dc.creator (Author)	Chen, Ting-Li
dc.date (Date)	2021-05
dc.date.accessioned	25-Jun-2021 10:17:21 (UTC+8)
dc.date.available	25-Jun-2021 10:17:21 (UTC+8)
dc.date.issued (Upload time)	25-Jun-2021 10:17:21 (UTC+8)
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/135889
dc.format.extent	2820847 bytes
dc.format.mimetype	application/pdf
dc.relation (Relation)	Entropy, Vol. 23, No. 5, Article 594
dc.subject (Keywords)	contingency-kD-lattice ; high order structural dependency ; heterogeneity ; mutual conditional entropy matrix ; principal component analysis (PCA)
dc.title (Title)	Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis
dc.type (Type)	article
dc.identifier.doi (DOI)	10.3390/e23050594
dc.doi.uri (DOI)	https://doi.org/10.3390/e23050594