Title Evaluating reliability of tree-patterns in extreme-K categorical samples problems
Authors 周珮婷
Chou, Elizabeth
Hsieh, Yin-Chen
Enriquez, Sabrina
Hsieh, Fushing
Contributor Department of Statistics
Keywords Extreme-K; exploratory data analysis; hierarchical clustering; ANOVA
Date 2021-07
Upload date 2022-04-12
Abstract Exploratory Data Analysis (EDA) approaches are adopted to address the difficult extreme-K categorical sample problem. Due to observed data's categorical nature, all comparisons among populations are performed by comparing their distributions in the form of a histogram with symbolic bins. A distance measure is designed to evaluate the discrepancy between two symbol-based histograms to facilitate Hierarchical Clustering (HC) algorithms. The resultant binary HC-tree then serves as the basis for our EDA task of discovering tree-patterns of interest. Since each population-leaf's location within a binary HC-tree's geometry is expressed through a binary code sequence, a binary code segment characterizes all commonly shared tree-patterns for all members. We then generate a large ensemble of mimicries of the observed dataset based on multinomial distributions and construct a large ensemble of binary HC-trees. Upon each identified tree-pattern which we determined based on the observed dataset, we evaluate its reliability and uncertainty through two histograms.
Relation Journal of Statistical Computation and Simulation, Vol.91, No.18, pp.3828-3849
Type article
DOI https://doi.org/10.1080/00949655.2021.1951266
dc.contributor Department of Statistics
dc.creator (Author) 周珮婷
dc.creator (Author) Chou, Elizabeth
dc.creator (Author) Hsieh, Yin-Chen
dc.creator (Author) Enriquez, Sabrina
dc.creator (Author) Hsieh, Fushing
dc.date (Date) 2021-07
dc.date.accessioned 2022-04-12
dc.date.available 2022-04-12
dc.date.issued (Upload date) 2022-04-12
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/139849
dc.description.abstract (Abstract) Exploratory Data Analysis (EDA) approaches are adopted to address the difficult extreme-K categorical sample problem. Due to observed data's categorical nature, all comparisons among populations are performed by comparing their distributions in the form of a histogram with symbolic bins. A distance measure is designed to evaluate the discrepancy between two symbol-based histograms to facilitate Hierarchical Clustering (HC) algorithms. The resultant binary HC-tree then serves as the basis for our EDA task of discovering tree-patterns of interest. Since each population-leaf's location within a binary HC-tree's geometry is expressed through a binary code sequence, a binary code segment characterizes all commonly shared tree-patterns for all members. We then generate a large ensemble of mimicries of the observed dataset based on multinomial distributions and construct a large ensemble of binary HC-trees. Upon each identified tree-pattern which we determined based on the observed dataset, we evaluate its reliability and uncertainty through two histograms.
dc.format.extent 148 bytes
dc.format.mimetype text/html
dc.relation (Relation) Journal of Statistical Computation and Simulation, Vol.91, No.18, pp.3828-3849
dc.subject (Keywords) Extreme-K; exploratory data analysis; hierarchical clustering; ANOVA
dc.title (Title) Evaluating reliability of tree-patterns in extreme-K categorical samples problems
dc.type (Type) article
dc.identifier.doi (DOI) 10.1080/00949655.2021.1951266
dc.doi.uri (DOI) https://doi.org/10.1080/00949655.2021.1951266
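
Illustrative sketch (not part of the repository record). The workflow summarized in the abstract can be illustrated with a minimal Python sketch that compares symbol-based histograms, builds a binary HC-tree, resamples multinomial mimicries, and counts how often a tree-pattern recurs. Everything specific here is an assumption made for illustration: the record does not give the paper's distance measure, so total variation distance stands in for it; average linkage is an arbitrary choice; a tree-pattern is simplified to "this set of populations forms one branch" rather than a binary code segment; and the single recurrence frequency stands in for the two histograms mentioned in the abstract. The function names and toy counts are hypothetical.

```python
import random
from itertools import combinations


def histogram_distance(counts_a, counts_b):
    """Total variation distance between two count histograms over the same symbols.
    Assumed stand-in; the paper's own distance measure is not given in this record."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(counts_a, counts_b))


def hc_branches(histograms):
    """Average-linkage agglomerative clustering (assumed linkage).
    Returns every branch of the binary tree as a frozenset of population indices."""
    n = len(histograms)
    dist = {frozenset(pair): histogram_distance(histograms[pair[0]], histograms[pair[1]])
            for pair in combinations(range(n), 2)}

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    active = [frozenset([i]) for i in range(n)]
    branches = set()
    while len(active) > 1:
        a, b = min(combinations(active, 2), key=lambda pair: linkage(*pair))
        active.remove(a)
        active.remove(b)
        merged = a | b
        active.append(merged)
        branches.add(merged)
    return branches


def mimic(counts, rng):
    """One multinomial mimicry: same sample size, observed proportions as probabilities."""
    draws = rng.choices(range(len(counts)), weights=counts, k=sum(counts))
    return [draws.count(symbol) for symbol in range(len(counts))]


def pattern_reliability(histograms, pattern, n_mimicries=200, seed=0):
    """Fraction of mimicked datasets whose HC-tree contains `pattern` as a branch."""
    rng = random.Random(seed)
    pattern = frozenset(pattern)
    hits = 0
    for _ in range(n_mimicries):
        mimicry = [mimic(h, rng) for h in histograms]
        if pattern in hc_branches(mimicry):
            hits += 1
    return hits / n_mimicries


# Hypothetical toy data: 4 populations, 3 symbolic bins, 50 observations each.
observed = [
    [40, 5, 5],   # population 0
    [38, 7, 5],   # population 1
    [5, 40, 5],   # population 2
    [6, 39, 5],   # population 3
]
observed_branches = hc_branches(observed)
reliability = pattern_reliability(observed, {0, 1})
```

In this toy setup, populations 0 and 1 share a branch in the observed tree, and `reliability` estimates how often that branch survives multinomial resampling. A value near 1 would suggest a stable pattern under these assumptions, while lower values would flag a pattern that sampling noise alone can break.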