dc.contributor | Department of Statistics | |
dc.creator (author) | 吳漢銘 | |
dc.creator (作者) | Wu, Han-Ming | |
dc.date (date) | 2021-07 | |
dc.date.accessioned | 2022-04-12 | - |
dc.date.available | 2022-04-12 | - |
dc.date.issued (upload date) | 2022-04-12 | - |
dc.identifier.uri (URI) | http://nccur.lib.nccu.edu.tw/handle/140.119/139857 | - |
dc.description.abstract (abstract) | Gene expression data, such as those obtained from hybridization microarrays, serial analysis of gene expression (SAGE), and/or RNA-Seq, are used to study a phenotypic response of interest. Such data are typically characterized by a large number of genes but a limited number of samples. In addition, a priori knowledge of genes, such as functional and/or curated annotations, has accumulated over the years and is readily available. This study incorporates both the biological knowledge of genes and the information in a discrete phenotypic response of subjects into dimension reduction through the framework of symbolic data analysis (SDA). The proposed approach consists of two steps. First, concepts from SDA are used to aggregate the gene expression levels into intervals according to the genes' functional categories. For genes of unknown function, gene selection procedures are applied to retain a smaller set of genes that differentiate the subtypes of the phenotypic response; the selected unknown genes are then also aggregated into intervals. Second, regularized sliced inverse regression (SIR) for interval-valued data is applied, with the classes of the phenotypic response serving as the slices. We illustrate the proposed method on several public gene expression data sets for data visualization and class prediction, and compare the results with those of regularized principal component analysis (PCA). The results show that the proposed method can achieve better performance than purely data-driven models in understanding the biologically relevant processes of genes and subjects. | |
dc.format.extent | 4938629 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.relation (relation) | Statistics Symposium in Memory of Wen-Chen Chen, Academia Sinica | |
dc.subject (keywords) | data visualization; interval-valued data; symbolic data analysis; sufficient dimension reduction; gene expression; biological knowledge | |
dc.title (title) | A symbolic data analysis approach to regularized sliced inverse regression for gene expression data with multiple functional categories and a phenotypic response | |
dc.type (resource type) | conference | |
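The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Each functional category is summarized per sample by the [min, max] interval of its genes' expression values. The interval-valued SIR step is approximated by the center (midpoint) method, and "regularized" is taken to mean a ridge term added to the covariance matrix. The gene-selection step for unknown genes is omitted. The function names `aggregate_to_intervals` and `regularized_sir_centers` are hypothetical.

```python
import numpy as np


def aggregate_to_intervals(X, categories):
    """Step 1 (sketch): aggregate gene-level expression into interval-valued features.

    X: (n_samples, n_genes) expression matrix.
    categories: length-n_genes sequence of functional-category labels.
    Assumption: each category's interval is [min, max] over its genes, per sample.
    """
    categories = np.asarray(categories)
    labels = sorted(set(categories.tolist()))
    lower = np.column_stack([X[:, categories == c].min(axis=1) for c in labels])
    upper = np.column_stack([X[:, categories == c].max(axis=1) for c in labels])
    return lower, upper, labels


def regularized_sir_centers(lower, upper, y, n_directions=1, ridge=1e-3):
    """Step 2 (sketch): ridge-regularized SIR on interval midpoints.

    Each class of the discrete response y acts as one slice.
    Assumption: intervals are reduced to their centers (the "center method").
    """
    Z = (lower + upper) / 2.0            # interval centers
    Zc = Z - Z.mean(axis=0)              # column-centered data
    n, p = Zc.shape
    # Ridge-regularized covariance keeps the matrix invertible when n is small.
    cov = Zc.T @ Zc / n + ridge * np.eye(p)

    # Weighted covariance of the slice means: M = sum_h p_h m_h m_h^T.
    y = np.asarray(y)
    M = np.zeros((p, p))
    for cls in np.unique(y):
        mask = y == cls
        m = Zc[mask].mean(axis=0)
        M += mask.mean() * np.outer(m, m)

    # Solve M b = lambda * cov b in symmetric form via cov^{-1/2}.
    w, V = np.linalg.eigh(cov)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    vals, vecs = np.linalg.eigh(inv_sqrt @ M @ inv_sqrt)
    order = np.argsort(vals)[::-1][:n_directions]   # largest eigenvalues first
    B = inv_sqrt @ vecs[:, order]                   # SIR directions
    return Zc @ B, vals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 6))                    # 20 samples, 6 genes
    y = np.array([0] * 10 + [1] * 10)               # binary phenotype
    X[y == 1, :2] += 3.0                            # category "A" differs by class
    lower, upper, labels = aggregate_to_intervals(X, ["A", "A", "B", "B", "C", "C"])
    proj, vals = regularized_sir_centers(lower, upper, y)
    print(labels, proj.shape, vals)
```

With H phenotype classes, the slice-mean matrix has rank at most H - 1, so a binary phenotype yields at most one informative direction; this is why `n_directions` defaults to 1 in the sketch.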