Title A symbolic data analysis approach to regularized sliced inverse regression for gene expression data with multiple functional categories and a phenotypic response
Author 吳漢銘
Wu, Han-Ming
Contributor Department of Statistics
Keywords data visualization; interval-valued data; symbolic data analysis; sufficient dimension reduction; gene expression; biological knowledge
Date 2021-07
Upload date 2022-04-12
Abstract Gene expression data, such as those obtained from hybridization microarrays, serial analysis of gene expression (SAGE), and/or RNA-Seq, are used to study a phenotypic response of interest. Such data are often characterized by a large number of genes but a limited number of samples. In addition, a priori knowledge of genes, such as functional and/or curated annotations, has accumulated and become available over the years. This study aims to incorporate both the biological knowledge of genes and the information in a discrete phenotypic response of subjects into dimension reduction through the framework of symbolic data analysis (SDA). The proposed approach consists of two steps. First, the concepts of symbolic data analysis are used to aggregate the expression levels into functional intervals according to the genes' functional categories. For unknown genes, we perform gene selection procedures to select a smaller set of genes that differentiate subtypes of the phenotypic response; the selected unknown genes are then aggregated into intervals as well. Second, regularized sliced inverse regression for interval-valued data is applied, with the phenotypic response of the subjects defining the slices. We illustrate the proposed method on several public gene expression data sets for data visualization and class prediction, and compare the results with those of regularized PCA. The results show that the proposed method achieves better performance in understanding biologically relevant processes of genes and subjects than purely data-driven models.
Relation Statistics Symposium in Memory of Wen-Chen Chen, Academia Sinica
Type conference
dc.contributor Department of Statistics
dc.creator (Author) 吳漢銘
dc.creator (Author) Wu, Han-Ming
dc.date (Date) 2021-07
dc.date.accessioned 2022-04-12
dc.date.available 2022-04-12
dc.date.issued (Upload date) 2022-04-12
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/139857
dc.format.extent 4938629 bytes
dc.format.mimetype application/pdf
dc.relation (Relation) Statistics Symposium in Memory of Wen-Chen Chen, Academia Sinica
dc.subject (Keywords) data visualization; interval-valued data; symbolic data analysis; sufficient dimension reduction; gene expression; biological knowledge
dc.title (Title) A symbolic data analysis approach to regularized sliced inverse regression for gene expression data with multiple functional categories and a phenotypic response
dc.type (Type) conference
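
Illustration The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the record does not say how intervals are formed or how sliced inverse regression is extended to interval-valued data, so the sketch assumes [min, max] intervals per functional category, represents each interval by its midpoint, and uses a simple ridge penalty as the regularizer. The function names `aggregate_to_intervals` and `regularized_sir` and the parameter `ridge` are hypothetical, and the gene selection step for unknown genes is omitted.

```python
import numpy as np


def aggregate_to_intervals(X, categories):
    """Step 1 (assumed form): aggregate gene expression into intervals.

    X          : (n_samples, n_genes) expression matrix.
    categories : length n_genes array of functional category labels.
    Returns per-sample lower and upper bounds, one column per category,
    plus the sorted category names.
    """
    X = np.asarray(X, dtype=float)
    categories = np.asarray(categories)
    names = np.unique(categories)
    # Assumption: an interval is the [min, max] of the genes in a category.
    lower = np.column_stack([X[:, categories == c].min(axis=1) for c in names])
    upper = np.column_stack([X[:, categories == c].max(axis=1) for c in names])
    return lower, upper, names


def regularized_sir(Z, y, n_dirs=2, ridge=1e-2):
    """Step 2 (assumed form): ridge-regularized SIR with class labels as slices.

    Solves M b = lambda (Sigma + ridge * I) b, where Sigma is the covariance
    of Z and M is the weighted covariance of the slice (class) means.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    n, p = Z.shape
    Zc = Z - Z.mean(axis=0)
    sigma = Zc.T @ Zc / n

    # Between-slice covariance: each class of the discrete response is a slice.
    M = np.zeros((p, p))
    for label in np.unique(y):
        in_slice = y == label
        slice_mean = Zc[in_slice].mean(axis=0)
        M += in_slice.mean() * np.outer(slice_mean, slice_mean)

    # Reduce the generalized problem to a symmetric one via Cholesky.
    A = sigma + ridge * np.eye(p)
    L_inv = np.linalg.inv(np.linalg.cholesky(A))
    vals, vecs = np.linalg.eigh(L_inv @ M @ L_inv.T)
    order = np.argsort(vals)[::-1][:n_dirs]
    directions = L_inv.T @ vecs[:, order]
    return directions, vals[order]


if __name__ == "__main__":
    # Synthetic data only: 60 subjects, 12 genes in 4 hypothetical categories.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 30)
    cats = np.repeat(["c0", "c1", "c2", "c3"], 3)
    X = rng.normal(size=(60, 12))
    X[y == 1, :3] += 3.0  # category c0 differs between the two classes

    lower, upper, names = aggregate_to_intervals(X, cats)
    centers = (lower + upper) / 2  # assumption: midpoint representation
    B, eigvals = regularized_sir(centers, y)
    scores = centers @ B  # low-dimensional coordinates for plotting
    print(scores.shape, eigvals)
```

The resulting `scores` could be plotted for visualization or passed to a classifier for class prediction. Representing each interval by its midpoint discards the interval width; a method designed for interval-valued data would presumably use the full interval.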