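Title: Application of Sufficient Dimension Reduction to Gene Set Analysis (充分維度縮減在基因集分析上的運用)
Author: 薛慧敏
Contributor: Department of Statistics
Keywords: Gene set analysis; Differential coexpression; Sufficient dimension reduction; Non-linear associations
Date: 2016
Upload time: 17-May-2017 16:31:06 (UTC+8)
Abstract (translated from the Chinese original): In gene microarray experiments, gene-set analysis (GSA) tests whether a set of genes is significantly associated with a phenotype variable. Several public databases now provide gene-set information. For example, the Molecular Signatures Database (MSigDB) contains several collections, including gene sets defined by compiling results from other gene databases and from biomedical journals. These gene sets group genes by biological function. When the phenotype is binary or categorical, most published GSA methods detect differences in mean expression. Cook and Weisberg (1991) proposed sliced average variance estimation (SAVE), which can be used to estimate the central subspace of sufficient dimension reduction for the gene data; if the gene set is unrelated to the phenotype, the dimension of that subspace should be zero. We therefore propose to assess the significance of a gene set by testing the hypothesis that the dimension of the central subspace is zero. The method extracts and uses richer information from the data, and it applies to both categorical and quantitative phenotypes. We verify its validity through computer simulation.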
Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed with respect to a specific alternative scenario, such as a differential mean pattern or differential coexpression. Moreover, very few methods can handle binary, categorical, and continuous phenotypes alike. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they allow for both categorical and continuous phenotypes and that they can identify a variety of enriched gene sets. Through simulation studies, we compared the type I error and power of SDRs with existing GSA methods for binary, three-category, and continuous phenotypes. We found that the SDR methods adequately control the type I error rate at the pre-specified nominal level, and that they have satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely used GSA methods on two real microarray datasets for illustration. We concluded that the SDR methods outperform the others because of their flexibility in handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios.
Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.
Relation: MOST 104-2118-M-004-002
Type: report
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/109731
Format: application/pdf, 1731915 bytes
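The abstract describes assessing a gene set by testing whether the central subspace estimated via sliced average variance estimation (SAVE) has dimension zero. The sketch below illustrates that idea only; it is not the report's SDR tests. The choice of test statistic (the trace of the SAVE kernel matrix), the permutation calibration, and all function names are assumptions made for illustration, and the whitening step assumes more samples than genes so that the sample covariance is invertible.

```python
import numpy as np

def whiten(X):
    """Center X and scale it by the inverse square root of its sample covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)  # assumes full rank (n > p)
    return Xc @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)

def make_slices(y, n_slices):
    """Use the categories as slices when y has few levels, else near-equal quantile slices."""
    levels = np.unique(y)
    if len(levels) <= n_slices:
        return [np.flatnonzero(y == level) for level in levels]
    return np.array_split(np.argsort(y, kind="stable"), n_slices)

def save_trace(Z, y, n_slices=4):
    """Trace of the SAVE kernel M = sum_h p_h (I - Cov(Z | slice h))^2."""
    n, p = Z.shape
    M = np.zeros((p, p))
    for idx in make_slices(y, n_slices):
        if len(idx) < 2:
            continue  # a slice with one sample has no covariance
        D = np.eye(p) - np.atleast_2d(np.cov(Z[idx], rowvar=False))
        M += (len(idx) / n) * (D @ D)
    return float(np.trace(M))

def save_permutation_test(X, y, n_slices=4, n_perm=199, seed=0):
    """Permutation p-value for 'the gene set is unrelated to the phenotype' (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    Z = whiten(np.asarray(X, dtype=float))
    observed = save_trace(Z, y, n_slices)
    null = np.array([save_trace(Z, rng.permutation(y), n_slices) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Simulated example: two groups with equal means but different coexpression.
rng = np.random.default_rng(1)
p = 4
cov_b = np.full((p, p), 0.9) + 0.1 * np.eye(p)  # strongly correlated genes in group 1
X = np.vstack([rng.standard_normal((100, p)),
               rng.multivariate_normal(np.zeros(p), cov_b, size=100)])
y = np.repeat([0, 1], 100)
stat, p_value = save_permutation_test(X, y)
print(stat, p_value)
```

Because the statistic is built from within-slice covariances of the whitened data, it can react to a change in coexpression even when the group means are identical. A set with a pure mean shift would call for a different kernel, such as one based on slice means.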