Title Application of Sufficient Dimension Reduction to Gene Set Analysis (充分維度縮減在基因集分析上的運用)
Author 薛慧敏
Contributor Department of Statistics
Keywords Gene set analysis; Differential coexpression; Sufficient dimension reduction; Non-linear associations
Date 2016
Upload time 17-May-2017 16:31:06 (UTC+8)
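Abstract In gene microarray experiments, gene-set analysis (GSA) tests whether a set of genes is significantly associated with a phenotype variable. Several public databases now provide gene-set information. For example, the Molecular Signatures Database (MSigDB) contains several collections, including gene sets defined by compiling other gene databases and results from biomedical journals. These gene sets group genes according to their biological function. When the phenotype is binary or categorical, most published GSA methods detect differences in mean gene expression. Cook and Weisberg (1991) proposed sliced average variance estimation to estimate the central subspace of sufficient dimension reduction; if a gene set is unrelated to the phenotype, the dimension of this subspace should be zero. We therefore propose assessing the significance of a gene set by testing the hypothesis that the dimension of the central subspace is zero. The method extracts and uses richer information from the data, and it applies to both categorical and quantitative phenotypes. We verify the validity of the method through computer simulation.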
Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of sets of genes. However, most methods are developed with respect to a specific alternative scenario, such as a differential mean pattern or a differential coexpression. Moreover, a very limited number of methods can handle either binary, categorical, or continuous phenotypes. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they allow for categorical and continuous phenotypes, and they are also able to identify a variety of enriched gene sets. Through simulation studies, we compared the type I error and power of SDRs with existing GSA methods for binary, triple, and continuous phenotypes. We found that SDR methods adequately control the type I error rate at the pre-specified nominal level, and they have a satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely-used GSA methods using two real microarray datasets for illustration. We concluded that the SDR methods outperform the others because of their flexibility with regard to handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.
Relation MOST 104-2118-M-004-002
Type report
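
The abstract describes testing whether the central subspace estimated by sliced average variance estimation (SAVE) has dimension zero. The record does not give the report's test statistic or its null distribution, so the following is only a minimal sketch under stated assumptions: it uses the trace of the sample SAVE matrix as the statistic and calibrates it by permuting the phenotype, which is a swapped-in generic procedure and not necessarily the author's. All function names are hypothetical, and the sketch assumes fewer genes than samples in every slice.

```python
import numpy as np


def standardize(X):
    """Center X and transform it so the sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)
    # (evecs / sqrt(evals)) @ evecs.T is the inverse square root of cov.
    return Xc @ (evecs / np.sqrt(evals)) @ evecs.T


def make_slices(y, n_slices):
    """One index set per class label, or rank-based bins if y is continuous."""
    labels = np.unique(y)
    if len(labels) <= n_slices:
        return [np.flatnonzero(y == label) for label in labels]
    return np.array_split(np.argsort(y, kind="stable"), n_slices)


def save_trace(Z, slices):
    """Trace of the sample SAVE matrix sum_h p_h (I - Cov(Z | slice h))^2."""
    n, p = Z.shape
    M = np.zeros((p, p))
    for idx in slices:
        D = np.eye(p) - np.atleast_2d(np.cov(Z[idx], rowvar=False))
        M += (len(idx) / n) * (D @ D)
    return float(np.trace(M))


def save_permutation_pvalue(X, y, n_slices=5, n_perm=199, seed=0):
    """Permutation p-value for H0: the central subspace has dimension zero.

    Assumption: permutation calibration is used here for illustration only.
    """
    rng = np.random.default_rng(seed)
    Z = standardize(np.asarray(X, dtype=float))
    y = np.asarray(y)
    observed = save_trace(Z, make_slices(y, n_slices))
    exceed = sum(
        save_trace(Z, make_slices(rng.permutation(y), n_slices)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


# Toy gene set: both groups have mean zero, but only group 1 is coexpressed.
rng = np.random.default_rng(1)
p = 4
coexpressed = np.full((p, p), 0.8) + 0.2 * np.eye(p)
X = np.vstack([
    rng.standard_normal((60, p)),
    rng.multivariate_normal(np.zeros(p), coexpressed, size=60),
])
y = np.repeat([0, 1], 60)
print(save_permutation_pvalue(X, y))
```

Because the statistic is built from within-slice covariances rather than slice means, this toy example targets the differential coexpression scenario named in the abstract, where a mean-difference test would have nothing to detect. A continuous phenotype can be passed as `y` unchanged; it is then cut into `n_slices` rank-based bins.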
dc.contributor Department of Statistics
dc.creator (Author) 薛慧敏 (zh_TW)
dc.date (Date) 2016
dc.date.accessioned 17-May-2017 16:31:06 (UTC+8)
dc.date.available 17-May-2017 16:31:06 (UTC+8)
dc.date.issued (Upload time) 17-May-2017 16:31:06 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/109731
dc.format.extent 1731915 bytes
dc.format.mimetype application/pdf
dc.relation (Relation) MOST 104-2118-M-004-002
dc.subject (Keywords) Gene set analysis; Differential coexpression; Sufficient dimension reduction; Non-linear associations
dc.title (Title) Application of Sufficient Dimension Reduction to Gene Set Analysis (充分維度縮減在基因集分析上的運用) (zh_TW)
dc.type (Type) report