Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words

Title	Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words
Authors	張家銘 (Chang, Jia-Ming); Hsu, Wen-Lian; Sung, Ting-Yi; Notredame, Cédric; Lin, Hsin-Nan
Contributor	Department of Computer Science
Date	2011-12
Uploaded	30-May-2016 17:24:57 (UTC+8)
Abstract	Most sequence alignment tools can successfully align protein sequences that share high levels of sequence identity. The accuracy of these alignments with respect to the corresponding structural alignments, however, decreases rapidly for distantly related sequences (<20% identity). In this identity range, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Currently available methods differ mainly in how they measure similarity between aligned residues: substitution matrices, Fourier transforms, sophisticated profile-profile functions or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs that improves the quality of protein sequence alignment. Our approach, called SymAlign, relies on identifying conserved words that are found across a sizeable fraction of the dataset and are supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated into T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In analyzing the relationship between sequence identity and structural similarity, SymAlign better distinguishes structurally similar proteins from non-similar ones. We show that protein sequence alignments can be significantly improved using a similarity estimate based on weighted n-grams. In the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides an alignment visualization that displays sub-optimal alignments on dot matrices, making it easy to identify well-supported alternative alignments that dynamic programming may have missed. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
Source	PLoS One, Vol. 6, No. 12, e27872
Type	article
DOI	http://dx.doi.org/10.1371/journal.pone.0027872
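The abstract describes turning conserved words into a position-specific score for residue pairs. The Python sketch below illustrates that general idea under simplifying assumptions that are not taken from the paper: words are exact k-mers (SymAlign's synonymous words also tolerate substitutions), a word counts as conserved when it occurs in at least a fraction `min_frac` of the sequences, its weight is simply that fraction, and a plain identity score stands in for a real substitution matrix such as BLOSUM62. The names `conserved_words`, `word_bonus_matrix` and `pair_score` are hypothetical.

```python
from collections import defaultdict


def conserved_words(seqs, k=3, min_frac=0.5):
    """Return {word: weight} for k-mers found in at least min_frac of seqs.

    Hypothetical weighting: weight = fraction of sequences containing the word.
    """
    counts = defaultdict(int)
    for s in seqs:
        # A set, so each word is counted at most once per sequence.
        for w in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[w] += 1
    n = len(seqs)
    return {w: c / n for w, c in counts.items() if c / n >= min_frac}


def word_bonus_matrix(a, b, words, k=3):
    """Position-specific bonus M[i][j] for residue a[i] aligned to b[j].

    Each conserved word occurring in both a and b adds its weight to every
    residue pair it covers on that diagonal (assumed scheme, not SymAlign's).
    """
    m = [[0.0] * len(b) for _ in range(len(a))]
    for i in range(len(a) - k + 1):
        w = a[i:i + k]
        if w not in words:
            continue
        for j in range(len(b) - k + 1):
            if b[j:j + k] == w:
                for off in range(k):
                    m[i + off][j + off] += words[w]
    return m


def pair_score(a, b, i, j, bonus, match=1.0, mismatch=-1.0):
    """Identity score (stand-in for a substitution matrix) plus the word bonus."""
    base = match if a[i] == b[j] else mismatch
    return base + bonus[i][j]
```

For example, with the sequences `MKVLAT`, `MKVIAT` and `GKVLAS`, `k=3` and `min_frac=0.6`, the conserved words are `MKV`, `KVL` and `VLA` (each present in two of the three sequences, weight 2/3). Residue pair (2, 2) of `MKVLAT` and `GKVLAS` lies inside both shared words `KVL` and `VLA`, so its score rises from 1 to 1 + 4/3. How SymAlign actually weights its words and feeds them into T-Coffee is not reproduced here.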

dc.contributor	Department of Computer Science	-
dc.creator (Author)	張家銘	zh_TW
dc.creator (Author)	Chang, Jia-Ming	-
dc.creator (Author)	Hsu*, Wen-Lian	en_US
dc.creator (Author)	Sung*, Ting-Yi	en_US
dc.creator (Author)	Notredame, Cédric	en_US
dc.creator (Author)	Lin, Hsin-Nan	en_US
dc.date (Date)	2011-12	-
dc.date.accessioned	30-May-2016 17:24:57 (UTC+8)	-
dc.date.available	30-May-2016 17:24:57 (UTC+8)	-
dc.date.issued (Upload time)	30-May-2016 17:24:57 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/97015	-
dc.format.extent	396529 bytes	-
dc.format.mimetype	application/pdf	-
dc.relation (Relation)	PLoS One, Vol.6, No.12, pp.e27872	-
dc.title (Title)	Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words	-
dc.type (Type)	article	-
dc.identifier.doi (DOI)	10.1371/journal.pone.0027872	-
dc.doi.uri (DOI)	http://dx.doi.org/10.1371/journal.pone.0027872	-
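The dot-matrix view of sub-optimal alignments mentioned in the abstract can be illustrated with a classic windowed-identity dot plot. This Python sketch is an assumption for illustration, not SymAlign's visualization (which is built on its own word-based scores): a window of `window` residues is compared at every offset pair, and every cell of a window with at least `min_matches` identical residues is marked. Parallel diagonals then reveal alternative placements that a single dynamic-programming traceback would not report. `dot_matrix` and `render` are hypothetical names.

```python
def dot_matrix(a, b, window=3, min_matches=3):
    """Boolean len(a) x len(b) grid marking well-matching diagonal windows.

    Plain residue identity is used here as an assumed stand-in for
    SymAlign's word-based similarity.
    """
    grid = [[False] * len(b) for _ in range(len(a))]
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            same = sum(a[i + t] == b[j + t] for t in range(window))
            if same >= min_matches:
                # Mark the whole window so diagonals are visible in the plot.
                for t in range(window):
                    grid[i + t][j + t] = True
    return grid


def render(grid):
    """Text rendering: '*' for a marked cell, '.' otherwise (rows follow a)."""
    return "\n".join("".join("*" if c else "." for c in row) for row in grid)
```

With `a = "KVLGKVL"` and `b = "KVL"`, the plot shows two separate diagonals, one for each place where `KVL` could be aligned within `a`; only one of them would appear in an optimal pairwise alignment.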
