Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words

張家銘; Chang, Jia-Ming; Hsu*, Wen-Lian; Sung*, Ting-Yi; Notredame, Cédric; Lin, Hsin-Nan

Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/97015

DC Field	Value	Language
dc.contributor	Department of Computer Science	-
dc.creator	張家銘	zh_TW
dc.creator	Chang, Jia-Ming	-
dc.creator	Hsu*, Wen-Lian	en_US
dc.creator	Sung*, Ting-Yi	en_US
dc.creator	Notredame, Cédric	en_US
dc.creator	Lin, Hsin-Nan	en_US
dc.date	2011-12	-
dc.date.accessioned	2016-05-30T09:24:57Z	-
dc.date.available	2016-05-30T09:24:57Z	-
dc.date.issued	2016-05-30T09:24:57Z	-
dc.identifier.uri	http://nccur.lib.nccu.edu.tw/handle/140.119/97015	-
dc.description.abstract	Most sequence alignment tools can successfully align protein sequences with high levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Currently available methods differ essentially in how they measure similarity between aligned residues: substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.	-
dc.format.extent	396529 bytes	-
dc.format.mimetype	application/pdf	-
dc.relation	PLoS One, Vol.6, No.12, pp.e27872	-
dc.title	Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words	-
dc.type	article	-
dc.identifier.doi	10.1371/journal.pone.0027872	-
dc.doi.uri	http://dx.doi.org/10.1371/journal.pone.0027872	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.openairetype	article	-
item.fulltext	With Fulltext	-
item.grantfulltext	restricted	-
item.cerifentitytype	Publications	-
Appears in Collections:	Journal Articles

Files in This Item:

File	Description	Size	Format
Improving.pdf		387.24 kB	Adobe PDF
