Large-Scale Comparison of Protein Sequence Alignment Algorithms With Structure Alignments
- Proteins
, 2000
Cited by 36 (1 self)
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS ...
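The abstract above distinguishes two measures: accuracy on a per-residue basis, and the fraction of the structure alignment that a sequence alignment reproduces. The following Python sketch illustrates one plausible way to compute both, assuming each alignment is reduced to a set of (i, j) residue index pairs. The function names `aligned_pairs` and `alignment_agreement` are hypothetical, and the paper's exact scoring procedure is not given in this listing.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of 0-based (i, j) residue index pairs matched by a gapped pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_agreement(test_pairs, reference_pairs):
    """Compare a sequence alignment against a reference (e.g. structure-derived) alignment.

    Returns (per_residue, coverage):
      per_residue = correct pairs / pairs in the test alignment
      coverage    = correct pairs / pairs in the reference alignment
    """
    correct = len(test_pairs & reference_pairs)
    per_residue = correct / len(test_pairs) if test_pairs else 0.0
    coverage = correct / len(reference_pairs) if reference_pairs else 0.0
    return per_residue, coverage


# Toy data (illustrative only): an 8-pair reference and a 4-pair test alignment
# in which one pair is shifted.
reference = {(k, k) for k in range(8)}
test = {(0, 0), (1, 1), (2, 2), (3, 4)}
per_residue, coverage = alignment_agreement(test, reference)
print(per_residue, coverage)
```

Under these definitions, a short alignment can score well per residue while covering little of the reference, which is the trade-off the abstract describes between BLAST and the longer PSI-BLAST alignments.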
Protein structure alignment using a genetic algorithm
- Proteins
, 2000
Cited by 24 (1 self)
We have developed a novel, fully automatic method for aligning the three-dimensional structures of two proteins. The basic approach is to first align the proteins' secondary structure elements and then extend the alignment to include any equivalent residues found in loops or turns. The initial secondary structure element alignment is determined by a genetic algorithm. After refinement of the secondary structure element alignment, the protein backbones are superposed and a search is performed to identify any additional equivalent residues in a convergent process. Alignments are evaluated using intramolecular distance matrices. Alignments can be performed with or without sequential connectivity constraints. We have applied the method to proteins from several well-studied families: globins, immunoglobulins, serine proteases, dihydrofolate reductases, and DNA methyltransferases. Agreement with manually curated alignments is excellent. A web-based server and additional supporting information are
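To illustrate what evaluating an alignment with intramolecular distance matrices can look like, here is a minimal Python sketch. It assumes the aligned residues are given as two equal-length lists of 3D coordinates (for example C-alpha positions) and reports the mean absolute difference between corresponding intramolecular distances. This is a generic measure chosen for illustration, not the scoring function of the paper, and the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between all points of one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_difference(coords_a, coords_b):
    """Mean absolute difference between the intramolecular distances of
    equivalent (aligned) residues in two structures. Lower is better."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contribute the same number of aligned residues")
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two aligned residues")
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


# Toy coordinates (illustrative only).
a = [(0, 0, 0), (3, 0, 0), (3, 4, 0)]
b = [(10, -2, 5), (13, -2, 5), (13, 2, 5)]  # a translated copy of a
c = [(0, 0, 0), (6, 0, 0), (6, 8, 0)]       # a scaled by a factor of 2
print(distance_matrix_difference(a, b))
print(distance_matrix_difference(a, c))
```

Because only distances within each structure are compared, the measure does not change when either structure is rotated or translated, so no superposition is needed to compute it.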
An Alternative View of Protein Fold Space
, 2000
Cited by 13 (3 self)
Comparing and subsequently classifying protein structure information has received significant attention concurrent with the increase in the number of experimentally derived 3-dimensional structures. Classification schemes have focused on biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50-150 residue) highly repetitive near-contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally nonredundant over the complete polypeptide chain. The substructure classification is based on a previously reported Combinatorial Extension (CE) algorithm that provides a significantly different set of structure alignments than those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively the algorithm provides longer contiguous aligned segments at the price of a slightly highe...
The current excitement in bioinformatics, analysis of whole-genome expression data: How does it relate to protein structure and function?
- Current Opinion in Structural Biology
, 2000
Cited by 12 (8 self)
Whole-genome expression profiles provide a rich new data trove for bioinformatics. Initial analyses of the profiles have included clustering and cross-referencing to 'external' information on protein structure and function. Expression-profile clusters do relate to protein function, but the correlation is not perfect, with the discrepancies partially resulting from the difficulty in consistently defining function. Other attributes of proteins can also be related to expression, in particular structure and localization, and sometimes show a clearer relationship than function.
Introduction
Bioinformatics has traditionally involved the computational analysis of large molecular-biology data sets. Initially, these were drawn from the world of protein structure. In 1995 the field changed with the advent of complete genome sequences, which represented a new type of large-scale data. Now whole-genome expression experiments are providing further sources of large-scale data and transforming bi...
Conserved Key Amino Acid Positions (CKAAPs) Derived From the Analysis of Common Substructures in Proteins
, 2000
Cited by 5 (1 self)
An all-against-all protein structure comparison using the Combinatorial Extension (CE) algorithm applied to a representative set of PDB structures revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method to identify conserved amino acid positions and residue-dependent property clusters within these subsequences starting with structure alignments. Each of the subsequences is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each one of the subsequences are used in scoring for conserved key amino acid positions (CKAAPs). We have set the top 20% of the high-scoring positions in each substructure to be CKAAPs. It is hypothesized that CKAAPs may be responsible for the common folding patterns in either a local or global view of the protein-folding pathway. Where a signifi...
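The selection step described above (score each aligned position, then keep the top 20%) can be sketched in Python as follows. The conservation score used here, the unweighted frequency of the most common residue in each column, is a stand-in chosen for illustration. The abstract mentions weighted homologues and a frequency matrix but does not give the scoring formula, so `column_scores` and `top_percent_positions` are hypothetical names and not the authors' method.

```python
from collections import Counter


def column_scores(aligned_seqs, gap="-"):
    """Score each alignment column by the frequency of its most common residue.

    Gaps count against a column: the denominator is the number of sequences.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def top_percent_positions(scores, percent=20):
    """Return the 0-based indices of the highest-scoring positions.

    Keeps floor(len(scores) * percent / 100) positions, but at least one.
    Ties are broken in favour of the lower index.
    """
    k = max(1, len(scores) * percent // 100)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Toy data (illustrative only).
toy_scores = column_scores(["AC-", "AD-", "AC-"])
example = [0.1, 0.9, 0.3, 0.8, 0.2, 0.4, 0.5, 0.6, 0.7, 0.05]
print(toy_scores)
print(top_percent_positions(example))
```

Using an integer percentage avoids floating-point rounding surprises when computing how many positions to keep.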
Rapid Methods for Comparing Protein Structures and Scanning Structure Databases
Cited by 2 (0 self)
Databases of three-dimensional macromolecular structures became so large that fast search tools and comparison methods were needed and were actually designed. All of them employ simplified representations of the three-dimensional structure: strings of characters of variable length, which can be handled with procedures that were designed for sequence analysis; fixed-dimension arrays that can be processed with standard statistical methods; ensembles of secondary structural elements, which are much less numerous than the atoms/residues of the protein; and continuous representations of the backbone, through stereochemical figures. Some of these computational procedures were developed long ago, when computers were too slow, and others have been designed recently, with the specific aim of handling large amounts of information. The present article is focused on the algorithms that allow fast structure comparison, particularly suitable to handle large databases, and should provide a comprehensive picture, useful for the development and the assessment of novel tools.
Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions
, 2007
AUTOMATED DATA-DRIVEN DISCOVERY OF MOTIF-BASED PROTEIN FUNCTION CLASSIFIERS
by Xiangyun Wang, Diane Schroeder, Drena Dobbs, and Vasant Honavar
- Inf. Sci
, 2003
This paper describes an approach to data-driven discovery of decision trees or rules for assigning protein sequences to functional families using sequence motifs. This method is able to capture regularities that can be described in terms of presence or absence of arbitrary combinations of motifs. A training set of peptidase sequences labeled with the corresponding MEROPS functional families or clans is used to automatically construct decision trees that capture regularities sufficient to assign the sequences to their respective functional families. The performance of the resulting decision tree classifiers is then evaluated on an independent test set. We compared the rules constructed using motifs generated by a multiple-sequence-alignment-based motif discovery tool (MEME) with rules constructed using expert-annotated PROSITE motifs (patterns and profiles). Our results indicate that the former provide a potentially powerful high-throughput technique for constructing protein function classifiers when adequate training data are available. Examination of the generated rules in relation to known 3-dimensional structures of members in the case of two families (MEROPS families C14 and M12) suggests that the proposed technique may be able to identify combinations of sequence motifs that characterize functionally significant 3-dimensional structural features of proteins.
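As a rough illustration of how motif presence or absence can drive a decision tree, the Python sketch below picks the single motif whose presence best separates two families by information gain, which is the criterion commonly used to choose a tree's first split. It assumes each sequence has already been reduced to a set of motif identifiers. The motif names `M1` to `M3` and the toy training data are invented for the example, and the paper's actual tree-induction algorithm and settings are not stated in this listing.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(examples, motif):
    """Gain from splitting on presence/absence of one motif.

    examples: list of (set_of_motif_ids, family_label) tuples.
    """
    labels = [label for _, label in examples]
    present = [label for motifs, label in examples if motif in motifs]
    absent = [label for motifs, label in examples if motif not in motifs]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def best_split_motif(examples):
    """Motif with the highest information gain (alphabetically first on ties)."""
    motifs = sorted(set().union(*(m for m, _ in examples)))
    return max(motifs, key=lambda m: information_gain(examples, m))


# Hypothetical training data: motif sets labeled with MEROPS family names.
training = [
    ({"M1", "M2"}, "C14"),
    ({"M1"}, "C14"),
    ({"M2", "M3"}, "M12"),
    ({"M3"}, "M12"),
]
print(best_split_motif(training), information_gain(training, "M1"))
```

A full decision tree would apply the same choice recursively to the sequences on each side of the split, which is how combinations of motifs, rather than single motifs, end up defining a rule.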
The structurally constrained protein evolution model accounts for sequence patterns of the LβH superfamily
, 2004

