Comprehensive assessment of automatic structural alignment against a manual standard, the Scop classification of proteins. Protein Sci (1998)

by M Gerstein, Levitt

Hidden Markov models for detecting remote protein homologies

by Kevin Karplus, Christian Barrett, Richard Hughey - Bioinformatics, 1998
Cited by 229 (12 self)
A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases. We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against wu-blastp and against double-blast, a two-step method similar to ISS, but using blast instead of fasta. Results: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP-domains test, SAM-T98 got 880 true positives and 68 false positives, double-blast got 533 true positives with 71 false positives, and wu-blastp got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model. Availability: A World Wide Web server, as well as information on obtaining the Sequence Alignment and ... (Preprint; to appear in Bioinformatics, 1999.)
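The true-positive and false-positive counts quoted at the "minimum-error point" can be illustrated with a small sketch. It assumes that the minimum-error point is the score cutoff that minimizes false positives plus missed true pairs, and that higher scores are better; both are assumptions made for illustration, and the function name and toy data are hypothetical rather than taken from the paper.

```python
def minimum_error_point(scored_hits):
    """Return (true_pos, false_pos, errors) at the cutoff with fewest errors.

    scored_hits: list of (score, is_true_homolog) pairs.
    Assumption: higher score means a more confident hit, and
    errors = false positives + true pairs left below the cutoff.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    total_true = sum(1 for _, is_true in ranked if is_true)

    # Start from the cutoff that accepts nothing: every true pair is missed.
    best = (0, 0, total_true)
    true_pos = false_pos = 0
    for _, is_true in ranked:
        if is_true:
            true_pos += 1
        else:
            false_pos += 1
        errors = false_pos + (total_true - true_pos)
        if errors < best[2]:
            best = (true_pos, false_pos, errors)
    return best


# Invented toy data, not results from any real search.
toy_hits = [(9.0, True), (8.0, True), (7.0, False), (6.0, True),
            (5.0, True), (4.0, False), (3.0, False)]
print(minimum_error_point(toy_hits))
```

Reporting both counts at this single cutoff, as the abstract does, makes methods comparable even when their raw score scales differ.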

A Discriminative Framework for Detecting Remote Protein Homologies

by Tommi Jaakkola, Mark Diekhans, David Haussler, 1999
Cited by 163 (4 self)
A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model. This general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.

Structure Comparison and Structure Patterns

by Ingvar Eidhammer, Inge Jonassen, William R. Taylor - Journal of Computational Biology, 1999
Cited by 69 (2 self)
This article investigates different aspects of pairwise and multiple structure comparison, and the problem of automatically discovering common patterns in a set of structures. Descriptions and representations of structures and patterns are investigated, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature are developed, and many methods are reviewed and placed into this framework.

Within the Twilight Zone: A Sensitive Profile-Profile Comparison Tool Based on Information Theory

by Golan Yona, Michael Levitt - J. Mol. Biol., 2002
Cited by 68 (4 self)
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.

Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures

by Rachel Kolodny, Patrice Koehl, Michael Levitt - J Mol Biol, 2005
Cited by 53 (0 self)
The problem of aligning, or establishing a correspondence between, residues of two protein ... (Abbreviations used: ROC, receiver operating ...)

Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions

by Jiang Qian, Marisa Dolled-Filhart, Jimmy Lin, Haiyuan Yu, Mark Gerstein - J Mol Biol
Cited by 45 (4 self)
The complexity of biological systems provides for a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering. These include activation, where one expects a time-delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell cycle ...
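The analogy to local sequence alignment can be made concrete with a short sketch. It assumes a gap-free, diagonal-only dynamic program over two normalized expression profiles, with one score matrix for correlated segments and one for inverted segments; the function names and the toy profiles are hypothetical, and this is an illustration of the idea rather than the authors' published procedure. The significance assessment against random score distributions is not included.

```python
import math


def normalize(profile):
    """Scale a profile to zero mean and unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if std == 0:
        raise ValueError("a constant profile cannot be normalized")
    return [(v - mean) / std for v in profile]


def local_cluster_score(x, y):
    """Return (score, shift, kind) for the best local match of x and y.

    shift = (index in y) - (index in x); a positive shift means y lags x.
    kind is "correlated", "inverted", or "none".
    """
    x, y = normalize(x), normalize(y)
    n, m = len(x), len(y)
    pos = [[0.0] * (m + 1) for _ in range(n + 1)]  # correlated segments
    neg = [[0.0] * (m + 1) for _ in range(n + 1)]  # inverted segments
    best = (0.0, 0, "none")
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = x[i - 1] * y[j - 1]
            # As in Smith-Waterman, a running score is reset to zero
            # whenever extending the segment would make it negative.
            pos[i][j] = max(pos[i - 1][j - 1] + s, 0.0)
            neg[i][j] = max(neg[i - 1][j - 1] - s, 0.0)
            if pos[i][j] > best[0]:
                best = (pos[i][j], j - i, "correlated")
            if neg[i][j] > best[0]:
                best = (neg[i][j], j - i, "inverted")
    return best


# Invented toy profiles: `delayed` is `base` shifted by one time point.
base = [0, 1, 2, 3, 2, 1, 0, 0]
delayed = [0, 0, 1, 2, 3, 2, 1, 0]
flipped = [-v for v in base]
print(local_cluster_score(base, delayed))
print(local_cluster_score(base, flipped))
```

Keeping separate matrices for correlated and inverted segments lets a single pass report both kinds of relationship, together with the time shift at which the best segment ends.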

How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold Des 3

by Mark Gerstein, 1998
Cited by 44 (24 self)

Predicting Protein Structure using only Sequence Information

by Kevin Karplus, Christian Barrett, Melissa Cline, Mark Diekhans, Leslie Grate, Richard Hughey - Proteins, 1999
Cited by 43 (13 self)
This paper presents results of blind predictions submitted to the CASP3 protein structure prediction experiment. We made predictions using the SAM-T98 method, an iterative hidden Markov model based method for constructing protein family profiles. The method is purely sequence based, using no structural information, and yet was able to predict structures as well as all but five of the structure-based methods in CASP3. A prediction server using the SAM-T98 method discussed here is available on the World-Wide Web.

Whole-genome trees based on the occurrence of folds and orthologs: implications for comparing genomes on different levels

by Jimmy Lin, Mark Gerstein - Genome Res, 2000
Cited by 43 (17 self)
We built “whole-genome” trees based on the presence or absence of particular molecular features (either orthologs or folds) in the genomes of a number of recently sequenced microorganisms. To put these genomic trees into perspective, we compared them to the traditional ribosomal phylogeny and also to trees based on the sequence similarity of individual orthologous proteins. We found that our genomic trees that were based on the overall occurrence of orthologs did not agree with the traditional tree. This discrepancy, however, vanished when one restricted the tree to proteins involved in transcription and translation, not including problematic ones involved in metabolism. Protein folds unite superficially unrelated families of proteins and represent a most fundamental molecular unit described by genomes. We found our genomic occurrence tree based on folds agreed fairly well with the traditional ribosomal phylogeny. Surprisingly, despite this overall agreement, certain classes of folds, particularly all-beta ones, appear to have a somewhat different phylogenetic distribution. We also compared our occurrence trees to whole-genome clusters built based on the composition of amino acids and di-nucleotides. Additional information (clickable trees, plots, etc.) is available from ...
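A tree built from presence or absence of features starts from pairwise distances between genomes. The sketch below assumes a Jaccard distance over sets of features; the abstract does not state which distance measure was used, so this choice, the function names, and the toy genomes are all illustrative assumptions. Any standard tree-building method could then be applied to the resulting matrix.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of features (folds or orthologs)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(occurrence):
    """Map each ordered pair of genome names to its Jaccard distance.

    occurrence: dict of genome name -> set of features present in it.
    """
    names = sorted(occurrence)
    dist = {(name, name): 0.0 for name in names}
    for p, q in combinations(names, 2):
        d = jaccard_distance(occurrence[p], occurrence[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist


# Invented toy occurrence table, not data from any real genome.
toy_occurrence = {
    "genome_A": {"fold1", "fold2", "fold3"},
    "genome_B": {"fold1", "fold2", "fold4"},
    "genome_C": {"fold5"},
}
toy_dist = distance_matrix(toy_occurrence)
print(toy_dist[("genome_A", "genome_B")])
```

Restricting the feature sets to a functional class before computing distances corresponds to the comparison the abstract describes between the full ortholog tree and the tree limited to transcription and translation proteins.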

Large-Scale Comparison of Protein Sequence Alignment Algorithms With Structure Alignments

by J. Michael Sauder, Jonathan W. Arthur, Roland L. Dunbrack - Proteins, 2000
Cited by 36 (1 self)
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS ...
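The distinction between per-residue accuracy and the fraction of the structure alignment recovered can be shown with a short sketch. It assumes each alignment is represented as a set of (query position, hit position) pairs and that a pair is "correct" when it also appears in the structure alignment; the exact definitions used in the paper may differ, and the function names and toy alignments are hypothetical.

```python
def per_residue_accuracy(seq_pairs, struct_pairs):
    """Fraction of the sequence alignment's pairs that the structure
    alignment also contains (accuracy of what was aligned)."""
    seq = set(seq_pairs)
    if not seq:
        return 0.0
    return len(seq & set(struct_pairs)) / len(seq)


def fraction_of_structure_recovered(seq_pairs, struct_pairs):
    """Fraction of the structure alignment's pairs that the sequence
    alignment reproduces (rewards longer correct alignments)."""
    struct = set(struct_pairs)
    if not struct:
        raise ValueError("structure alignment has no aligned pairs")
    return len(struct & set(seq_pairs)) / len(struct)


# Invented toy alignments as (query_position, hit_position) pairs.
structure = [(1, 1), (2, 2), (3, 3), (4, 5)]
short_alignment = [(1, 1), (2, 2)]
long_alignment = [(1, 1), (2, 2), (3, 3), (4, 4)]

print(per_residue_accuracy(short_alignment, structure))
print(fraction_of_structure_recovered(short_alignment, structure))
print(per_residue_accuracy(long_alignment, structure))
print(fraction_of_structure_recovered(long_alignment, structure))
```

In this toy case the short alignment is perfectly accurate per residue yet recovers only half of the structure alignment, while the longer one is less accurate per residue but recovers more, which mirrors the contrast the abstract draws between BLAST and PSI-BLAST.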