Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics (1998)

by K Karplus, C Barrett

Results 1 - 10 of 150

A Discriminative Framework for Detecting Remote Protein Homologies

by Tommi Jaakkola, Mark Diekhans, David Haussler, 1999
Cited by 163 (4 self)
A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model. This general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.

Using the Fisher kernel method to detect remote protein homologies

by Tommi Jaakkola, Mark Diekhans, David Haussler - In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, 1999
Cited by 126 (3 self)
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
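For readers unfamiliar with the term, a Fisher kernel is usually defined as below. This is the standard textbook form, given only as a sketch: the abstract does not say which exact kernel variant the authors used, and the symbols (model parameters \theta, Fisher score U_X, Fisher information matrix I) are notation assumed here rather than taken from the listing.

```latex
U_X = \nabla_{\theta} \log P(X \mid \theta), \qquad
K(X, X') = U_X^{\top} I^{-1} U_{X'}, \qquad
I = \mathbb{E}_X\!\left[ U_X U_X^{\top} \right]
```

Here P(X | \theta) is the generative model (a hidden Markov model in this work), so each sequence X is mapped to a fixed-length gradient vector that a support vector machine can then classify.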

Combining Pairwise Sequence Similarity and Support Vector Machines for Remote Protein Homology Detection

by Li Liao, William Stafford Noble - J. Comput. Biol., 2002
Cited by 116 (12 self)
One key element in understanding the molecular machinery of the cell is to understand the meaning, or function, of each protein encoded in the genome. A very successful means of inferring the function of a previously unannotated protein is via sequence similarity with one or more proteins whose functions are already known. Currently, one of the most powerful such homology detection methods is the SVM-Fisher method of Jaakkola, Diekhans and Haussler (ISMB 2000). This method combines a generative, profile hidden Markov model (HMM) with a discriminative classification algorithm known as a support vector machine (SVM). The current work presents an alternative method for SVM-based protein classification. The method, SVM-pairwise, uses a pairwise sequence similarity algorithm such as Smith-Waterman in place of the HMM in the SVM-Fisher method. The resulting algorithm, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better remote protein homology detection than SVM-Fisher, profile HMMs and PSI-BLAST.
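The idea of using pairwise similarity scores as SVM input can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring parameters (`match`, `mismatch`, `gap`), the linear gap penalty, and the function names are all assumptions made here, whereas a real pipeline would use a substitution matrix and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Simplified Smith-Waterman: identity scoring and a linear gap
    penalty (assumed values, chosen only for illustration).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def pairwise_features(seq, reference_seqs):
    """Represent seq as a vector of similarity scores, one per reference."""
    return [smith_waterman_score(seq, ref) for ref in reference_seqs]


if __name__ == "__main__":
    refs = ["ACDE", "CD", "WWW"]  # hypothetical reference sequences
    print(pairwise_features("ACDE", refs))
```

Each sequence becomes a fixed-length vector whose length equals the number of reference sequences, which is the form a standard SVM expects.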

Review: Protein Secondary Structure Prediction Continues to Rise

by Burkhard Rost - J. Struct. Biol., 2001
Cited by 92 (13 self)
Linus Pauling correctly guessed the formation of helices and strands (14, 15) (and falsely hypothesized other structures). Three years before Pauling's guess was verified by the publications of the first X-ray structures (16, 17), one group had already ventured to predict secondary structure from sequence (18). The first-generation prediction methods following in the 1960s and 1970s were all based on single amino acid propensities (19). The second-generation methods dominating the scene until the early 1990s used propensities for segments of 3-51 adjacent residues (19). Basically any imaginable theoretical algorithm had been applied to the problem of predicting secondary structure from sequence. However, it seemed that prediction accuracy stalled at levels slightly above 60% (percentage of residues predicted correctly in one of the three states: helix, strand, and other). The reason for this limit was the

Within the Twilight Zone: A Sensitive Profile-Profile Comparison Tool Based on Information Theory

by Golan Yona, Michael Levitt - J. Mol. Biol., 2002
Cited by 73 (4 self)
This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity

Combining phylogenetic and hidden Markov models in biosequence analysis

by Adam Siepel - J. Comput. Biol., 2004
Cited by 71 (6 self)
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site. Besides improving the realism of ordinary phylogenetic models, they are potentially very powerful tools for inference and prediction—for gene finding, for example, or prediction of secondary structure. In this paper, we review progress on combined phylogenetic and hidden Markov models and present some extensions to previous work. Our main result is a simple and efficient method for accommodating higher-order states in the HMM, which allows for context-sensitive models of substitution— that is, models that consider the effects of neighboring bases on the pattern of substitution. We present experimental results indicating that higher-order states, autocorrelated rates, and multiple functional categories all lead to significant improvements in the fit of a combined phylogenetic and hidden Markov model, with the effect of higher-order states being particularly pronounced.

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure

by Julian Gough, Kevin Karplus, Richard Hughey, Cyrus Chothia - J. Mol. Biol., 2001
Cited by 69 (12 self)
Protein structure prediction, to discover the fold and hence information about the probable function of the sequence of a gene about which nothing is known, is possible via homology to a sequence of

Hybrid Fold Recognition: Combining Sequence Derived Properties with Evolutionary Information.

by Daniel Fischer - Pac. Symp. Biocomput., 2000
Cited by 51 (5 self)
Protein fold recognition aims to assign each new amino acid sequence to the known three-dimensional fold which it most closely resembles. The assignment is carried out by searching a library of known structures for a compatible fold. Fold-recognition methods have demonstrated their capabilities in computer-aided assessment experiments such as CASP [1] as well as in fully automated assessment experiments such as CAFASP-1 [2]. In the former, fold-recognition programs coupled with human intervention were able to correctly predict the folds of proteins of (then) unknown structure. In the latter, the performance of the methods was not as good, but still it was superior to sequence-comparison methods such as PSI-BLAST [3]. CAFASP-1 demonstrated that no single approach was markedly superior to the others evaluated when considered across the entire range of targets. In some cases, exploiting evolutionary information from ne

BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark

by Julie D. Thompson, Patrice Koehl, Olivier Poch - Proteins, 2005
Cited by 48 (1 self)
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site

Classifying G-protein coupled receptors with support vector machines

by Rachel Karchin, Kevin Karplus, David Haussler - Bioinformatics, 2001
Cited by 47 (0 self)
Motivation: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-protein coupled receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile hidden Markov model, and methods, including support vector machines, that transform protein sequences into fixed-length feature vectors. Results: The last is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the minimum error point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for classification based on nearest neighbor feature vector (kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% ...
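The last metric quoted above, the percentage of true positives recognized before the first false positive, could be computed along these lines. This is a sketch under assumptions: the function name, the input layout (parallel lists of scores and boolean labels), and the handling of ties (input order is kept) are choices made here, not details from the paper.

```python
def tp_before_first_fp(scores, labels):
    """Percentage of all positives ranked above the first negative.

    scores: classifier scores, higher meaning more confident.
    labels: truthy for true members of the class, falsy otherwise.
    Ties keep their input order (an assumption of this sketch).
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    if total_pos == 0:
        return 0.0  # no positives to recognize
    found = 0
    for _, is_pos in ranked:
        if not is_pos:
            break  # first false positive reached
        found += 1
    return 100.0 * found / total_pos


if __name__ == "__main__":
    # Hypothetical scores: two positives outrank the first negative.
    print(tp_before_first_fp([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 1]))
```

The metric rewards methods that place many true members at the very top of the ranking, which matters when an annotator will only inspect the hits above the first error.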