Results 1 - 10 of 373
Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes
- J. MOL. BIOL
, 2001
A combined transmembrane topology and signal peptide prediction method
- J. Mol. Biol
, 2004
Cited by 233 (10 self)
Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal peptide and/or one or more transmembrane segments, as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at
Protein structure prediction and analysis using the Robetta server
- Nucleic Acids Res
, 2004
"... The Robetta server ..."
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 132 (5 self)
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins
- Journal of Molecular Biology
, 2000
Global analysis of the general stress response of Bacillus subtilis
- J. Bacteriol
, 2001
Cited by 104 (15 self)
Gene arrays containing all currently known open reading frames of Bacillus subtilis were used to examine the general stress response of Bacillus. By proteomics, transcriptional analysis, transposon mutagenesis, and consensus promoter-based screening, 75 genes had previously been described as σB-dependent general stress genes. The present gene array-based analysis confirmed 62 of these already known general stress genes and detected 63 additional genes subject to control by the stress sigma factor σB. At least 24 of these 125 σB-dependent genes seemed to be subject to a second, σB-independent stress induction mechanism. Therefore, this transcriptional profiling revealed almost four times as many regulon members as the proteomic approach, but failure of confirmation of all known members of the σB regulon indicates that even this approach has not yet elucidated the entire regulon. Most of the σB-dependent general stress proteins are probably located in the cytoplasm, but 25 contain at least one membrane-spanning domain, and at least 6 proteins appear to be secreted. The functions of most of the newly described genes are still unknown. However, their classification as σB-dependent stress genes argues that their products most likely perform functions in stress management and help to provide the nongrowing cell with multiple stress resistance. A comprehensive screening program analyzing the multiple stress resistance of mutants with mutations in single stress genes is in progress. The first results of this program, showing the diminished salt resistance of yjbC and yjbD mutants compared to that
Classifying G-protein coupled receptors with support vector machines
- Bioinformatics
, 2001
Cited by 94 (3 self)
Motivation: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-protein coupled receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile hidden Markov model, and methods, including support vector machines, that transform protein sequences into fixed-length feature vectors. Results: The last is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the minimum error point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for classification based on nearest neighbor feature vector (kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% ...
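The abstract above reports the "percentage of true positives recognized before the first false positive" without defining it. One common reading, sketched here as an assumption rather than the authors' exact procedure, is to rank all test sequences by classifier score and count the positives that appear above the highest-scoring negative. The function name `tp_before_first_fp` and the toy scores are hypothetical.

```python
def tp_before_first_fp(scores, labels):
    """Fraction of all positives ranked above the highest-scoring negative.

    Assumption: a higher score means "more likely positive". Ties are
    broken by input order (Python's sort is stable).
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    if total_pos == 0:
        return 0.0
    found = 0
    for _, is_pos in ranked:
        if not is_pos:
            break  # first false positive reached; stop counting
        found += 1
    return found / total_pos


# Toy data (hypothetical): three positives, two negatives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [True, True, False, True, False]
toy_result = tp_before_first_fp(toy_scores, toy_labels)
```

On the toy data, two of the three positives outrank the first negative. A metric like this rewards classifiers whose top-ranked predictions can be trusted without manual checking, which is the annotation-quality setting the abstract refers to.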