Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes
- J. MOL. BIOL
, 2001
Improved prediction of signal peptides -- SignalP 3.0
- J. MOL. BIOL.
, 2004
Cited by 655 (7 self)
We describe improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, new features have been included as input to the neural network. This addition, combined with a thorough error-correction of a new data set, has improved the performance of the predictor significantly over SignalP version 2. In version 3, correctness of the cleavage site predictions has increased notably for all three organism groups: eukaryotes, Gram-negative bacteria, and Gram-positive bacteria. The accuracy of cleavage site prediction has increased by 6-17 % over the previous version, whereas the signal peptide discrimination improvement is mainly due to the elimination of false positive predictions, as well as the introduction of a new discrimination score for the neural network. The new method has also been benchmarked against other available methods. Predictions can be made at the publicly available web server
A combined transmembrane topology and signal peptide prediction method
- J. Mol. Biol
, 2004
Cited by 233 (10 self)
Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13 % over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at
Prediction of signal peptides and signal anchors by a hidden Markov model
- Proc. Int. Conf. Intell. Syst. Mol. Biol
, 1998
Cited by 156 (10 self)
A hidden Markov model of signal peptides has been developed. It contains submodels for the N-terminal part, the hydrophobic region, and the region around the cleavage site. For known signal peptides, the model can be used to assign objective boundaries between these three regions. Applied to our data, the length distributions for the three regions are significantly different from expectations. For instance, the assigned hydrophobic region is between 8 and 12 residues long in almost all eukaryotic signal peptides. This analysis also makes obvious the difference between eukaryotes, Gram-positive bacteria, and Gram-negative bacteria. The model can be used to predict the location of the cleavage site, which it finds correctly in nearly 70 % of signal peptides in a cross-validated test—almost the same accuracy as the best previous method. One of the problems for existing prediction methods is the poor discrimination between signal peptides and uncleaved signal anchors, but this is substantially improved by the hidden Markov model when expanding it with a very simple signal anchor model.
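As a rough illustration of how a hidden Markov model can assign boundaries between an N-terminal part, a hydrophobic region, and a cleavage-site region, the sketch below runs Viterbi decoding over a toy three-state, left-to-right model. It is not the model from the paper: the state names `n`, `h`, `c`, the two-class emission scheme (hydrophobic vs. other), the residue set, and every probability are assumptions chosen only to make the example small and runnable.

```python
import math

# Toy left-to-right model (all values are illustrative assumptions):
# n = N-terminal part, h = hydrophobic region, c = region around the cleavage site.
STATES = ["n", "h", "c"]
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classification

START = {"n": 1.0, "h": 0.0, "c": 0.0}
TRANS = {
    "n": {"n": 0.7, "h": 0.3, "c": 0.0},
    "h": {"n": 0.0, "h": 0.8, "c": 0.2},
    "c": {"n": 0.0, "h": 0.0, "c": 1.0},
}
# Probability that each state emits a hydrophobic residue.
P_HYDROPHOBIC = {"n": 0.2, "h": 0.9, "c": 0.3}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def emit(state, residue):
    """Emission probability of a residue under the two-class scheme."""
    p = P_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(seq):
    """Return the most probable state path for seq as a string of n/h/c labels."""
    scores = [{s: log(START[s]) + log(emit(s, seq[0])) for s in STATES}]
    back = []
    for residue in seq[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            row[s] = prev[best] + log(TRANS[best][s]) + log(emit(s, residue))
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("KRKLLLLLLLLLLSQGSQ"))
```

The run lengths of `n`, `h`, and `c` in the returned path play the role of the region boundaries mentioned in the abstract; a real model would use per-residue emission distributions and parameters estimated from data.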
Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites
- Glycobiology
, 2005
Cited by 126 (7 self)
O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76 % of the glycosylated residues and 93 % of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc. Key words: machine learning/mucin-type/neural networks/O-glycosylation/prediction
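The two percentages quoted in the abstract above (fraction of glycosylated and of nonglycosylated residues predicted correctly) are what is commonly called sensitivity and specificity. The following is a minimal sketch of how such figures could be computed from per-residue labels; the function name and the toy label lists are hypothetical and are not taken from NetOGlyc or its data.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired binary labels.

    truth and predicted are equal-length sequences of 0/1 values,
    where 1 marks a positive (e.g. glycosylated) residue.
    """
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sensitivity: share of true positives recovered.
    # Specificity: share of true negatives correctly rejected.
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up labels for ten residues, for illustration only.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(sensitivity_specificity(truth, predicted))
```

Reporting both numbers matters when negatives greatly outnumber positives, since a predictor can score high on one while doing poorly on the other.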
et al. Gene expression patterns in human liver cancers
- Mol Biol Cell
Cited by 116 (4 self)
Hepatocellular carcinoma (HCC) is a leading cause of death worldwide. Using cDNA microarrays to characterize patterns of gene expression in HCC, we found consistent differences between the expression patterns in HCC compared with those seen in nontumor liver tissues. The expression patterns in HCC were also readily distinguished from those associated with tumors metastatic to liver. The global gene expression patterns intrinsic to each tumor were sufficiently distinctive that multiple tumor nodules from the same patient could usually be recognized and distinguished from all the others in the large sample set on the basis of their gene expression patterns alone. The distinctive gene expression patterns are characteristic of the tumors and not the patient; the expression programs seen in clonally independent tumor nodules in the same patient were no more similar than those in tumors from different patients. Moreover, clonally related tumor masses that showed distinct expression profiles were also distinguished by genotypic differences. Some features of the gene expression patterns were associated with specific phenotypic and genotypic characteristics of the tumors, including growth rate, vascular invasion, and p53 overexpression.
Global analysis of the general stress response of Bacillus subtilis
- J. Bacteriol
, 2001
Cited by 102 (15 self)
Gene arrays containing all currently known open reading frames of Bacillus subtilis were used to examine the general stress response of Bacillus. By proteomics, transcriptional analysis, transposon mutagenesis, and consensus promoter-based screening, 75 genes had previously been described as σB-dependent general stress genes. The present gene array-based analysis confirmed 62 of these already known general stress genes and detected 63 additional genes subject to control by the stress sigma factor σB. At least 24 of these 125 σB-dependent genes seemed to be subject to a second, σB-independent stress induction mechanism. Therefore, this transcriptional profiling revealed almost four times as many regulon members as the proteomic approach, but failure of confirmation of all known members of the σB regulon indicates that even this approach has not yet elucidated the entire regulon. Most of the σB-dependent general stress proteins are probably located in the cytoplasm, but 25 contain at least one membrane-spanning domain, and at least 6 proteins appear to be secreted. The functions of most of the newly described genes are still unknown. However, their classification as σB-dependent stress genes argues that their products most likely perform functions in stress management and help to provide the nongrowing cell with multiple stress resistance. A comprehensive screening program analyzing the multiple stress resistance of mutants with mutations in single stress genes is in progress. The first results of this program, showing the diminished salt resistance of yjbC and yjbD mutants compared to that
Prediction of lipoprotein signal peptides in Gram-negative bacteria
- Protein Sci
, 2003
Cited by 96 (0 self)
A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8 % of the lipoproteins correctly with only 0.3 % false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9 % of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.