Hidden Markov models for detecting remote protein homologies
- Bioinformatics, 1998
Cited by 231 (12 self)
A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and the homologs found by using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases. We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against wu-blastp and against double-blast, a two-step method similar to ISS but using BLAST instead of FASTA. Results: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP-domains test, SAM-T98 got 880 true positives and 68 false positives, double-blast got 533 true positives with 71 false positives, and wu-blastp got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model. Availability: A World Wide Web server, as well as information on obtaining the Sequence Alignment and ... (Preprint, to appear in Bioinformatics, 1999.)
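The reversed-model normalization can be illustrated with a minimal sketch. It assumes a gapless profile (one residue-probability table per column) as a stand-in for a full SAM HMM with insert and delete states, and it scores only sequences whose length equals the profile's; `log_likelihood` and `reverse_null_score` are hypothetical names. The idea shown is that the null hypothesis is the same model read backwards, so a sequence that merely shares the model's residue composition earns no credit.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for an HMM: a gapless profile with one
# residue -> probability table per column (no insert/delete states).
Profile = List[Dict[str, float]]


def log_likelihood(seq: str, profile: Profile) -> float:
    """Natural-log probability of an equal-length sequence under the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(math.log(column[residue]) for residue, column in zip(seq, profile))


def reverse_null_score(seq: str, profile: Profile) -> float:
    """Log-odds of the model versus the same model with its columns reversed,
    used here in place of a uniform-composition null model."""
    return log_likelihood(seq, profile) - log_likelihood(seq, profile[::-1])


profile = [{"A": 0.9, "C": 0.1}, {"A": 0.1, "C": 0.9}]
print(reverse_null_score("AC", profile))  # about 4.39: residue order matches the model
print(reverse_null_score("AA", profile))  # 0.0: composition alone scores nothing
```

Because the reversed model has exactly the same residue composition as the forward one, only order-dependent agreement with the model survives the subtraction.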
Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites
, 2000
Cited by 80 (13 self)
Motivation: In order to extract protein sequences from nucleotide sequences, an important step is to recognize the points at which protein-coding regions start. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines (SVMs) to this task and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques, recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g., ESTScan) could profit from advanced TIS recognition.
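To make "engineering an appropriate kernel function" concrete, here is a sketch of one way to build locality into a string kernel: count matching nucleotides in a small window around each position, raise each local count to a power so that clusters of nearby matches count more than scattered ones, and sum over positions. This follows the general idea of a "locality-improved" kernel, but the uniform window weights, the parameter names `half_width`, `inner_degree` and `outer_degree`, and the function name are assumptions, not the paper's exact definition.

```python
def locality_kernel(x: str, y: str, half_width: int = 1,
                    inner_degree: int = 2, outer_degree: int = 1) -> float:
    """Kernel on two equal-length nucleotide windows (e.g. centred on a candidate ATG).

    For each position p, count matches x[q] == y[q] for q in
    [p - half_width, p + half_width] (clipped to the sequence), raise that
    local count to inner_degree, sum over p, then raise the total to outer_degree.
    Uniform weights inside the window are an assumption of this sketch.
    """
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    n = len(x)
    match = [1 if a == b else 0 for a, b in zip(x, y)]
    total = 0
    for p in range(n):
        lo, hi = max(0, p - half_width), min(n, p + half_width + 1)
        total += sum(match[lo:hi]) ** inner_degree
    return float(total ** outer_degree)


# Same number of matches (3), but clustered matches score higher than scattered ones.
print(locality_kernel("AAAAAA", "AAACCC"))  # 18.0
print(locality_kernel("AAAAAA", "ACACAC"))  # 12.0
```

A Gram matrix built from such a function could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.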
A novel class of RanGTP binding proteins
- J. Cell, 1997
Cited by 22 (1 self)
Abstract. The importin-α/β complex and the GTPase Ran mediate nuclear import of proteins with a classical nuclear localization signal. Although Ran has also been implicated in a variety of other processes, such as cell cycle progression, a direct function of Ran has so far only been demonstrated for importin-mediated nuclear import. We have now identified an entire class of ≈20 potential Ran targets that share a sequence motif related to the Ran-binding site of importin-β. We have confirmed specific RanGTP binding for some of them, namely for two novel factors, RanBP7 and RanBP8, for CAS, Pse1p, and Msn5p, and for the cell cycle regulator Cse1p from Saccharomyces cerevisiae. We have studied RanBP7 in more detail. Similar to importin-β, it prevents the activation of Ran's GTPase by RanGAP1 and ...
Computational Genefinding
, 1998
Cited by 2 (1 self)
Introduction
Computational methodology for finding genes and other functional sites in genomic DNA has evolved significantly over the last 20 years. Excellent recent surveys have been given by Guigó [10], Claverie [3], Krogh [14] and others. Among the types of functional sites in genomic DNA that researchers have sought to recognize are splice sites, start and stop codons, branch points, promoters and terminators of transcription, polyadenylation sites, ribosomal binding sites, topoisomerase II binding sites, topoisomerase I cleavage sites, and various transcription factor binding sites [8]. Local sites such as these are called signals, and methods for detecting them may be called signal sensors. Genomic DNA signals can be contrasted with extended, variable-length regions such as exons and introns, which are recognized by different methods that may be called content sensors [26].
2 Signal Sensors
The most bas ...
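As an illustration of a signal sensor in the sense defined above, the sketch below builds a position weight matrix (PWM) from aligned example sites and scans a DNA sequence with it. The PWM is our choice of a simple, widely used sensor, not necessarily the one the survey goes on to describe; the uniform 0.25 background, the pseudocount of 1, the toy training sites, and the function names are assumptions.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background base frequency


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds weight for each base at each position of equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / BACKGROUND) for base in ALPHABET})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[float]:
    """Score every window of len(pwm) bases; higher means more signal-like."""
    width = len(pwm)
    return [sum(pwm[j][seq[i + j]] for j in range(width))
            for i in range(len(seq) - width + 1)]


pwm = build_pwm(["ATG", "ATG", "ATG", "GTG"])  # toy start-codon training sites
scores = scan("CCATGCC", pwm)
print(scores.index(max(scores)))  # 2: the window starting at the ATG
```

A threshold on these scores turns the sensor into a detector; content sensors for exons and introns would instead score long, variable-length regions.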
FRAGS: estimation of coding sequence substitution rates from fragmentary data
- BMC Bioinformatics, 2004
Bioinformatics
, 2003
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over its dimension (the number of significant genes). The posterior distributions of the parameters are not available in explicit form, so we use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to make future predictions. The method is applied to cancer classification via cDNA microarrays, where the genes BRCA1 and BRCA2 are associated with a hereditary predisposition to breast cancer, and is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.
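The "latent variables" and "truncated sampling" mentioned above can be illustrated with the standard data-augmentation step for a probit-type classifier: each binary label y_i gets a latent z_i drawn from a unit-variance normal centred at the current linear predictor, truncated to be positive when y_i = 1 and non-positive when y_i = 0. That the model uses a probit link is our assumption; the sketch shows only this one Gibbs step, not the mixture prior or the sampler over gene subsets, and the function names are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List

_EPS = 1e-12  # keeps inverse-CDF arguments strictly inside (0, 1)


def sample_latent(mean: float, label: int, rng: random.Random) -> float:
    """Draw z ~ N(mean, 1) truncated to z > 0 if label == 1, else z <= 0.

    Uses inverse-CDF sampling. This simple version is adequate for moderate
    |mean| but loses accuracy when the allowed region has tiny probability.
    """
    dist = NormalDist(mu=mean, sigma=1.0)
    p_nonpositive = dist.cdf(0.0)  # P(z <= 0) under the untruncated normal
    if label == 1:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    u = min(max(u, _EPS), 1.0 - _EPS)
    return dist.inv_cdf(u)


def sample_all_latents(linear_predictors: List[float], labels: List[int],
                       rng: random.Random) -> List[float]:
    """One data-augmentation sweep: a fresh latent draw for every sample."""
    return [sample_latent(m, y, rng) for m, y in zip(linear_predictors, labels)]


rng = random.Random(0)
print(sample_all_latents([0.5, -1.2], [1, 0], rng))
```

In a full sampler, these latent draws would alternate with updates of the regression coefficients and of the indicator variables that mark which genes are included.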
A Re-annotation of the Saccharomyces cerevisiae Genome
Discrepancies in gene and orphan number indicated by previous analyses suggest that S. cerevisiae would benefit from a consistent re-annotation. In this analysis, three new genes are identified and 46 alterations to gene coordinates are described. In addition, 370 ORFs are defined as totally spurious ORFs that should be disregarded. At least a further 193 genes could be described as very hypothetical, based on a number of criteria. It was found that disparate genes with sequence overlaps of more than ten amino acids (especially at the N-terminus) are rare in both S. cerevisiae and Sz. pombe. A new S. cerevisiae gene number estimate with an upper limit of 5804 is proposed, but after the removal of very hypothetical genes and pseudogenes this is reduced to 5570. Although this is likely to be closer to the true upper limit, it is still predicted to be an overestimate of gene number. A complete list of revised gene coordinates is available from the Sanger Centre (S. cerevisiae reannotation: ftp://ftp/pub/yeast/SCreannotation). Copyright © 2001 John Wiley & Sons, Ltd. Keywords: annotation; Schizosaccharomyces pombe; Saccharomyces cerevisiae; comparative genomics; sequence orphans; hypothetical proteins
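The overlap criterion mentioned above (disparate genes overlapping by more than ten amino acids) can be checked with a simple sort-and-sweep over gene coordinates. This sketch is ours: it assumes hypothetical (name, start, end) tuples in 1-based inclusive nucleotide coordinates, converts ten amino acids to 30 nucleotides, and ignores strand and reading frame, which a real re-annotation would need to take into account.

```python
from typing import List, Tuple

# Hypothetical record: (name, start, end), 1-based inclusive nucleotide coordinates.
Gene = Tuple[str, int, int]


def overlapping_pairs(genes: List[Gene], min_aa: int = 10) -> List[Tuple[str, str, int]]:
    """Return (gene_a, gene_b, overlap_nt) for pairs overlapping by more than min_aa codons.

    Strand and reading frame are ignored in this sketch.
    """
    min_nt = 3 * min_aa
    ordered = sorted(genes, key=lambda g: g[1])
    hits = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later gene can overlap gene a
            overlap = min(end_a, end_b) - start_b + 1
            if overlap > min_nt:
                hits.append((name_a, name_b, overlap))
    return hits


genes = [("A", 1, 300), ("B", 250, 600), ("C", 590, 900)]
print(overlapping_pairs(genes))  # [('A', 'B', 51)]; B/C overlap by only 11 nt
```

Sorting by start coordinate lets the inner loop stop at the first gene that begins after the current one ends, so non-overlapping neighbours are never compared exhaustively.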
RESEARCH ARTICLE Open Access
Controversies in modern evolutionary biology: the imperative for error detection and quality control
BMC Genomics BioMed Central
Research article Insights into the Musa genome: Syntenic relationships to rice and

