Prediction of complete gene structures in human genomic DNA - J. Mol. Biol., 1997
Cited by 487 (7 self)
The problem of identifying genes in genomic DNA sequences by computational methods has attracted considerable research attention in recent years. From one point of view, the problem is closely
A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA, 1996
Cited by 122 (13 self)
We present a statistical model of genes in DNA. A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence (Stormo & Haussler 1994). Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state. Machine learning techniques are applied to optimize these probabilities using a standardized training set. Given a new candidate sequence, the best parse is deduced from the model using a dynamic programming algorithm to identify the path through the model with maximum probability. The GHMM is flexible and modular, so new sensors and additional states can be inserted easily. In addition, it provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching. The description and results of an implementation of such a gene-finding model, called Genie, are presented. The exon sensor is a codon fre...
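The dynamic-programming step described in this abstract can be illustrated with a minimal Viterbi sketch in Python. It uses a plain two-state HMM rather than Genie's GHMM (no explicit state durations, sensors, or reading-frame constraints), and the state names and every probability below are made-up assumptions chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for the observation string `obs`."""
    # score[s] = best log-probability of any state path ending in s so far
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per transition
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][symbol]
            pointers[s] = best_prev
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Hypothetical toy model: GC-rich "coding" versus AT-rich "noncoding".
STATES = ("coding", "noncoding")
LOG_START = logs({"coding": 0.5, "noncoding": 0.5})
LOG_TRANS = {
    "coding": logs({"coding": 0.9, "noncoding": 0.1}),
    "noncoding": logs({"coding": 0.1, "noncoding": 0.9}),
}
LOG_EMIT = {
    "coding": logs({"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}),
    "noncoding": logs({"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}),
}

best_logp, best_path = viterbi("ATATGCGCGC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(best_path)
```

Working in log space avoids numeric underflow on long sequences; a generalized HMM would additionally score whole variable-length segments per state instead of single bases.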
Two methods for improving performance of an HMM and their application for gene finding, 1997
Cited by 96 (5 self)
A hidden Markov model for gene finding consists of submodels for coding regions, splice sites, introns, intergenic regions and possibly more. It is described how to estimate the model as a whole from labeled sequences instead of estimating the individual parts independently from subsequences. It is argued that the standard maximum likelihood estimation criterion is not optimal for training such a model. Instead of maximizing the probability of the DNA sequence, one should maximize the probability of the correct prediction. Such a criterion, called conditional maximum likelihood, is used for the gene finder HMMgene. A new (approximate) algorithm is described, which finds the most probable prediction summed over all paths yielding the same prediction. We show that these methods contribute significantly to the high performance of HMMgene.
Keywords: Hidden Markov model, gene finding, maximum likelihood, statistical sequence analysis.
Introduction: As the genome projects evolve autom...
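One common way to write the two training criteria contrasted in this abstract is sketched below. The notation is an assumption made here, not taken from the paper: x is the DNA sequence, y its labeling, θ the model parameters, and π a state path. Writing the maximum likelihood criterion as a joint probability over sequence and labels is also an assumption about how labeled training data enter.

```latex
\[
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} P(x, y \mid \theta),
\qquad
\hat{\theta}_{\mathrm{CML}} = \arg\max_{\theta} P(y \mid x, \theta)
  = \arg\max_{\theta} \frac{P(x, y \mid \theta)}{P(x \mid \theta)}
\]
\[
\hat{y} = \arg\max_{y} \sum_{\pi \,:\, \mathrm{labels}(\pi) = y} P(x, \pi \mid \theta)
\]
```

The second line expresses the decoding goal stated in the abstract: the score of a prediction is the sum over all paths that yield the same labeling, rather than the score of the single best path.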
Improved Splice Site Detection in Genie - J. Comput. Biol., 1997
Cited by 41 (3 self)
We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nuc...
Automatic RNA Secondary Structure Determination with Stochastic Context-Free Grammars, 1995
Cited by 8 (1 self)
We have developed a method for predicting the common secondary structure of large RNA multiple alignments using only the information in the alignment. It uses a series of progressively more sensitive searches of the data in an iterative manner to discover regions of base pairing; the first pass examines the entire multiple alignment. The searching uses two methods to find base pairings. Mutual information is used to measure covariation between pairs of columns in the multiple alignment and a minimum length encoding method is used to detect column pairs with high potential to base pair. Dynamic programming is used to recover the optimal tree made up of the best potential base pairs and to create a stochastic context-free grammar. The information in the tree guides the next iteration of searching. The method is similar to the traditional comparative sequence analysis technique. The method correctly identifies most of the common secondary structure in 16S and 23S rRNA.
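The covariation measure mentioned in this abstract can be sketched as the mutual information between two alignment columns. The function name is hypothetical, and the sketch makes simplifying assumptions: frequencies are raw counts with no pseudocounts or sequence weighting, and gap characters are treated like any other symbol.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information, in bits, between two equally long alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    i_counts = Counter(col_i)
    j_counts = Counter(col_j)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written with counts: count * n / (count_a * count_b)
        mi += p_ab * math.log2(count * n / (i_counts[a] * j_counts[b]))
    return mi


# Perfectly covarying columns (G-C, C-G, A-U, U-A pairs) versus a conserved column.
covarying = column_mutual_information("GGCCAAUU", "CCGGUUAA")
conserved = column_mutual_information("AAAA", "ACGU")
print(covarying, conserved)
```

High mutual information flags column pairs that change together, which is the signal comparative analysis uses to propose base pairs; a fully conserved column carries no covariation signal at all.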
Genie - Gene Finding in Drosophila melanogaster, 2000
Cited by 4 (1 self)
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Computational Genefinding, 1998
Cited by 2 (1 self)
Introduction: Computational methodology for finding genes and other functional sites in genomic DNA has evolved significantly over the last 20 years. Excellent recent surveys have been given by Guigó [10], Claverie [3], Krogh [14] and others. Among the types of functional sites in genomic DNA that researchers have sought to recognize are splice sites, start and stop codons, branch points, promoters and terminators of transcription, polyadenylation sites, ribosomal binding sites, topoisomerase II binding sites, topoisomerase I cleavage sites, and various transcription factor binding sites [8]. Local sites such as these are called signals and methods for detecting them may be called signal sensors. Genomic DNA signals can be contrasted with extended and variable length regions such as exons and introns, which are recognized by different methods that may be called content sensors [26].
2 Signal Sensors: The most bas
Stochastic segment interaction models for biological sequence analysis, 2004
Cited by 1 (1 self)
We introduce a class of probability models for sequences of random variables with complex long-range dependency structure, called stochastic segment interaction models, motivated by problems arising in the analysis of biopolymer sequence data. We generalize and extend previous work in this area, and make explicit the relations to existing literature on hidden Markov models (HMMs) and “generalized” HMMs. We show that this class of models allows for incorporation of non-local interaction information in biological sequence analysis. We demonstrate this approach by developing models for prediction of 3D contacts in protein sequences using models for amino acid dependencies in β-sheets. We provide algorithms for Bayesian inference on these models via dynamic programming and Markov chain Monte Carlo simulation. Results are presented from an application to protein structure prediction from sequence.
Prediction of Gene-encoding regions in E.Coli DNA using an Optimal Parse Method with Multiple Types of Evidence
We implement a variant of the Optimal Parse method described by Stormo and Haussler [17] as a C++ program to recognize gene-encoding regions in anonymous E. coli DNA (called "contigs"). The "parse" produced by the program is a set of nonoverlapping, alternating gene and nongene regions on a given DNA sequence. Gene and nongene regions are recognized using scoring functions which rank each possible subsequence of a contig with a probability representing the subsequence's similarity to a gene or nongene region. By using probability "sensors" which model the statistical content of genes and nongenes, we may build functions that find the "optimal" parse (which is a path through the sequence of alternating gene/nongene regions that has the highest probability). The Optimal Parse method allows us to use multiple types of evidence, such as codon usage, translation initiation site, and start and stop codon usage. Using machine learning to adjust the weights of evidence, we can train the program ...
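The idea of an optimal parse into alternating gene and nongene regions can be sketched as a dynamic program over segment boundaries. This is a Python illustration under stated assumptions, not the C++ program the abstract describes: the scoring function, the emission table, and the per-region penalty are all hypothetical stand-ins for the paper's sensors.

```python
import math


def optimal_parse(n, log_score):
    """Best split of positions [0, n) into alternating gene/nongene regions.

    log_score(kind, i, j) is the log-score that [i, j) is one region of `kind`.
    Returns (total_log_score, [(kind, start, end), ...]).
    """
    kinds = ("gene", "nongene")
    other = {"gene": "nongene", "nongene": "gene"}
    neg_inf = float("-inf")
    # best[k][j] = best score of a parse of [0, j) whose last region has kind k
    best = {k: [neg_inf] * (n + 1) for k in kinds}
    back = {k: [None] * (n + 1) for k in kinds}
    for j in range(1, n + 1):
        for k in kinds:
            for i in range(j):
                # The region before [i, j) must be of the other kind.
                prev = 0.0 if i == 0 else best[other[k]][i]
                cand = prev + log_score(k, i, j)
                if cand > best[k][j]:
                    best[k][j], back[k][j] = cand, i
    kind = max(kinds, key=lambda k: best[k][n])
    total, regions, j = best[kind][n], [], n
    while j > 0:
        i = back[kind][j]
        regions.append((kind, i, j))
        kind, j = other[kind], i
    regions.reverse()
    return total, regions


# Hypothetical content sensor: GC-rich "gene", AT-rich "nongene".
EMIT = {
    "gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "nongene": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
REGION_PENALTY = math.log(0.1)  # assumed cost per region, discourages fragmentation


def make_log_score(seq):
    def log_score(kind, i, j):
        return REGION_PENALTY + sum(math.log(EMIT[kind][b]) for b in seq[i:j])
    return log_score


SEQ = "ATATGCGCGCATAT"
total, regions = optimal_parse(len(SEQ), make_log_score(SEQ))
print(regions)
```

Because each region is scored as a whole, additional evidence such as start and stop codons could be folded into `log_score` without changing the recurrence. The sketch runs in cubic time on this toy input; precomputing prefix sums would make region scoring constant time.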
INSILICO PROMOTER PREDICTION USING GREY RELATIONAL ANALYSIS - www.jatit.org
In machine learning, multiclass or multi-label classification is the special case within statistical classification of assigning one of several class labels to an input object. The multiclass problem is more complex than binary classification and is a less researched problem. In biology, a promoter is the DNA region where transcription initiation takes place. Reliable recognition of the promoter region is essential for understanding the biological mechanisms of the gene. This study proposes a new approach for predicting the promoter from the DNA sequence based on Grey Relational Analysis (GRA). In order to construct a promoter prediction system, a GRA approach is developed and applied to a real data set with 2111 samples of promoters and non-promoters from 4 species. The results of the current model are compared to those of two traditional ones, logistic regression and a back-propagation neural network. The results show that the proposed GRA model achieves better prediction accuracy than the conventional ones and provides a novel approach to predicting the promoter from a genome.
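The abstract does not give the formulation used, so the following is only a sketch of the standard grey relational grade, assuming features already normalized to a common scale and the customary distinguishing coefficient of 0.5. The function name and the toy vectors are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate vector against a reference vector.

    Coefficient per feature: (d_min + rho * d_max) / (d + rho * d_max), where d is
    the absolute difference and d_min, d_max are taken over all candidates and
    features. The grade is the mean coefficient; higher means more similar.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Toy feature vectors (assumed values): the first candidate matches the reference.
grades = grey_relational_grades([1.0, 0.0, 1.0], [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(grades)
```

A classifier in this style could assign a sequence to the class whose reference profile gives the highest grade, although whether the study does exactly this is not stated in the abstract.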

