A Hidden Markov Model that finds genes in E. coli DNA (1994)

by Anders Krogh, I. Saira Mian, David Haussler

Abstract:

A hidden Markov model (HMM) has been developed to find protein-coding genes in E. coli DNA using E. coli genomic DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E. coli genes, as well as the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. To account for potential sequencing errors and/or frameshifts in raw genomic DNA sequence, the model allows for the (very unlikely) possibility of insertions and deletions of individual nucleotides within a codon. The parameters of the HMM are estimated using approximately one million nucleotides of annotated DNA in EcoSeq6, and the model is tested on a disjoint set of contigs containing about 325,000 nucleotides. The HMM finds the exact locations of about 80% of the known E. coli genes, and approximate locations for about 10%. It also finds several potentially new genes, and locates several places wer...
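To make the idea of "finding genes with an HMM" concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' model: it uses a hypothetical four-state toy HMM (one intergenic state `N` and three codon-position states `C1`, `C2`, `C3` that form a forced 3-cycle so coding runs stay in frame) with made-up parameters, and it labels each nucleotide using standard Viterbi decoding in log space, which is assumed here as the decoding method. The paper's model is far richer (codon frequencies, REP sequences, the Shine-Dalgarno motif, insertion and deletion states) and its parameters were estimated from EcoSeq6; none of that is reproduced here.

```python
import math
from typing import List

# Hypothetical toy model, NOT the topology or parameters from the paper:
# one intergenic state "N" and three codon-position states "C1", "C2", "C3".
STATES = ["N", "C1", "C2", "C3"]

# Assumption: decoding always starts in the intergenic state.
START = {"N": 1.0, "C1": 0.0, "C2": 0.0, "C3": 0.0}

# Transition probabilities; a missing entry means probability 0.
# The codon states form a forced 3-cycle, so a coding run stays in frame.
TRANS = {
    "N": {"N": 0.9, "C1": 0.1},
    "C1": {"C2": 1.0},
    "C2": {"C3": 1.0},
    "C3": {"C1": 0.9, "N": 0.1},
}

# Made-up emission probabilities: intergenic DNA is uniform, while each
# codon position prefers one nucleotide (a crude stand-in for codon bias).
EMIT = {
    "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C1": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    "C2": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "C3": {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
}

NEG_INF = float("-inf")


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(seq: str) -> List[str]:
    """Return the most probable state path for a DNA string (Viterbi, log space)."""
    # score[s]: best log-probability of any path ending in state s at the current position.
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []  # one dict per position after the first: state -> best predecessor
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Choose the predecessor r maximising score[r] + log P(r -> s).
            prev = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + log(TRANS[prev].get(s, 0.0)) + log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best-scoring final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    dna = "TTTTTT" + "GAC" * 5 + "TTTTTT"
    path = viterbi(dna)
    # Positions labelled with a codon state are predicted to be coding.
    print("".join("." if s == "N" else "c" for s in path))
```

In this toy setup the `GAC` repeats are decoded as five in-frame codons flanked by intergenic sequence. Working in log space avoids numerical underflow on long sequences; a real gene finder would add many more states (for example start and stop codons, or the insertion and deletion states the abstract mentions) rather than change the decoding loop.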