@MISC{Eddy04whatis, author = {Sean R. Eddy}, title = {What is a hidden Markov model?}, year = {2004} }
Share
OpenURL
Abstract
Often, problems in biological sequence analysis are just a matter of putting the right label on each residue. In gene identification, we want to label nucleotides as exons, introns, or intergenic sequence. In sequence alignment, we want to associate residues in a query sequence with ho-mologous residues in a target database sequence. We can always write an ad hoc program for any given problem, but the same potentially frustrating issues will always recur. One issue is that we often want to incorporate multiple heterogenous sources of information. A genefinder, for in-stance, ought to combine splice site consenses, codon bias, exon/intron length preferences, and open reading frame analysis all in one scoring system. How should all those parameters be set? How should different kinds of information be weighted? A second issue is being able to interpret results probabilistically. Finding a best scoring answer is one thing, but what does the score mean, and how confident are we that the best answer, or any given part of it, is correct? A third issue is extensibility. The moment we perfect our ad hoc genefinder, we wish we had also modeled translational initiation consensus, alternative splicing, and a polyadenylation signal. All too often, piling more reality onto a fragile ad hoc program makes it collapse under its own weight. Hidden Markov models (HMMs) are a formal foundation for making probabilistic models of