## Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models (2001)

Venue: J. COMP. BIOL

Citations: 17 (2 self-citations)

### BibTeX

@ARTICLE{Yu01statisticalsignificance,

author = {Yi-Kuo Yu and Terence Hwa},

title = {Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models},

journal = {J. COMP. BIOL},

year = {2001},

volume = {8},

number = {3}

}

### Abstract

The score statistics of probabilistic gapped local alignment of random sequences are investigated both analytically and numerically. The full probabilistic algorithm (e.g., the “local” version of the maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified “semi-probabilistic” alignment, a hybrid of Smith–Waterman and probabilistic alignment, is then proposed and studied in detail. The score statistics of the hybrid algorithm are predicted to take the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe for computing the “relative entropy,” and from it the finite-size correction to λ, is also given. These predictions compare well with direct numerical simulations for sequences of length 100 to 1,000, using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in detecting sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of Smith–Waterman alignment and significantly better than that of the Viterbi version of probabilistic alignment.
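To make the contrast in the abstract concrete, here is a minimal sketch of the two ingredients the hybrid combines. Smith–Waterman keeps only the best single path and lets any cell restart a fresh local alignment. A probabilistic (forward-algorithm) alignment instead sums weights over all paths. The recursion in `hybrid_score` is an illustrative assumption, not necessarily the paper's exact formulation. It sums path weights in log space and applies a Smith–Waterman-style restart at every cell. The function names, the toy match/mismatch scores (in natural-log units), and the linear gap penalty are all hypothetical. The paper itself uses PAM substitution scores and affine gaps. `gumbel_pvalue` uses the standard extreme-value tail 1 − exp(−K·m·n·e^(−λs)) for local alignment scores. λ and K are left as inputs and must be fitted or predicted; no values from the paper are assumed.

```python
import math


def log_sum_exp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x) for x in xs)) for finite inputs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def smith_waterman_score(a: str, b: str, match: float = 1.0,
                         mismatch: float = -1.0, gap: float = 1.0) -> float:
    """Best single-path local score with a linear gap penalty (toy scoring)."""
    h = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Restart at 0, or extend by substitution or by a gap.
            h[i][j] = max(0.0, h[i - 1][j - 1] + s,
                          h[i - 1][j] - gap, h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best


def hybrid_score(a: str, b: str, match: float = 1.0,
                 mismatch: float = -1.0, gap: float = 1.0) -> float:
    """Hypothetical 'semi-probabilistic' local score.

    Assumption: log Z(i, j) = max(0, logsumexp over the three predecessor
    moves), i.e. a forward-style sum over paths combined with a
    Smith-Waterman-style restart (log weight 0). Returns max over cells.
    """
    logz = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            summed = log_sum_exp(
                logz[i - 1][j - 1] + s,  # a[i-1] aligned to b[j-1]
                logz[i - 1][j] - gap,    # gap in b
                logz[i][j - 1] - gap,    # gap in a
            )
            logz[i][j] = max(0.0, summed)  # local restart
            best = max(best, logz[i][j])
    return best


def gumbel_pvalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """P(S >= score) = 1 - exp(-K m n exp(-lam * score)) for a Gumbel tail."""
    return -math.expm1(-k * m * n * math.exp(-lam * score))


if __name__ == "__main__":
    for x, y in [("ACGTTGCA", "AGGTACCA"), ("AAAA", "CCCC")]:
        print(x, y, smith_waterman_score(x, y), round(hybrid_score(x, y), 3))
```

Under these toy settings the hybrid score is never below the Smith–Waterman score, because summing the positive weights of several paths can only exceed the weight of the best one. For the same reason, a pair with no matching residues (such as `"AAAA"` and `"CCCC"`) can still score above zero when many weak paths add up. How scores like these are distributed over random sequences, and which λ governs their Gumbel tail, is the question the paper addresses.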