Results 1–4 of 4
Learning String Edit Distance
, 1997
Abstract

Cited by 248 (2 self)
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
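The untrained Levenshtein distance that serves as this abstract's baseline can be computed with the standard dynamic program over prefixes. The sketch below uses unit costs in plain Python; it illustrates the definition only, not the paper's learned stochastic model:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions,
    deletions, and substitutions turning a into b."""
    m, n = len(a), len(b)
    # prev[j] = distance between a[:i-1] and b[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

print(edit_distance("kitten", "sitting"))  # → 3
```

The two-row formulation keeps memory at O(min-side length); the paper's contribution is replacing these fixed unit costs with learned edit probabilities.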
Significantly Lower Entropy Estimates for Natural DNA Sequences
 Journal of Computational Biology
, 1996
Abstract

Cited by 47 (1 self)
If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign 2 bits to each nucleotide. We imagine DNA to be a highly ordered, purposeful molecule, and might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for many natural DNA sequences, including portions of the human genome. We introduce a new statistical model (compression algorithm), the strongest reported to date, for naturally occurring DNA sequences. Conventional techniques code a nucleotide using only slightly fewer bits (1.90) than one obtains by relying only on the frequency statistics of individual nucleotides (1.95). Our method in some cases increases this gap by more than fivefold (1.66) and may lead to better performance in microbiological pattern recognition applications. One of our main contributions, and the principal source of these improvements, is the formal inclusion of inexac...
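The baseline figures quoted above (2 bits for a uniform code, about 1.95 bits from individual-nucleotide frequencies) are order-0 entropies of the sequence. A minimal illustration of that baseline, not the paper's compression algorithm:

```python
from collections import Counter
from math import log2

def order0_entropy(seq: str) -> float:
    """Bits per symbol from individual-symbol frequencies only
    (the order-0 baseline; ignores all context)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(order0_entropy("ACGT" * 25))      # → 2.0 (uniform frequencies)
print(order0_entropy("AAAAAACGGT"))     # below 2 bits for skewed frequencies
```

Stronger models close the gap further by conditioning on context (preceding nucleotides, approximate repeats), which is where the paper's improvements come from.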
A General Decomposition Theorem that Extends the Baum-Welch and Expectation-Maximization Paradigm to Rational Forms
, 2001
Abstract
We consider the problem of maximizing certain positive rational functions of a form that includes statistical constructs such as conditional mixture densities and conditional hidden Markov models. The well-known Baum-Welch and expectation-maximization (EM) algorithms do not apply to rational functions and are therefore limited to the simpler maximum-likelihood form of such models. Our main result is a general decomposition theorem that, like Baum-Welch/EM, breaks up each iteration of the maximization task into independent subproblems that are more easily solved – but applies to rational functions as well. It extends the central inequality of Baum-Welch/EM and associated high-level algorithms to the rational case, and reduces to the standard inequality and algorithms for simpler problems. Keywords: Baum-Welch (forward-backward algorithm), Expectation-Maximization (EM), hidden Markov models (HMM), conditional mixture density estimation, discriminative training, Maximum Mutual Information (MMI) criterion.
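As a reminder of the paradigm the theorem extends: in standard maximum-likelihood EM, each iteration splits into an E-step (posterior responsibilities) and an M-step (independent closed-form re-estimates). The toy two-coin mixture below (hypothetical data and parameter names, not from the paper) shows one such iteration; the paper's result generalizes this decomposition to rational objectives:

```python
from math import comb

def em_step(data, n_flips, pi, pA, pB):
    """One EM iteration for a mixture of two biased coins.
    Each datum is the heads count from n_flips tosses of a
    hidden coin (A with prior pi, else B)."""
    def lik(h, p):  # binomial likelihood of h heads
        return comb(n_flips, h) * p**h * (1 - p)**(n_flips - h)
    # E-step: posterior responsibility of coin A for each trial
    resp = [pi * lik(h, pA) / (pi * lik(h, pA) + (1 - pi) * lik(h, pB))
            for h in data]
    # M-step: weighted maximum-likelihood re-estimates,
    # each solved independently -- the decomposition EM provides
    wA = sum(resp)
    wB = len(data) - wA
    pi = wA / len(data)
    pA = sum(r * h for r, h in zip(resp, data)) / (wA * n_flips)
    pB = sum((1 - r) * h for r, h in zip(resp, data)) / (wB * n_flips)
    return pi, pA, pB

data = [9, 8, 2, 1, 9, 2]        # heads out of 10 per trial
params = (0.5, 0.6, 0.4)         # initial guesses
for _ in range(50):
    params = em_step(data, 10, *params)
print(params)  # converges toward two clusters, one high-bias, one low-bias
```

The rational forms in the abstract (e.g. conditional likelihoods, MMI) put a sum over models in the denominator, which breaks this closed-form M-step; the paper's theorem restores an analogous subproblem decomposition for that case.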