Results 1-10 of 27
Learning String Edit Distance
, 1997
Abstract

Cited by 193 (2 self)
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
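For reference, the untrained baseline mentioned above is the unit-cost Levenshtein distance, computed by the standard dynamic program. The sketch below also shows one way a stochastic edit distance can be scored: a forward recursion that sums, rather than minimizes, over all edit sequences under a memoryless model of edit operations, with the distance taken as -log p(x, y). The parameter names (`p_sub`, `p_del`, `p_ins`, `p_end`) are hypothetical, the probabilities are assumed to come from an already-trained model, and the learning step itself (for example, expectation-maximization over a corpus) is not shown.

```python
import math
from typing import Dict, Tuple

def levenshtein(x: str, y: str) -> int:
    """Unit-cost edit distance: minimum number of insertions, deletions, substitutions."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub)    # substitution or match
    return d[m][n]

def stochastic_edit_distance(x: str, y: str,
                             p_sub: Dict[Tuple[str, str], float],
                             p_del: Dict[str, float],
                             p_ins: Dict[str, float],
                             p_end: float) -> float:
    """-log p(x, y), where p sums over every edit sequence turning x into y
    (forward algorithm). Hypothetical memoryless parameterization."""
    m, n = len(x), len(y)
    alpha = [[0.0] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:
                alpha[i][j] += alpha[i - 1][j] * p_del.get(x[i - 1], 0.0)
            if j > 0:
                alpha[i][j] += alpha[i][j - 1] * p_ins.get(y[j - 1], 0.0)
            if i > 0 and j > 0:
                alpha[i][j] += alpha[i - 1][j - 1] * p_sub.get((x[i - 1], y[j - 1]), 0.0)
    p = alpha[m][n] * p_end
    return -math.log(p) if p > 0 else math.inf

print(levenshtein("kitten", "sitting"))  # 3
```

Summing over edit sequences (instead of taking the single cheapest one) is what makes the score a probability that can be re-estimated from data; the Viterbi-style alternative would replace each `+=` with a `max`.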
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
Abstract

Cited by 170 (3 self)
An overview of statistical and information-theoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discrete-time finite-state homogeneous Markov chain observed through a discrete-time memoryless invariant channel. In recent years, the work of Baum and Petrie on finite-state finite-alphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximum-likelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes, which generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms: Baum–Petrie algorithm, entropy ergodic theorems, finite-state channels, hidden Markov models, identifiability, Kalman filter, maximum-likelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality.
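Under the definition above (a finite-state Markov chain seen through a memoryless channel), the likelihood of an observation sequence can be computed with the forward recursion. The sketch below is a minimal illustration for the finite-state, finite-alphabet case only; the names `pi`, `A`, and `B` are our own, and the per-step normalization is a standard trick so that long sequences do not underflow.

```python
import math
from typing import List, Sequence

def hmp_log_likelihood(obs: Sequence[int], pi: List[float],
                       A: List[List[float]], B: List[List[float]]) -> float:
    """log p(obs) for a finite-state, finite-alphabet HMP.
    pi[s]: initial state probabilities; A[r][s]: transition probabilities;
    B[s][o]: memoryless channel probabilities p(o | s)."""
    n = len(pi)
    log_lik = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[s] * B[s][o] for s in range(n)]
        else:
            # predict the next state, then weight by the channel output
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][o]
                     for s in range(n)]
        c = sum(alpha)                   # p(o_t | o_0..o_{t-1})
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]   # filtered state distribution
    return log_lik
```

The normalized `alpha` after each step is the filtered distribution of the hidden state, the quantity that state-estimation algorithms for HMPs build on.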
Dynamic Programming Search for Continuous Speech Recognition
, 1999
Abstract

Cited by 48 (1 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred. First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead, and word graph generation.
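As a minimal illustration of combining dynamic programming with pruning, the sketch below runs a time-synchronous Viterbi search over the states of a single HMM and, at each frame, extends only hypotheses whose score is within `beam` of the frame's best (beam pruning). It does not model a lexicon, lexical tree, language-model look-ahead, or word graphs; the function and parameter names are hypothetical, and all scores are log probabilities (use `-math.inf` for impossible transitions).

```python
import math
from typing import List, Tuple

def viterbi_beam(obs_loglik: List[List[float]], log_A: List[List[float]],
                 log_pi: List[float], beam: float) -> Tuple[List[int], float]:
    """Time-synchronous Viterbi search with beam pruning (all scores are log probs).
    obs_loglik[t][s] = log p(o_t | s); log_A[r][s] = log p(s | r); log_pi[s] = log p(s)."""
    n = len(log_pi)
    score = [log_pi[s] + obs_loglik[0][s] for s in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(obs_loglik)):
        best = max(score)
        # beam pruning: only hypotheses close to the frame's best are extended
        active = [r for r in range(n) if score[r] >= best - beam]
        new, ptr = [-math.inf] * n, [0] * n
        for s in range(n):
            for r in active:
                cand = score[r] + log_A[r][s]
                if cand > new[s]:
                    new[s], ptr[s] = cand, r
            new[s] += obs_loglik[t][s]
        backptrs.append(ptr)
        score = new
    # trace back from the best final state
    s = max(range(n), key=lambda k: score[k])
    final = score[s]
    path = [s]
    for ptr in reversed(backptrs):
        s = ptr[s]
        path.append(s)
    return path[::-1], final
```

With `beam=math.inf` this is exact Viterbi decoding; a finite beam trades a risk of search error for fewer hypotheses per frame.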
Efficient Search Using Posterior Phone Probability Estimates
 In Proc. ICASSP
, 1995
Abstract

Cited by 38 (9 self)
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning, in which phones with a small local posterior probability are deactivated. This approach is particularly well-suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks using a trigram language model, this pruning increased the search speed by an order of magnitude, with a relative search error of 2% or less. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000-word task using a backed-off trigram language model. For this task, our single-pass decoder took around 15× real time on an HP73...
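A minimal sketch of the pruning rule, assuming the acoustic model already supplies a posterior for every phone in every frame: phones below a fixed threshold are deactivated for that frame, so a decoder would not extend hypotheses through them. The threshold value, the data layout, and the function names are illustrative assumptions; how deactivated phones interact with the stack decoder is not modeled here.

```python
from typing import Dict, List, Set

def deactivate_phones(frame_posteriors: List[Dict[str, float]],
                      threshold: float) -> List[Set[str]]:
    """For each frame, return the phones that stay active: those whose local
    posterior P(phone | acoustics) is at least `threshold`."""
    return [{ph for ph, p in frame.items() if p >= threshold}
            for frame in frame_posteriors]

def pruned_fraction(frame_posteriors: List[Dict[str, float]],
                    threshold: float) -> float:
    """Fraction of (frame, phone) pairs removed from the search."""
    active = deactivate_phones(frame_posteriors, threshold)
    total = sum(len(frame) for frame in frame_posteriors)
    kept = sum(len(a) for a in active)
    return 1.0 - kept / total
```

For example, with frames `[{"ae": 0.70, "eh": 0.25, "t": 0.05}, {"t": 0.90, "d": 0.08, "ae": 0.02}]` and a threshold of 0.1, only `ae` and `eh` survive in the first frame and only `t` in the second, so half of the frame-phone pairs are pruned. Because the rule needs nothing but the posteriors, it fits models that output them directly.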
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
, 1995
Abstract

Cited by 24 (4 self)
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood-based and posterior-based pruning. The posterior-based phone deactivation pruning technique is well-suited to hybrid connectionist/HMM systems because posterior phone probabilities are directly computed by the connectionist acoustic model. The single-pass decoder has been evaluated on the large vocabulary North American Business News task using a...
Start-synchronous search for large vocabulary continuous speech recognition
 IEEE Trans. Speech and Audio Processing
Abstract

Cited by 20 (10 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60 000-word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2 % using the techniques discussed in the paper. Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
An effective algorithm for string correction using generalized edit distances, II. Computational complexity of the algorithm and some applications
 Information Sci.
Abstract

Cited by 18 (10 self)
This paper deals with the problem of estimating a transmitted string X from the corresponding received string Y, which is a noisy version of X. We assume that Y contains any number of substitution, insertion, and deletion errors, and that no two consecutive symbols of X were deleted in transmission. We have shown that for channels which cause independent errors, and whose error probabilities exceed those of noisy strings studied in the literature [12], at least 99.5 % of the erroneous strings will not contain two consecutive deletion errors. The best estimate X+ of the transmitted string is defined as that element of the dictionary H which minimizes the generalized Levenshtein distance D(X/Y) between X and Y. Using dynamic programming principles, an algorithm is presented which yields X+ without computing individually the distances between every word of H and Y. Though this algorithm requires more memory, it can be shown that it is, in general, computationally less complex than all other existing algorithms which perform the same task.
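The sketch below is not the paper's algorithm (which, among other things, exploits the assumption that no two consecutive symbols are deleted); it only illustrates the general idea of finding the best dictionary element without computing each distance separately. Here the dictionary H is stored in a trie so that words sharing a prefix share the dynamic-programming rows for that prefix. The cost functions `sub`, `ins`, and `dele` stand in for the generalized (weighted) Levenshtein costs and are hypothetical; `sub(a, a)` is assumed to be 0.

```python
from typing import Callable, Dict, List, Optional, Tuple

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set if a dictionary word ends here

def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root

def best_match(root: TrieNode, y: str,
               sub: Callable[[str, str], float],
               ins: Callable[[str], float],
               dele: Callable[[str], float]) -> Tuple[Optional[str], float]:
    """Dictionary word X minimizing the weighted edit distance D(X/Y).
    Each trie edge appends one DP row, so words sharing a prefix share its rows."""
    first = [0.0]
    for c in y:
        first.append(first[-1] + ins(c))  # empty X: every symbol of Y is inserted
    best: Tuple[Optional[str], float] = (None, float("inf"))
    stack = [(root, first)]
    while stack:
        node, row = stack.pop()
        if node.word is not None and row[-1] < best[1]:
            best = (node.word, row[-1])
        for ch, child in node.children.items():
            new = [row[0] + dele(ch)]
            for j, yc in enumerate(y, start=1):
                new.append(min(row[j] + dele(ch),          # ch of X deleted
                               new[j - 1] + ins(yc),       # yc of Y inserted
                               row[j - 1] + sub(ch, yc)))  # ch received as yc
            stack.append((child, new))
    return best
```

With unit costs and the dictionary `["cat", "cart", "card", "dog"]`, the received string `"crat"` is closest to `"cat"` (one insertion). Sharing prefix rows saves time at the cost of keeping rows for pending trie branches in memory.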
A dynamic Bayesian framework to model context and memory in edit distance learning: An application to pronunciation classification
 In Proc. ACL
, 2005
Abstract

Cited by 13 (4 self)
Sitting at the intersection between statistics and machine learning, Dynamic Bayesian Networks have been applied with much success in many domains, such as speech recognition, vision, and computational biology. While Natural Language Processing increasingly relies on statistical methods, we think it has yet to use Graphical Models to their full potential. In this paper, we report on experiments in learning edit distance costs using Dynamic Bayesian Networks and present results on a pronunciation classification task. By exploiting the ability within the DBN framework to rapidly explore a large model space, we obtain a 40 % reduction in error rate compared to a previous transducer-based method of learning edit distance.
Task-Specific Minimum Bayes-Risk Decoding using Learned Edit Distance
 In Proceedings of ICSLP 2004, Jeju Island, South Korea
, 2004
Abstract

Cited by 6 (3 self)
This paper extends the minimum Bayes-risk framework to incorporate a loss function specific to the task and the ASR system. The errors are modeled as a noisy channel and the parameters are learned from the data. The resulting loss function is used in the risk criterion for decoding. Experiments on a large vocabulary conversational speech recognition system demonstrate significant gains of about 1% absolute over the MAP hypothesis and about 0.6% absolute over an untrained loss function. The approach is general enough to be applicable to other sequence recognition problems such as Optical Character Recognition (OCR) and the analysis of biological sequences.
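The risk criterion can be sketched as follows: over an N-best list, choose the hypothesis whose expected loss under the normalized posteriors is smallest. Restricting the search to an N-best list is our simplifying assumption, and the default loss here is plain word-level Levenshtein distance, i.e. an untrained loss; a learned, task-specific loss would be passed in as `loss`. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two word sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution or match
        prev = cur
    return prev[-1]

def mbr_decode(nbest: List[Tuple[List[str], float]],
               loss: Callable[[Sequence[str], Sequence[str]], float] = word_edit_distance):
    """Pick the N-best entry with minimum expected loss.
    nbest: (hypothesis words, log score); scores are normalized into posteriors."""
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]
    z = sum(weights)
    posteriors = [w / z for w in weights]
    best, best_risk = None, math.inf
    for hyp, _ in nbest:
        risk = sum(p * loss(hyp, ref) for (ref, _), p in zip(nbest, posteriors))
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best, best_risk
```

For instance, with posteriors 0.4, 0.3, 0.3 on "a b c", "a b d", "a x d", the MAP choice is "a b c", but "a b d" has the lowest expected word error (0.7 versus 0.9), so minimum Bayes-risk decoding returns it.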
Efficient Error-Correcting Viterbi Parsing
, 1998
Abstract

Cited by 5 (2 self)
The problem of Error-Correcting Parsing (ECP) using an insertion-deletion-substitution error model and a Finite State Machine is examined. The Viterbi algorithm can be straightforwardly extended to perform ECP, though the resulting computational complexity can become prohibitive for many applications.
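One plain way to see the extension: align the input against paths of the FSM. Following an FSM edge while reading an input symbol is a match (or a substitution if the symbols differ), reading an input symbol without moving is an insertion, and following an edge without reading input is a deletion. The sketch computes the minimum-cost alignment column by column; with costs taken as negative log probabilities this corresponds to the Viterbi score of the error model. Unit costs, the edge-list representation, and all names are our assumptions, and none of the paper's efficiency techniques are reproduced; the work per input symbol grows with the number of FSM edges.

```python
import heapq
import math
from typing import List, Set, Tuple

Edge = Tuple[int, str, int]  # (source state, symbol, target state)

def ec_parse_cost(x: str, edges: List[Edge], n_states: int, start: int,
                  finals: Set[int], sub: float = 1.0, ins: float = 1.0,
                  dele: float = 1.0) -> float:
    """Minimum total edit cost between x and any string accepted by the FSM.
    Column i holds, per state, the cheapest way to reach it after reading x[:i]."""
    out: List[List[Edge]] = [[] for _ in range(n_states)]
    for e in edges:
        out[e[0]].append(e)

    def close(col: List[float]) -> List[float]:
        # Deletion: follow an FSM edge without reading input (same column).
        # Costs are non-negative, so a Dijkstra pass gives the closure.
        heap = [(c, q) for q, c in enumerate(col) if c < math.inf]
        heapq.heapify(heap)
        while heap:
            c, q = heapq.heappop(heap)
            if c > col[q]:
                continue  # stale heap entry
            for _src, _sym, dst in out[q]:
                if c + dele < col[dst]:
                    col[dst] = c + dele
                    heapq.heappush(heap, (col[dst], dst))
        return col

    col = [math.inf] * n_states
    col[start] = 0.0
    col = close(col)
    for ch in x:
        new = [c + ins for c in col]  # insertion: read ch, stay in the same state
        for q, c in enumerate(col):
            if c == math.inf:
                continue
            for _src, sym, dst in out[q]:
                cost = c + (0.0 if sym == ch else sub)  # match or substitution
                if cost < new[dst]:
                    new[dst] = cost
        col = close(new)
    return min(col[q] for q in finals)
```

For an FSM accepting only "ab", the inputs "a", "ac", and "axb" each cost 1 (one deletion, substitution, or insertion respectively).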