Results 1 - 10
of
37
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
"... ..."
An Application of Recurrent Nets to Phone Probability Estimation
- IEEE Transactions on Neural Networks
, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Abstract
-
Cited by 165 (8 self)
- Add to MetaCart
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Efficient Search Using Posterior Phone Probability Estimates
- In Proc. ICASSP
, 1995
"... In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, t ..."
Abstract
-
Cited by 30 (8 self)
- Add to MetaCart
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning where phones with a small local posterior probability are deactivated. This approach is particularly well-suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000 word task using a backed-off trigram languagemodel. For this task, our single-pass decodertook around 15× realtime on an HP73...
Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions
- in Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387
, 1998
"... ..."
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
, 1995
"... The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/ hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. The use of the posterior-based phone deactivation pruning techniques is well-suited to hybrid connectionist/HMM systems because posterior phone probabilities are directly computed by the connectionist acoustic model. The single-pass decoder has been evaluate on the large vocabulary North American Business News task using a...
Estimation of Global Posteriors and Forward-Backward Training of Hybrid HMM/ANN Systems
- in Proc. Europ. Conf. Speech Communication and Technology
"... The results of our research presented in this paper are two-fold. First, an estimation of global posteriors is formalized in the framework of hybrid HMM/ANN systems. It is shown that hybrid HMM/ANN systems, in which the ANN part estimates local posteriors, can be used to modelize global model poster ..."
Abstract
-
Cited by 23 (14 self)
- Add to MetaCart
The results of our research presented in this paper are two-fold. First, an estimation of global posteriors is formalized in the framework of hybrid HMM/ANN systems. It is shown that hybrid HMM/ANN systems, in which the ANN part estimates local posteriors, can be used to modelize global model posteriors. This formalization provides us with a clear theory in which both REMAP and "classical" Viterbi trained hybrid systems are unified. Second, a new forwardbackward training of hybrid HMM/ANN systems is derived from the previous formulation. Comparisons of performance between Viterbi and forward- backward hybrid systems are presented and discussed. 1. INTRODUCTION In [1, 2] it was shown that it is possible to express the global posterior probability P (M jX; \Theta) of a model (stochastic finite state acceptor) M given the acoustic data X and the parameters \Theta in terms of the local posteriors (conditional transition probabilities) P (q n l jq n\Gamma1 k ; xn ; \Theta) (where q n ...
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
"... This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the col ..."
Abstract
-
Cited by 19 (8 self)
- Add to MetaCart
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ) are used in the experiments to improve the recognition performance of the models. A HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
Hidden Markov Models and Neural Networks for Speech Recognition
, 1998
"... The Hidden Markov Model (HMMs) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as spee ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
The Hidden Markov Model (HMMs) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first order dependencies in the observed data sequences. This is due to the first order state process and the assumption of state conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ...
Learning Dynamics for Exemplar-based Gesture Recognition
- IN IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
, 2003
"... This paper addresses the problem of capturing the dynamics for exemplar-based recognition systems. Traditional HMM provides a probabilistic tool to capture system dynamics and in exemplar paradigm, HMM states are typically coupled with the exemplars. Alternatively, we propose a non-parametric HMM ap ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
This paper addresses the problem of capturing the dynamics for exemplar-based recognition systems. Traditional HMM provides a probabilistic tool to capture system dynamics and in exemplar paradigm, HMM states are typically coupled with the exemplars. Alternatively, we propose a non-parametric HMM approach that uses a discrete HMM with arbitrary states (decoupled from exemplars) to capture the dynamics over a large exemplar space where a nonparametric estimation approach is used to model the exemplar distribution. This reduces the need for lengthy and non-optimal training of the HMM observation model. We used the proposed approach for view-based recognition of gestures. The approach is based on representing each gesture as a sequence of learned body poses (exemplars). The gestures are recognized through a probabilistic framework for matching these body poses and for imposing temporal constraints between different poses using the proposed nonparametric HMM.
Confidence Measures From Local Posterior Probability Estimates
- Computer Speech and Language
, 1999
"... In this paper we introduce a set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model. In addition to their computational efficiency, these confidence measures are attra ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
In this paper we introduce a set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model. In addition to their computational efficiency, these confidence measures are attractive as they may be applied at the state-, phone-, word- or utterance-levels, potentially enabling discrimination between different causes of low confidence recognizer output, such as unclear acoustics or mismatched pronunciation models. We have evaluated these confidence measures for utterance verification using a number of different metrics. Experiments reveal several trends in `profitability of rejection', as measured by the unconditional error rate of a hypothesis test. These trends suggest that crude pronunciation models can mask the relatively subtle reductions in confidence caused by out-of-vocabulary (OOV) words and disfluencies, but not the gross model mismatches elicited by non-sp...

