Confidence Measures for Hybrid HMM/ANN Speech Recognition
In Proceedings of EuroSpeech, 1997
Cited by 26 (6 self)
In this paper we introduce four acoustic confidence measures which are derived from the output of a hybrid HMM/ANN large vocabulary continuous speech recognition system. These confidence measures, based on local posterior probability estimates computed by an ANN, are evaluated at both phone and word levels, using the North American Business News corpus.

1. INTRODUCTION

A reliable measure of the confidence of a speech recogniser's output is useful in many circumstances. A word may be hypothesised with low confidence when an out-of-vocabulary (OOV) word is encountered or when the word model is matched against unclear acoustics caused by disfluencies or noise. Both OOV words and unclear acoustics are a major source of recogniser error. A confidence measure based on can be used to reject those hypotheses which are likely to be erroneous (i.e., have a low confidence) in a hypothesis test. Additionally, a reliable confidence measure may be of practical use in recognition search (confidence ...
Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions
In Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387, 1998
Hidden Markov Models and Neural Networks for Speech Recognition
1998
Cited by 19 (1 self)
The Hidden Markov Model (HMM) is one of the most successful modeling approaches for acoustic events in speech recognition, and more recently it has proven useful for several problems in biological sequence analysis. Although the HMM is good at capturing the temporal nature of processes such as speech, it has a very limited capacity for recognizing complex patterns involving more than first order dependencies in the observed data sequences. This is due to the first order state process and the assumption of state conditional independence between observations. Artificial Neural Networks (NNs) are almost the opposite: they cannot model dynamic, temporally extended phenomena very well, but are good at static classification and regression tasks. Combining the two frameworks in a sensible way can therefore lead to a more powerful model with better classification abilities. The overall aim of this work has been to develop a probabilistic hybrid of hidden Markov models and neural networks and ...
Confidence Measures From Local Posterior Probability Estimates
Computer Speech and Language, 1999
Cited by 18 (6 self)
In this paper we introduce a set of related confidence measures for large vocabulary continuous speech recognition (LVCSR) based on local phone posterior probability estimates output by an acceptor HMM acoustic model. In addition to their computational efficiency, these confidence measures are attractive as they may be applied at the state-, phone-, word- or utterance-levels, potentially enabling discrimination between different causes of low confidence recognizer output, such as unclear acoustics or mismatched pronunciation models. We have evaluated these confidence measures for utterance verification using a number of different metrics. Experiments reveal several trends in 'profitability of rejection', as measured by the unconditional error rate of a hypothesis test. These trends suggest that crude pronunciation models can mask the relatively subtle reductions in confidence caused by out-of-vocabulary (OOV) words and disfluencies, but not the gross model mismatches elicited by non-sp...
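The abstract above does not give the formulas for its confidence measures. As a rough illustration only, the sketch below assumes one plausible form: the duration-normalised log posterior of the hypothesised phone over its frames, averaged across the phones of a word, then compared against a rejection threshold. The function names, the flooring constant and the threshold value are all hypothetical and are not taken from the paper.

```python
import math


def segment_confidence(posteriors, floor=1e-10):
    """Mean log posterior of the hypothesised phone over one segment's frames.

    `posteriors` holds one local posterior estimate per frame (assumed input).
    """
    if not posteriors:
        raise ValueError("segment must contain at least one frame")
    # Floor the posteriors so that a zero estimate does not produce log(0).
    return sum(math.log(max(p, floor)) for p in posteriors) / len(posteriors)


def word_confidence(phone_segments):
    """Average the per-phone scores so that long phones do not dominate."""
    scores = [segment_confidence(segment) for segment in phone_segments]
    return sum(scores) / len(scores)


def accept(phone_segments, threshold=-1.0):
    """Hypothesis test: keep the word if its confidence reaches the threshold."""
    return word_confidence(phone_segments) >= threshold
```

Because the same per-frame quantity is averaged at each level, a score of this kind can be reported per phone, per word or per utterance, which is the property the abstract highlights.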
Start-synchronous search for large vocabulary continuous speech recognition
IEEE Trans. Speech and Audio Processing
Cited by 17 (9 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60 000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper.

Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
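Phone deactivation pruning is only named in the abstract, not specified. A minimal sketch of the general idea, assuming that a phone is deactivated in a frame when its posterior estimate falls below a fixed threshold and that hypotheses needing a deactivated phone are dropped, might look as follows. The dictionary layout, the `next_phone` key and the threshold value are hypothetical choices for illustration.

```python
def active_phones(frame_posteriors, threshold=0.001):
    """Return the phones whose posterior in this frame meets the threshold."""
    return {phone for phone, p in frame_posteriors.items() if p >= threshold}


def prune_hypotheses(hypotheses, frame_posteriors, threshold=0.001):
    """Keep only the hypotheses whose next phone is still active in this frame."""
    active = active_phones(frame_posteriors, threshold)
    return [h for h in hypotheses if h["next_phone"] in active]


# Example frame: "k" is very unlikely, so paths that need it are pruned.
frame = {"ae": 0.9, "t": 0.0995, "k": 0.0005}
paths = [
    {"word": "at", "next_phone": "ae"},
    {"word": "cat", "next_phone": "k"},
]
survivors = prune_hypotheses(paths, frame)
```

The saving comes from never evaluating HMM states for deactivated phones, so the amount of pruning depends directly on how sharply peaked the posterior estimates are.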
Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing, 2012
Cited by 9 (1 self)
We propose a novel context-dependent (CD) model for large vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively.

Index Terms: Speech recognition, deep belief network, context-dependent phone, LVSR, DNN-HMM, ANN-HMM
NON-STATIONARY MULTI-CHANNEL (MULTI-STREAM) PROCESSING TOWARDS ROBUST AND ADAPTIVE ASR
Cited by 8 (1 self)
In this paper, we discuss the rationale behind multi-channel processing as applied to multi-stream automatic speech recognition (ASR). In this framework, we will develop different mathematical models and discuss some interesting relationships with psycho-acoustic evidence. In the case of multi-channel processing, it is assumed that the speech signal is processed by different "experts", each expert focusing on a different characteristic of the signal, and that the different channels are combined at some (temporal) stage to yield a global recognition output. Although we believe that the discussion below is valid for numerous multi-channel problems (e.g., audio and visual streams, in the case of audio-visual ASR), the present paper will mainly discuss the possible combination strategies (with application to multi-band ASR) and their relationships with different mathematical models. Finally, we will show that the proposed approaches could provide us with a new paradigm for noise robust and adaptive ASR.
Hidden Markov Models and other Finite State Automata for Sequence Processing
2001
Cited by 6 (0 self)
Introduction

During the last 20 years, Finite State Automata (FSA), and more particularly Stochastic Finite State Automata (SFSA) and different variants of Hidden Markov Models (HMMs), have been used quite successfully to address several complex sequential pattern recognition problems, such as continuous speech recognition, cursive (handwritten) text recognition, time series prediction, biological sequence analysis, and many others. FSA allow complex learning problems to be solved by assuming that the sequential pattern can be decomposed into piecewise stationary segments, encoded through the topology of the FSA. Each stationary segment can be parametrized in terms of a deterministic or stochastic function. In the latter case, it may also be possible that the SFSA state sequence is not observed directly but is a probabilistic function of the underlying finite state Markov chain. This leads to the definition of the powerful Hidden Markov Models, involving two concurrent s
Hidden Neural Networks
Neural Computation
Cited by 4 (2 self)
A general framework for hybrids of Hidden Markov models (HMMs) and neural networks (NNs) called Hidden Neural Networks (HNNs) is described. The paper begins by reviewing standard HMMs and estimation by conditional maximum likelihood, which is used by the HNN. In the HNN the usual HMM probability parameters are replaced by the outputs of state specific neural networks. As opposed to many other hybrids, the HNN is normalized globally and therefore has a valid probabilistic interpretation. All parameters in the HNN are estimated simultaneously according to the discriminative conditional maximum likelihood criterion. An evaluation of the HNN on the task of recognizing broad phoneme classes in the TIMIT database shows clear performance gains compared to standard HMMs tested on the same task.

1 Introduction

Hidden Markov models are one of the most successful modeling approaches for acoustic events in speech recognition (Rabiner 1989; Juang & Rabiner 1991), and more recently they have proven ...
Towards robust and adaptive speech recognition models
IDIAP Research Report No. IDIAP-PR, 2003
Cited by 3 (1 self)
In this paper, we discuss a family of new Automatic Speech Recognition (ASR) approaches, which somewhat deviate from the usual ASR approaches but which have recently been shown to be more robust to nonstationary noise, without requiring specific adaptation or “multi-style” training. More specifically, we will motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent) “experts”, each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state specific feature based HMMs responsible for merging the stream information and modeling their possible correlation.

Key words. Robust speech recognition, hidden Markov models, subband processing, multistream processing.

1. Introduction. Current
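The abstract above says that stream likelihoods or posteriors are combined, but not how. One common choice, used here purely as an assumption, is a weighted log-linear (weighted product) combination of per-stream class posteriors followed by renormalisation. The function name, the weights and the flooring constant below are hypothetical and do not come from the paper.

```python
import math


def combine_streams(stream_posteriors, weights, floor=1e-10):
    """Weighted log-linear combination of per-stream class posteriors.

    `stream_posteriors` is a list of dicts (one per stream) that map the same
    class labels to posterior estimates; `weights` has one entry per stream.
    """
    if len(stream_posteriors) != len(weights):
        raise ValueError("need exactly one weight per stream")
    classes = list(stream_posteriors[0])
    log_scores = {
        c: sum(w * math.log(max(post[c], floor))
               for post, w in zip(stream_posteriors, weights))
        for c in classes
    }
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_scores.values())
    unnormalised = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


# Two hypothetical subband "experts" that disagree on the class of one frame.
low_band = {"a": 0.8, "b": 0.2}
high_band = {"a": 0.4, "b": 0.6}
combined = combine_streams([low_band, high_band], [0.5, 0.5])
```

Setting a stream's weight to zero removes its influence entirely, which is one simple way to discount a frequency band judged to be corrupted by noise.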

