Results 1 - 5 of 5
Efficient Search Using Posterior Phone Probability Estimates
In Proc. ICASSP, 1995
Cited by 38 (9 self)
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning, where phones with a small local posterior probability are deactivated. This approach is particularly well suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000-word task using a backed-off trigram language model. For this task, our single-pass decoder took around 15× real time on an HP73...
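The pruning idea in the abstract above can be illustrated with a minimal sketch. It assumes the acoustic model yields one posterior distribution over phones per frame; the function name `active_phones`, the threshold value, and the toy numbers are hypothetical and are not taken from the paper.

```python
def active_phones(posteriors, threshold):
    """Return, for each frame, the set of phone indices that stay active.

    posteriors: list of frames, each a list of P(phone | acoustics) values.
    threshold:  phones whose local posterior falls below this value are
                deactivated for that frame (hypothetical setting).
    """
    return [
        {k for k, p in enumerate(frame) if p >= threshold}
        for frame in posteriors
    ]


# Toy example: 3 frames, 4 phones (values are illustrative only).
posteriors = [
    [0.90, 0.06, 0.03, 0.01],
    [0.50, 0.45, 0.04, 0.01],
    [0.01, 0.02, 0.90, 0.07],
]
active = active_phones(posteriors, threshold=0.05)
```

A decoder built along these lines would only extend hypotheses through phones in the active set for the current frame, which is where the reduction in search space would come from.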
THE USE OF RECURRENT NEURAL NETWORKS IN CONTINUOUS SPEECH RECOGNITION
1996
Cited by 24 (6 self)
This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems).
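The posterior-to-scaled-likelihood conversion mentioned above follows from Bayes' rule: p(x | q) / p(x) = P(q | x) / P(q), so dividing each phone posterior by that phone's prior gives a likelihood scaled by a factor that does not depend on the phone. The sketch below works in the log domain. The function name, the flooring constant, and the example priors are assumptions for illustration, not details from the chapter.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert one frame of phone posteriors into scaled log-likelihoods.

    posteriors: P(q | x) for each phone q at this frame.
    priors:     P(q) for each phone (assumed here to be supplied by the
                caller, e.g. estimated from training-data frequencies).
    floor:      hypothetical lower bound that avoids log(0).
    """
    return [
        math.log(max(p, floor)) - math.log(max(pr, floor))
        for p, pr in zip(posteriors, priors)
    ]


# Toy frame with three phones (values are illustrative only).
frame_posteriors = [0.5, 0.25, 0.25]
phone_priors = [0.25, 0.25, 0.5]
scores = scaled_log_likelihoods(frame_posteriors, phone_priors)
```

The resulting scores could then stand in for HMM observation log-probabilities in a Viterbi-style decoder, since the missing p(x) term is the same for every phone in a given frame.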
Start-synchronous search for large vocabulary continuous speech recognition
IEEE Trans. Speech and Audio Processing
Cited by 20 (10 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60,000-word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper. Index Terms: hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
unknown title
This chapter was written in 1994. Further advances have been made such as: context-dependent phone modelling; forward-backward training and adaptation using linear input transformations. This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems). Most if not all automatic speech recognition systems explicitly or implicitly compute a score (or an equivalent quantity) indicating how well an input acoustic signal matches a speech model of the hypothesised utterance. A fundamental problem in speech recognition is how this score may be computed,