Results 1 - 9 of 9
An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model
In Proc. IEEE ICASSP’93, 1993
Cited by 49 (0 self)
Abstract:
The stack decoder is an attractive algorithm for controlling the acoustic and language model matching in a continuous speech recognizer. A previous paper described a near-optimal admissible Viterbi A* search algorithm for use with non-cross-word acoustic models and no-grammar language models [16]. This paper extends this algorithm to include unigram language models and describes a modified version of the algorithm which includes the full (forward) decoder, cross-word acoustic models and longer-span language models. The resultant algorithm is not admissible, but has been demonstrated to have a low probability of search error and to be very efficient.
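The A* stack search this abstract describes can be sketched in miniature. This is a hedged illustration only: the toy successor table, the log-probabilities, and the zero heuristic are invented assumptions, not data or code from the paper.

```python
import heapq

# Minimal stack-decoder sketch over a toy word graph (hypothetical data).
# Each partial hypothesis is scored by g (log-prob so far) plus a heuristic
# h estimating the best possible completion -- the A* idea.

# successors[word] -> list of (next_word, log_prob); "</s>" ends the utterance.
successors = {
    "<s>":  [("the", -0.5), ("a", -1.2)],
    "the":  [("cat", -0.7), ("dog", -0.9)],
    "a":    [("cat", -1.0)],
    "cat":  [("</s>", -0.3)],
    "dog":  [("</s>", -0.4)],
}

def heuristic(word):
    # Admissible (optimistic) estimate of the remaining log-prob: 0 never
    # underestimates, since all completions have log-prob <= 0.
    return 0.0

def stack_decode():
    # Max-heap via negated scores: always pop the most promising hypothesis.
    stack = [(-(0.0 + heuristic("<s>")), 0.0, ["<s>"])]
    while stack:
        _, g, words = heapq.heappop(stack)
        if words[-1] == "</s>":
            return words, g        # with an admissible h, first complete pop is best
        for nxt, lp in successors.get(words[-1], []):
            g2 = g + lp
            heapq.heappush(stack, (-(g2 + heuristic(nxt)), g2, words + [nxt]))
    return None, float("-inf")

words, score = stack_decode()
print(words, round(score, 2))   # ['<s>', 'the', 'cat', '</s>'] -1.5
```

With a tighter (but still admissible) heuristic, fewer partial hypotheses are popped before the first complete one; the inadmissible variants the abstract mentions trade this guarantee for speed.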
Efficient Search Using Posterior Phone Probability Estimates
In Proc. ICASSP, 1995
Cited by 38 (9 self)
Abstract:
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning, where phones with a small local posterior probability are deactivated. This approach is particularly well-suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000 word task using a backed-off trigram language model. For this task, our single-pass decoder took around 15× real-time on an HP73...
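The phone deactivation pruning idea in this abstract can be illustrated with a few lines. The phone labels, posterior values, and threshold below are made-up assumptions for the sketch, not values from the paper.

```python
# Sketch of phone deactivation pruning: at each frame, phones whose local
# posterior falls below a threshold are dropped from the active set, so the
# search never extends hypotheses through them.

THRESHOLD = 0.05   # illustrative value, not from the paper

# Toy per-frame posterior estimates (each row sums to 1), as a connectionist
# acoustic model would produce directly.
posteriors = [
    {"ax": 0.70, "k": 0.20, "s": 0.06, "t": 0.04},
    {"ax": 0.10, "k": 0.72, "s": 0.15, "t": 0.03},
]

def active_phones(frame_posteriors, threshold=THRESHOLD):
    # Keep only phones the acoustic model considers locally plausible.
    return {p for p, prob in frame_posteriors.items() if prob >= threshold}

for i, frame in enumerate(posteriors):
    print(i, sorted(active_phones(frame)))
# "t" falls below the threshold in both frames and is deactivated
```

Because the pruning decision reads the posteriors the network already outputs, it adds almost no cost per frame, which is why the approach suits hybrid connectionist/HMM systems.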
Time-First Search for Large Vocabulary Speech Recognition
1998
Cited by 25 (5 self)
Abstract:
This paper describes a new search technique for large vocabulary speech recognition based on a stack decoder. Considerable memory savings are achieved with the combination of a tree-based lexicon and a new search technique. The search proceeds time-first, that is, partial path hypotheses are extended into the future in the inner loop and a tree walk over the lexicon is performed as an outer loop. Partial word hypotheses are grouped based on language model state. The stack maintains information about groups of hypotheses and whole groups are extended by one word to form new stack entries. An implementation is described of a one-pass decoder employing a 65,000 word lexicon and a disk-based trigram language model. Real time operation is achieved with a small search error, a search space of about 5 Mbyte and a total memory usage of about 35 Mbyte.
1. INTRODUCTION
Search is an interesting problem in the field of large vocabulary speech recognition. Typically the acoustic vectors correspondi...
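The grouping of partial hypotheses by language model state described above can be sketched as follows. The word histories, scores, and the choice of a trigram context (last two words) as the LM state are illustrative assumptions.

```python
from collections import defaultdict

# Hedged sketch: partial hypotheses whose trigram context (last two words)
# matches share one group, so a whole group can be extended by one word at a
# time as a single stack entry. All data are made up.

hypotheses = [
    (("<s>", "the", "cat"), -3.1),
    (("<s>", "a", "the", "cat"), -4.0),
    (("<s>", "the", "dog"), -3.5),
]

groups = defaultdict(list)
for words, score in hypotheses:
    lm_state = words[-2:]            # trigram LM state: last two words
    groups[lm_state].append(score)

# One stack entry per group, keyed by LM state.
for state, scores in sorted(groups.items()):
    print(state, max(scores))
# ('the', 'cat') -3.1
# ('the', 'dog') -3.5
```

Grouping means the language model is consulted once per group rather than once per hypothesis, which is where the memory and time savings come from.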
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
1995
Cited by 24 (4 self)
Abstract:
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. The use of the posterior-based phone deactivation pruning techniques is well-suited to hybrid connectionist/HMM systems because posterior phone probabilities are directly computed by the connectionist acoustic model. The single-pass decoder has been evaluated on the large vocabulary North American Business News task using a...
Start-synchronous search for large vocabulary continuous speech recognition
 IEEE Trans. Speech and Audio Processing
Cited by 20 (10 self)
Abstract:
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60,000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper.
Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
Efficient Evaluation of the LVCSR Search Space Using the Noway Decoder
In ICASSP, 1996
Cited by 13 (1 self)
Abstract:
This work further develops and analyses the large vocabulary continuous speech recognition (LVCSR) search strategy reported at ICASSP’95 [1]. In particular, the posterior-based phone deactivation pruning approach has been extended to include phone-dependent thresholds and an improved estimate of the least upper bound on the utterance log-probability has been developed. Analysis of the pruning procedures and of the search's interaction with the language model has also been performed. Experiments were carried out using the ARPA North American Business News task with a 20,000 word vocabulary and a trigram language model. As a result of these improvements and analyses, the computational cost of the recognition process performed by the noway decoder has been substantially reduced.
1. INTRODUCTION
At ICASSP’95, we introduced an efficient search procedure [1] that was implemented as a software decoder known as noway and used in the Abbot hybrid connectionist/HMM LVCSR system [2, 3]. Key fea...
The 1994 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
1995
Cited by 5 (1 self)
Abstract:
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. The maximum likelihood word string is then extracted using Markov models. As in traditional hidden Markov models, the Markov process is used to model the lexical and language model constraints. This paper describes the system which participated in the November 1994 ARPA evaluation of continuous speech recognition systems. The emphasis of the paper is on the differences between the 1993 and 1994 versions of the ABBOT system. This includes the utilization of a larger training corpus (SI284 versus SI84), the extension of the lexicon from 5,000 words to 65,000 words, the application of a trigram language model, and the development of a near-real-time single-pass decoder well suited for the hybrid approach. Experimental results are rep...
Bayesian Protein Secondary Structure Prediction with Near-Optimal Segmentations
Abstract:
Secondary structure prediction is an invaluable tool in determining the 3D structure and function of proteins. Typically, protein secondary structure prediction methods suffer from low accuracy in strand predictions, where non-local interactions play a significant role. There is a considerable need to model such long-range interactions that contribute to the stabilization of a protein molecule. In this paper, we introduce an alternative decoding technique for the hidden semi-Markov model (HSMM) originally employed in the BSPSS algorithm, and further developed in the IPSSP algorithm. The proposed method is based on the N-best paradigm where a set of most likely segmentations is computed. To generate suboptimal segmentations (i.e., alternative prediction sequences), we developed two N-best search algorithms. The first one is a stack decoder algorithm that extends paths (or hypotheses) by one symbol at each iteration. The second algorithm locally keeps
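The N-best stack search mentioned in this abstract differs from a plain stack decoder only in that it keeps popping complete hypotheses until N have been collected. A minimal sketch, with an invented transition table rather than the paper's HSMM:

```python
import heapq

# Hedged N-best stack search sketch: pop the best partial path, extend it by
# one symbol, and collect complete paths until n have been found. The states
# and log-probabilities below are made up for illustration.

successors = {
    "S": [("H", -0.2), ("E", -1.0)],
    "H": [("E", -0.5), ("END", -0.9)],
    "E": [("END", -0.1)],
}

def nbest_stack_search(n=2):
    results = []
    stack = [(0.0, ["S"])]                  # (negated log-prob, path)
    while stack and len(results) < n:
        neg, path = heapq.heappop(stack)
        if path[-1] == "END":
            results.append((path, -neg))    # complete: record and keep going
            continue
        for nxt, lp in successors.get(path[-1], []):
            heapq.heappush(stack, (neg - lp, path + [nxt]))
    return results

for path, score in nbest_stack_search(2):
    print(path, round(score, 2))
```

Because the heap always yields hypotheses in best-first order, the N complete paths come out ranked, which is exactly what a set of alternative prediction sequences requires.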
THE USE OF RECURRENT NEURAL NETWORKS IN CONTINUOUS SPEECH RECOGNITION
Abstract:
This chapter describes the use of recurrent neural networks (i.e., networks incorporating feedback in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary HMM systems).
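The posterior-to-scaled-likelihood conversion described above follows from Bayes' rule, p(x|q)/p(x) = p(q|x)/p(q): dividing each posterior by the phone's prior gives a likelihood scaled by the (state-independent) p(x). The phone labels, posteriors, and priors below are invented for illustration.

```python
# Minimal sketch of converting network posteriors into scaled likelihoods
# for use as HMM observation scores. All numbers are made up.

posteriors = {"ax": 0.6, "k": 0.3, "t": 0.1}   # network outputs for one frame
priors     = {"ax": 0.4, "k": 0.4, "t": 0.2}   # phone priors from training data

# p(x|q) / p(x) = p(q|x) / p(q); the common p(x) factor cancels in decoding.
scaled_likelihoods = {q: posteriors[q] / priors[q] for q in posteriors}
print({q: round(v, 3) for q, v in scaled_likelihoods.items()})
# {'ax': 1.5, 'k': 0.75, 't': 0.5}
```

Since every phone's likelihood is scaled by the same p(x) within a frame, the ranking of state sequences in Viterbi decoding is unaffected by the cancellation.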