Results 1–8 of 8
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery efficient and practical pruning str ..."
Abstract

Cited by 48 (1 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
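The time-synchronous DP search with beam pruning that this abstract summarizes can be sketched roughly as follows. This is a toy illustration with hand-picked log scores; `viterbi_beam` and its input format are hypothetical and not the paper's implementation:

```python
import math

def viterbi_beam(obs_scores, trans, beam=10.0):
    """Time-synchronous DP (Viterbi) search with beam pruning.

    obs_scores: list over time of dicts {state: log emission score}
    trans: dict {(prev_state, state): log transition score}
    beam: prune hypotheses more than `beam` below the best score.
    Returns (best log score, best state sequence).
    """
    # Active hypotheses: state -> (accumulated score, path so far)
    hyps = {s: (sc, [s]) for s, sc in obs_scores[0].items()}
    for frame in obs_scores[1:]:
        new = {}
        for (p, s), t in trans.items():
            if p in hyps and s in frame:
                cand = hyps[p][0] + t + frame[s]
                if s not in new or cand > new[s][0]:
                    new[s] = (cand, hyps[p][1] + [s])
        best = max(sc for sc, _ in new.values())
        # Beam pruning: discard hypotheses far below the current best.
        hyps = {s: v for s, v in new.items() if v[0] > best - beam}
    return max(hyps.values())
```

The beam makes the search approximate but keeps the number of active hypotheses small, which is the pruning property the abstract highlights.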
State-Based Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1997
"... This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 3070% of the computational time of a HMMbased speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoust ..."
Abstract

Cited by 23 (2 self)
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30–70% of the computational time of an HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoustic space into a set of clusters and associating a "shortlist" of Gaussians with each of these clusters. Any Gaussian not in the shortlist is simply approximated. This paper examines new techniques for obtaining "good" shortlists. All the new schemes make use of state information, specifically which state each of the components belongs to. In this way a maximum number of components per state may be specified, hence reducing the size of the shortlist. The first technique introduced is a simple extension of the standard GS one, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared...
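The basic GS idea described here — cluster the acoustic space, keep a shortlist of Gaussians per cluster, and approximate everything off the list — might be sketched like this. This is a minimal illustration; the distance threshold, floor value, and function names are made-up placeholders, not the paper's scheme:

```python
import math

def log_gauss(x, mean, var):
    # Log density of a diagonal-covariance Gaussian.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def build_shortlists(gaussians, centroids, threshold):
    """Offline: assign each Gaussian to every cluster whose centroid
    is within `threshold` (squared distance) of the Gaussian mean."""
    shortlists = [[] for _ in centroids]
    for gi, (mean, var) in enumerate(gaussians):
        for ci, c in enumerate(centroids):
            d = sum((m - cc) ** 2 for m, cc in zip(mean, c))
            if d <= threshold:
                shortlists[ci].append(gi)
    return shortlists

def gs_log_likelihoods(x, gaussians, centroids, shortlists, floor=-50.0):
    # Runtime: find the cluster nearest to the observation ...
    ci = min(range(len(centroids)),
             key=lambda i: sum((xi - c) ** 2
                               for xi, c in zip(x, centroids[i])))
    # ... evaluate only the shortlisted Gaussians; floor the rest.
    out = [floor] * len(gaussians)
    for gi in shortlists[ci]:
        out[gi] = log_gauss(x, *gaussians[gi])
    return out
```

The saving comes from evaluating one cheap centroid distance per cluster instead of the full density for every Gaussian.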
Fast Likelihood Computation Methods For Continuous Mixture Densities In Large Vocabulary Speech Recognition
 In Proc. of the European Conf. on Speech Communication and Technology
, 1997
"... This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMMbased speech recognition systems. These likelihood calculations take about 70 \Gamma 85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. ..."
Abstract

Cited by 14 (10 self)
This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70–85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. To reduce the computational cost of the likelihood calculations, we investigate several space partitioning methods. A detailed comparison of these techniques is given on the North American Business Corpus (NAB'94) for a 20,000-word task. As a result, the so-called projection search algorithm in combination with the VQ method reduces the cost of likelihood computation by a factor of about 8 with no significant loss in the word recognition accuracy.
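A projection search of the kind named in this abstract can be sketched as follows: only codewords whose projection onto one axis falls near the query's projection are examined with a full distance computation. The `window` parameter and function names here are made-up placeholders for illustration, not the RWTH system's implementation:

```python
import bisect

def build_projection_index(codewords, dim=0):
    """Offline: sort codeword indices by their projection onto axis `dim`."""
    order = sorted(range(len(codewords)), key=lambda i: codewords[i][dim])
    keys = [codewords[i][dim] for i in order]
    return order, keys

def projection_search(x, codewords, order, keys, dim=0, window=1.0):
    """Approximate nearest-neighbour search: compute full distances only
    for codewords whose projection lies within `window` of x's projection."""
    lo = bisect.bisect_left(keys, x[dim] - window)
    hi = bisect.bisect_right(keys, x[dim] + window)
    best, best_d = None, float('inf')
    for i in order[lo:hi]:
        d = sum((a - b) ** 2 for a, b in zip(x, codewords[i]))
        if d < best_d:
            best, best_d = i, d
    return best
```

The binary search over the sorted projections prunes most candidates before any full distance is computed, which is where the speed-up comes from.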
On Supervised Learning From Sequential Data With Applications For Speech Recognition
, 1999
"... visualization of the problem to model human speech. A large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing for example a phoneme (here the name Keiko with given durations). In ..."
Abstract

Cited by 12 (1 self)
A visualization of the problem of modeling human speech: a large number of example sequences of observation vectors (shown connected as continuous trajectories) depending on a given sequence of class labels, with each class representing, for example, a phoneme (here the name Keiko with given durations). In this synthetic example, the one-dimensional target data would be represented poorly by a unimodal Gaussian distribution with constant variance (which corresponds to using the squared-error objective function), which would average the two separate branches; the fat lines indicate the mean and constant variance of the single Gaussian. Compare this figure with Figure 3.10, Figure 3.11 and Figure 3.12 to see a subsequent improvement of the model.
Use Of Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1996
"... This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMMbased systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recogni ..."
Abstract

Cited by 8 (1 self)
This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recognition accuracy tends to degrade above a ×3 reduction in likelihood computation. To explain this degradation, this paper investigates the trade-offs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a ×5 reduction in likelihood computation.
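The constraint described here — at most a fixed number of components per state in each cluster's shortlist — could look roughly like this. This is a hypothetical helper; the paper's actual assignment measure is more involved:

```python
def constrained_shortlist(distances, max_per_state):
    """Build one cluster's shortlist under a per-state cap.

    distances: {state: [(component_id, distance_to_centroid), ...]}
    Keeps only the `max_per_state` components of each state that lie
    closest to the cluster centroid, so no single state can dominate
    the shortlist.
    """
    shortlist = []
    for state, comps in distances.items():
        nearest = sorted(comps, key=lambda cd: cd[1])[:max_per_state]
        shortlist.extend(cid for cid, _ in nearest)
    return sorted(shortlist)
```

Capping components per state bounds the shortlist size while still guaranteeing that every state keeps at least some representation in the cluster, which addresses the unseen-state problem the abstract mentions.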
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
, 1998
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xiii 1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.1 The Problem: Too Many Parameters : : : : : : : : : : : : : : : : : : : : : : 3 1.2 Proposed Solution: It Is Time to ..."
Abstract

Cited by 2 (1 self)
Table of contents (excerpt):
1 Introduction
1.1 The Problem: Too Many Parameters
1.2 Proposed Solution: It Is Time to Share More!
1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
2.1 Speech Characteristics
2.2 Selection of Input Speech Space and Speech Model
2.2.1 Cepstral Input
2.2.2 Hidden Markov Model
2.2.3 Our Choice of HMM for Acoustic Modeling
2.3 Speech Unit to Model ...
A High-Speed, Low-Resource ASR Back-End Based on Custom Arithmetic
"... Abstract—With the skyrocketing popularity of mobile devices, new processing methods tailored to a specific application have become necessary for lowresource systems. This work presents a highspeed, lowresource speech recognition system using custom arithmetic units, where all system variables are ..."
Abstract
Abstract—With the skyrocketing popularity of mobile devices, new processing methods tailored to a specific application have become necessary for low-resource systems. This work presents a high-speed, low-resource speech recognition system using custom arithmetic units, where all system variables are represented by integer indices and all arithmetic operations are replaced by hardware-based table lookups. To this end, several reordering and rescaling techniques, including two accumulation structures for Gaussian evaluation and a novel method for the normalization of Viterbi search scores, are proposed to ensure low entropy for all variables. Furthermore, a discriminatively inspired distortion measure is investigated for scalar quantization of forward probabilities to maximize the recognition rate. Finally, heuristic algorithms are explored to optimize system-wide resource allocation. Our best bit-width allocation scheme only requires 59 kB of ROMs to hold the lookup tables, and its recognition performance with various vocabulary sizes in both clean and noisy conditions is nearly as good as that of a system using a 32-bit floating-point unit. Simulations on various architectures show that, on most modern processor designs, we can expect a cycle-count speedup of at least three times over systems with floating-point units. Additionally, the memory bandwidth is reduced by over 70% and the offline storage for model parameters is reduced by 80%. Index Terms—Alpha recursion, bit-width allocation, custom arithmetic, discriminative distortion measure, forward probability normalization and scaling, high speed, low resource, normalization, quantization, speech recognition.
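The core idea — represent every variable as an integer index and replace arithmetic with precomputed table lookups — can be sketched in software as follows. This toy uses a made-up 6-bit uniform quantizer for log-probabilities and a single log-add table; the paper's per-variable bit-width allocation is far more refined:

```python
import math

# Hypothetical uniform quantizer: 65 levels covering log-probs in [-32, 0].
STEP = 0.5
LEVELS = [-32.0 + i * STEP for i in range(65)]   # index -> real value

def quantize(x):
    # Nearest quantization level, clipping to the codebook range.
    x = min(max(x, LEVELS[0]), LEVELS[-1])
    return round((x - LEVELS[0]) / STEP)

# Precomputed offline: log-add of two quantized log-probs, stored as an
# index so the runtime never touches floating point.
LOGADD = [[quantize(math.log(math.exp(LEVELS[i]) + math.exp(LEVELS[j])))
           for j in range(65)] for i in range(65)]

def logadd_idx(i, j):
    """Runtime 'arithmetic': a single table lookup on integer indices."""
    return LOGADD[i][j]
```

With 65 levels the table occupies 65×65 entries of one byte each; the abstract's 59 kB budget covers many such tables for the different variables in the recognizer.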