Results 1 - 10
of
30
Survey of the State of the Art in Human Language Technology
, 1995
"... Contents 1 Spoken Language Input 1 Ron Cole & Victor Zue, chapter editors 1.1 Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 Victor Zue & Ron Cole 1.2 Speech Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 Victor Zue, Ron Cole, & Wayne Ward 1.3 Sig ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
Contents 1 Spoken Language Input 1 Ron Cole & Victor Zue, chapter editors 1.1 Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 Victor Zue & Ron Cole 1.2 Speech Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 Victor Zue, Ron Cole, & Wayne Ward 1.3 Signal Representation : : : : : : : : : : : : : : : : : : : : : : : : : : 11 Melvyn J. Hunt 1.4 Robust Speech Recognition : : : : : : : : : : : : : : : : : : : : : : 17 Richard M. Stern 1.5 HMM Methods in Speech Recognition : : : : : : : : : : : : : : : 24 Renato De Mori & Fabio Brugnara 1.6 Language Representation : : : : : : : : : : : : : : : : : : : : : : : : 35 Salim Roukos 1.7 Speaker Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : :<F35.37
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
- IEEE Transactions on Speech and Audio Processing
, 1996
"... An algorithm is proposed that achieves a good trade-off between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture co ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
An algorithm is proposed that achieves a good trade-off between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall-Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods--the most time-consuming aspect of continuous-density HMM systems--are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated with little or no impact on speech recognition accuracy. Corresponding Author: Vassilios Digalakis Address: Electronic and Computer Engineering Department Technical University of Crete, Kounoupidiana Chania, 73100 GREECE Phone: +30-821...
State-Based Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1998
"... This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of a continuous density HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of a continuous density HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by selecting the subset of Gaussian component likelihoods that should be computed given a particular input vector. This paper examines new techniques for obtaining "good" Gaussian subsets or "shortlists". All the new schemes make use of state information, specifically which state each of the Gaussian components belongs to. In this way a maximum number of Gaussian components per state may be specified, hence reducing the size of the shortlist. The first technique introduced is a simple extension of the standard GS method, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared with the standard GS scheme on a large vocabulary speech recognition task. On this task, the use of state information reduced the percentage of Gaussians computed to 10-15%, compared with 20-30% for the standard GS scheme, with little degradation in performance. 1 M.J.F.Gales is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. 2 K.M. Knill is now at Nuance Communications, 1380 Willow Rd, Menlo Park, CA 94025, USA. List of Tables 1 Change in the average forced alignment likelihood of the ARPA 1994 H1 development data for SGS and SBGS systems, compared to the standard no GS system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2 Recognition performance of the standard no GS, SGS and SBGS systems on the ARPA 1994 H...
Acoustic Modeling Improvements In A Segment-Based Speech Recognizer
- PROC. IEEE ASRU WORKSHOP
, 1999
"... In this paper we report on some recent improvements on the acoustic modeling in a segment-based speech recognition system. Context-dependent segment models and improved pronunciation modeling are shown to reduce word error rates in a telephone -based, conversational system by over 18%, while the ..."
Abstract
-
Cited by 19 (8 self)
- Add to MetaCart
In this paper we report on some recent improvements on the acoustic modeling in a segment-based speech recognition system. Context-dependent segment models and improved pronunciation modeling are shown to reduce word error rates in a telephone -based, conversational system by over 18%, while the technique of Gaussian selection reduces overall computation by more than a factor of two.
Discriminative Training of Hidden Markov Models
, 1998
"... vi Abbreviations vii Notation viii 1 Introduction 1 2 Hidden Markov Models 4 2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 HMM Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 HMM Topology . . . . . . . . . ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
vi Abbreviations vii Notation viii 1 Introduction 1 2 Hidden Markov Models 4 2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 HMM Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 HMM Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.4 Finding the Best Transcription . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.5 Setting the Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3 Objective Functions 19 3.1 Properties of Maximum Likelihood Estimators . . . . . . . . . . . . . . . . . . . 19 3.2 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 Maximum Mutual Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.4 Frame Discrimination . . . . . . . . . . . . . . . . ....
Fast Likelihood Computation Methods For Continuous Mixture Densities In Large Vocabulary Speech Recognition
- In Proc. of the European Conf. on Speech Communication and Technology
, 1997
"... This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70 \Gamma 85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. ..."
Abstract
-
Cited by 11 (8 self)
- Add to MetaCart
This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70 \Gamma 85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. To reduce the computational cost of the likelihood calculations, we investigate several space partitioning methods. A detailed comparison of these techniques is given on the North American Business Corpus (NAB'94) for a 20 000word task. As a result, the so-called projection search algorithm in combination with the VQ method reduces the cost of likelihood computation by a factor of about 8 with no significant loss in the word recognition accuracy. 1.
Speech Recognition Using Augmented Conditional Random Fields
"... Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT
Subspace constrained gaussian mixture models for speech recognition
- IEEE Transactions on Speech and Audio Processing
, 2005
"... Abstract — A standard approach to automatic speech recognition uses Hidden Markov Models whose state dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here mod ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Abstract — A standard approach to automatic speech recognition uses Hidden Markov Models whose state dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, MLLT, or EMLLT) as well as the LDA/HLDA models used for feature selection which tie the part of the Gaussians in the directions not used for discrimination. In this paper we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small vocabulary, resource constrained, grammar based tasks as well as large vocabulary, unconstrained resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate significant improvements can be obtained in both word error rate and computational complexity. I.
Dimensional reduction, covariance modeling and computational complexity in asr systems
- in ASR systems,” in Proc. ICASSP
, 2003
"... In this paper, we study acoustic modeling for speech recognition using mixtures of exponential models with linear and quadratic features tied across all context dependent states. These models are one version of the SPAM models introduced in [1]. They generalize diagonal covariance, MLLT, EMLLT, and ..."
Abstract
-
Cited by 7 (7 self)
- Add to MetaCart
In this paper, we study acoustic modeling for speech recognition using mixtures of exponential models with linear and quadratic features tied across all context dependent states. These models are one version of the SPAM models introduced in [1]. They generalize diagonal covariance, MLLT, EMLLT, and full covariance models. Reduction of the dimension of the acoustic vectors using LDA/HDA projections corresponds to a special case of reducing the exponential model feature space. We see, in one speech recognition task, that SPAM models on an LDA projected space of varying dimensions achieve a significant fraction of the WER improvement in going from MLLT to full covariance modeling, while maintaining the low computational cost of the MLLT models. Further, the feature precomputation cost can be minimized using the hybrid feature technique of [2]; and the number of Gaussians one needs to compute can be greatly reducing using hierarchical clustering of the Gaussians (with fixed feature space). Finally, we show that reducing the quadratic and linear feature spaces separately produces models with better accuracy, but comparable computational complexity, to LDA/HDA based models. 1.
Fast Implementation Methods For Viterbi-Based Word-Spotting
- In Proc. ICASSP
, 1996
"... This paper explores methods of increasing the speed of a Viterbi-based word-spotting system for audio document retrieval. Fast processing is essential since the user expects to receive the results of a keyword search many times faster than the actual length of the speech. A number of computational s ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
This paper explores methods of increasing the speed of a Viterbi-based word-spotting system for audio document retrieval. Fast processing is essential since the user expects to receive the results of a keyword search many times faster than the actual length of the speech. A number of computational short-cuts to the standard Viterbi word-spotter are presented. These are based on exploiting the background Viterbi phone recognition path that is computed to provide a normalisation base. An initial approximation using the phone transition boundaries reduces the retrieval time by a factor of 5, while achieving a slight improvement in word-spotting performance. To further reduce retrieval time, pattern matching, feature selection, and Gaussian selection techniques are applied to this approximate pass to give a total \Theta50 increase in speed with little loss in performance. In addition, a low memory requirement means that these approaches can be implemented on any platform, including hand-he...

