Results 1–10 of 38
Survey of the state of the art in human language technology
 Studies in Natural Language Processing, XII–XIII
, 1997
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
 IEEE Transactions on Speech and Audio Processing
, 1996
Abstract

Cited by 41 (7 self)
An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods (the most time-consuming aspect of continuous-density HMM systems) are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated, with little or no impact on speech recognition accuracy. Corresponding Author: Vassilios Digalakis. Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania, 73100, Greece. Phone: +30821...
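The tying scheme this abstract describes groups HMM states bottom-up so that each group shares one set of mixture components (a "genone"). A minimal sketch of the agglomerative step, assuming a plain Euclidean distance between state centroids; the paper's actual distance measure is likelihood-based, and `state_means` is a hypothetical stand-in for real state statistics:

```python
import numpy as np

def agglomerative_state_clustering(state_means, n_clusters):
    """Greedily merge the closest pair of state clusters until only
    n_clusters remain.  Each resulting cluster of states would share
    one set of mixture components.  Distance here is the Euclidean
    distance between cluster centroids -- a simplification of the
    likelihood-based criteria used in practice."""
    clusters = [[i] for i in range(len(state_means))]
    centroids = [np.asarray(m, dtype=float) for m in state_means]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        size_a, size_b = len(clusters[a]), len(clusters[b])
        # Merge b into a, keeping the size-weighted centroid.
        centroids[a] = (size_a * centroids[a] + size_b * centroids[b]) / (size_a + size_b)
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]
    return clusters
```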
Acoustic Modeling Improvements In A Segment-Based Speech Recognizer
 PROC. IEEE ASRU WORKSHOP
, 1999
Abstract

Cited by 24 (10 self)
In this paper we report on some recent improvements in the acoustic modeling of a segment-based speech recognition system. Context-dependent segment models and improved pronunciation modeling are shown to reduce word error rates in a telephone-based, conversational system by over 18%, while the technique of Gaussian selection reduces overall computation by more than a factor of two.
State-Based Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1997
Abstract

Cited by 23 (2 self)
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30–70% of the computational time of an HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoustic space into a set of clusters and associating a "shortlist" of Gaussians with each of these clusters. Any Gaussian not in the shortlist is simply approximated. This paper examines new techniques for obtaining "good" shortlists. All the new schemes make use of state information, specifically which state each of the components belongs to. In this way a maximum number of components per state may be specified, hence reducing the size of the shortlist. The first technique introduced is a simple extension of the standard GS one, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared...
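The shortlist idea described in this abstract can be sketched roughly as follows: partition the acoustic space by a vector-quantisation codebook, attach to each codeword a shortlist of nearby Gaussians, and at decode time score only the shortlist, backing everything else off to a flat floor. The unit-variance Gaussians, the nearest-mean shortlist criterion, and the `log_floor` value are all simplifying assumptions, not the paper's actual settings:

```python
import numpy as np

def build_shortlists(codebook, gaussian_means, max_per_cluster):
    """For each VQ codeword, keep the max_per_cluster Gaussians whose
    means are closest to it.  Only these are evaluated exactly at
    decode time; the rest get a flat back-off score."""
    shortlists = []
    for codeword in codebook:
        d = np.linalg.norm(gaussian_means - codeword, axis=1)
        shortlists.append(np.argsort(d)[:max_per_cluster].tolist())
    return shortlists

def selected_log_likelihood(x, codebook, shortlists, means, log_floor=-50.0):
    """Quantise x to its nearest codeword, then score only that
    codeword's shortlist (unit-variance Gaussians, up to a constant);
    all other Gaussians are approximated by log_floor."""
    c = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    scores = np.full(len(means), log_floor)
    for g in shortlists[c]:
        diff = x - means[g]
        scores[g] = -0.5 * float(diff @ diff)
    return scores
```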
Speech Recognition Using Augmented Conditional Random Fields
Abstract

Cited by 22 (0 self)
Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT
Discriminative Training of Hidden Markov Models
, 1998
Abstract

Cited by 20 (0 self)
Contents (excerpt): Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models (2.1 Definition, 2.2 HMM Modelling Assumptions, 2.3 HMM Topology, 2.4 Finding the Best Transcription, 2.5 Setting the Parameters, 2.6 Summary); 3 Objective Functions (3.1 Properties of Maximum Likelihood Estimators, 3.2 Maximum Likelihood, 3.3 Maximum Mutual Information, 3.4 Frame Discrimination) ...
Fast Likelihood Computation Methods For Continuous Mixture Densities In Large Vocabulary Speech Recognition
 In Proc. of the European Conf. on Speech Communication and Technology
, 1997
Abstract

Cited by 14 (10 self)
This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70–85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. To reduce the computational cost of the likelihood calculations, we investigate several space partitioning methods. A detailed comparison of these techniques is given on the North American Business Corpus (NAB'94) for a 20,000-word task. As a result, the so-called projection search algorithm in combination with the VQ method reduces the cost of likelihood computation by a factor of about 8 with no significant loss in the word recognition accuracy.
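A rough illustration of the projection-search idea mentioned in this abstract: a cheap one-dimensional comparison along a single coordinate prunes candidate densities before the full (expensive) distance is computed. The coordinate choice, threshold, and function names here are illustrative assumptions, not the RWTH system's actual configuration:

```python
import numpy as np

def projection_search(x, means, dim, threshold):
    """First pass: compare only coordinate `dim` of x against each
    mean, which is O(1) per density.  The full Euclidean distance is
    then computed only for candidates that survive the cheap test."""
    survivors = [i for i, m in enumerate(means)
                 if abs(m[dim] - x[dim]) <= threshold]
    # Second pass: exact distances for the surviving candidates only.
    return {i: float(np.linalg.norm(np.asarray(means[i]) - x))
            for i in survivors}
```

The one-dimensional test can never discard a density whose full distance is within the threshold, since the projected distance lower-bounds the Euclidean distance; it only removes guaranteed-far candidates.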
Fabbrizio, “Intelligent virtual agents for contact center automation
 in IEEE Signal Processing Magazine, Volume 22, Number 5
, 2005
Abstract

Cited by 11 (1 self)
[A human-machine communication system for next-generation contact centers]
Subspace Constrained Gaussian Mixture Models for Speech Recognition
 IEEE Transactions on Speech and Audio Processing
, 2005
Abstract

Cited by 10 (3 self)
A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, MLLT, or EMLLT) as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small vocabulary, resource-constrained, grammar-based tasks and large vocabulary tasks with unconstrained resources, to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
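The exponential-model view in this abstract can be made concrete: the features are the linear and quadratic monomials of the acoustic vector, and each Gaussian's weight vector is restricted to an affine subspace shared across all Gaussians, so only a low-dimensional coordinate vector is stored per Gaussian. This sketch shows only the parameterisation, not the maximum-likelihood training the paper develops; all names are hypothetical:

```python
import numpy as np

def exponential_features(x):
    """Linear and quadratic monomial features of an acoustic vector x:
    the sufficient statistics of a Gaussian in exponential-model form
    (x itself plus the upper triangle of x x^T)."""
    x = np.asarray(x, dtype=float)
    quad = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate([x, quad])

def constrained_weights(v, basis, offset):
    """A Gaussian's weight vector constrained to an affine subspace:
    w = offset + basis @ v.  `basis` and `offset` are shared by all
    Gaussians; only the low-dimensional `v` is Gaussian-specific."""
    return offset + basis @ v
```

Diagonal-covariance, MLLT, and EMLLT systems then correspond to particular choices of the shared basis, which is what lets one framework cover all of them.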
Use Of Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1996
Abstract

Cited by 8 (1 self)
This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recognition accuracy tends to degrade above a 3× reduction in likelihood computation. To explain this degradation, this paper investigates the tradeoffs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a 5× reduction in likelihood computation.