Results 1–9 of 9
Modeling Inverse Covariance Matrices by Basis Expansion
2003
Abstract

Cited by 34 (9 self)
This paper proposes a new covariance modeling technique for Gaussian Mixture Models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., P_j = Σ_{k=1}^{D} λ_{jk} a_k a_kᵀ, with λ_{jk} ∈ ℝ and a_k ∈ ℝᵈ. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set {a_k}_{k=1}^{D} and the expansion coefficients {λ_{jk}}. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from D = d to D = d(d + 1)/2 one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model, and 30% over a standard MLLT model.
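The expansion above can be illustrated in a few lines of numpy. This is a minimal sketch under illustrative assumptions (random basis vectors and positive coefficients chosen so the sum is positive definite; the EMLLT training procedure itself enforces validity differently), not the paper's estimation algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 4, 6  # feature dimension d; D basis elements, with d <= D <= d*(d+1)//2

# Shared rank-1 basis {a_k}: each a_k is a d-dimensional vector.
A = rng.standard_normal((D, d))

# Per-Gaussian expansion coefficients lambda_jk; kept positive here so the
# resulting precision matrix is positive definite (illustrative choice only).
lam = rng.uniform(0.5, 1.5, size=D)

# Precision matrix P_j = sum_k lambda_jk * a_k a_k^T
P = sum(l * np.outer(a, a) for l, a in zip(lam, A))

# Sanity check: symmetric and positive definite, hence a valid precision matrix.
assert np.allclose(P, P.T)
assert np.all(np.linalg.eigvalsh(P) > 0)
```

Each Gaussian stores only D scalars plus the shared basis; at D = d(d+1)/2 the basis can span all symmetric matrices, recovering the full-covariance case.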
Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition
Abstract

Cited by 5 (0 self)
In this paper we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel “error weighted” training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common “tied” subspace, as well as for Subspace Precision and Mean (SPAM) models, which impose separate subspace constraints on the precision matrices (i.e., inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize, and yield significant error rate improvements over, previously considered model classes such as diagonal models, models with semi-tied covariances, and EMLLT (extended maximum likelihood linear transformation) models. We show here that MMI and error weighted training each individually result in over 20% relative reduction in word error rate on a digit task over maximum likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.
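The SPAM-style precision constraint described above can be sketched as follows. This is an illustrative construction only (random symmetric positive-definite basis matrices and positive coefficients are assumptions, not the paper's trained parameters): each Gaussian stores a handful of coefficients over a shared basis rather than a full d(d+1)/2-entry precision matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K = 4, 3  # feature dimension; size of the tied precision subspace

# Globally shared symmetric basis matrices S_k spanning the precision subspace.
# Built here as random positive-definite matrices purely for illustration.
S = []
for _ in range(K):
    B = rng.standard_normal((d, d))
    S.append(B @ B.T + d * np.eye(d))

# Each Gaussian j stores only K coefficients (here K=3) instead of
# d*(d+1)/2 = 10 independent precision-matrix entries.
coeffs = rng.uniform(0.1, 1.0, size=K)
P_j = sum(c * S_k for c, S_k in zip(coeffs, S))

# A positive combination of positive-definite matrices is positive definite.
assert np.all(np.linalg.eigvalsh(P_j) > 0)
```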
Adaptation of front end parameters in a speech recognizer
Abstract

Cited by 5 (0 self)
In this paper we consider the problem of adapting the parameters of the algorithm used for feature extraction. Typical speech recognition systems use a sequence of modules to extract features which are then used for recognition. We present a method to adapt the parameters in these modules under a variety of criteria, e.g., maximum likelihood and maximum mutual information. This method works under the assumption that the functions the modules implement are differentiable with respect to their inputs and parameters. We use this framework to optimize a linear transform preceding the linear discriminant analysis (LDA) matrix and show that it gives significantly better performance than a linear transform after the LDA matrix when small amounts of data are available. We show that linear transforms can be estimated by directly optimizing the likelihood or the MMI objective without using auxiliary functions. We also apply the method to optimize the Mel bins, and the compression power in a system that uses power-law compression.
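The key idea, backpropagating a training criterion through differentiable front-end modules, can be sketched with a toy version: a pre-LDA linear transform A optimized by gradient ascent on the log-likelihood of a single unit-covariance Gaussian. Everything here (dimensions, the random "LDA" matrix, the single-Gaussian objective) is an illustrative assumption, not the paper's actual system:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 6, 3

L = rng.standard_normal((d_out, d_in))   # fixed downstream LDA projection (illustrative)
A = np.eye(d_in)                         # adaptable pre-LDA linear transform
mu = rng.standard_normal(d_out)          # target Gaussian mean, unit covariance assumed
X = rng.standard_normal((200, d_in))     # adaptation frames

lr = 0.01
for _ in range(100):
    Y = (L @ A @ X.T).T                  # features after both modules
    R = Y - mu                           # residuals under N(mu, I)
    # Chain rule through the module sequence:
    # d(-0.5 * ||L A x - mu||^2) / dA = -L^T (L A x - mu) x^T
    grad = -(L.T @ R.T @ X) / len(X)
    A += lr * grad                       # gradient ascent on average log-likelihood
```

With a differentiable objective, the same chain-rule pattern extends to any module in the pipeline (Mel bins, compression power), which is the point the abstract makes.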
Structured Precision Matrix Modelling for Speech Recognition
2006
Abstract

Cited by 2 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices is approximately 53,000 words.

Summary: The most extensively and successfully applied acoustic model for speech recognition is the Hidden Markov Model (HMM). In particular, a multivariate Gaussian Mixture Model (GMM) is typically used to represent the output density function of each HMM state. For reasons of efficiency, the covariance matrix associated with each Gaussian component is assumed diagonal and the probability of successive observations is assumed independent given the HMM state sequence. Consequently, the spectral (intra-frame) and temporal (inter-frame) correlations are poorly modelled. This thesis investigates ways of improving these aspects by extending the standard HMM. Parameters for these extended models are estimated discriminatively using the ...
Comparison of Subspace Methods for Gaussian Mixture Models in Speech Recognition
Abstract

Cited by 2 (0 self)
Speech recognizers typically use high-dimensional feature vectors to capture the essential cues for speech recognition purposes. The acoustics are then commonly modeled with a Hidden Markov Model with Gaussian Mixture Models as observation probability density functions. Using unrestricted Gaussian parameters might lead to intolerable model costs, both in evaluation and in storage, which limits their practical use to some high-end systems. The classical approach to tackling these problems is to assume independent features and constrain the covariance matrices to be diagonal. This can be thought of as constraining the second-order parameters to lie in a fixed subspace consisting of rank-1 terms. In this paper we discuss the differences between recently proposed subspace methods for GMMs, with emphasis placed on the applicability of the models to a practical LVCSR system. Index Terms: speech recognition, acoustic modeling, Gaussian mixture, multivariate normal distribution, subspace method
Diagonal Priors for Full Covariance Speech Recognition
Abstract
We investigate the use of full covariance Gaussians for large-vocabulary speech recognition. The large number of parameters gives high modelling power, but when training data is limited, the standard sample covariance matrix is often poorly conditioned and has high variance. We explain how these problems may be solved by the use of a diagonal covariance smoothing prior, and relate this to the shrinkage estimator, for which the optimal shrinkage parameter may itself be estimated from the training data. We also compare the use of generatively and discriminatively trained priors. Results are presented on a large vocabulary conversational telephone speech recognition task.
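The diagonal smoothing prior amounts to a convex combination of the sample covariance with its own diagonal. A minimal sketch, with a fixed shrinkage weight chosen by hand (the paper discusses estimating the optimal weight from data, e.g. in the style of Ledoit-Wolf shrinkage):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 10, 30                      # dimension close to sample count: S is noisy

X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
S = np.cov(X, rowvar=False)        # sample covariance: high variance when n ~ d

alpha = 0.3                        # shrinkage weight; illustrative fixed value
S_shrunk = (1 - alpha) * S + alpha * np.diag(np.diag(S))

# The diagonal is unchanged; only the off-diagonal entries are shrunk,
# which pulls the extreme eigenvalues together and improves conditioning.
assert np.linalg.cond(S_shrunk) < np.linalg.cond(S)
```

At alpha = 1 this recovers the diagonal model; at alpha = 0, the raw full-covariance sample estimate.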
Building Compact Recognizers of Handwritten Chinese Characters Using Precision Constrained Gaussian Model, Minimum Classification Error Training and Parameter Compression (IJDAR)
Abstract
In our previous work, a so-called precision constrained Gaussian model (PCGM) was proposed for character modeling to design compact recognizers of handwritten Chinese characters. A maximum likelihood training procedure was developed to estimate model parameters from training data. In this paper, we extend the above work by using minimum classification error (MCE) training to improve recognition accuracy, and by using both split vector quantization and scalar quantization techniques to further compress the model parameters. Experimental results on a handwritten character recognition task with a vocabulary of 2,965 Kanji characters demonstrate that MCE-trained and compressed PCGM-based classifiers can achieve much higher recognition accuracies than their counterparts based on the traditional modified quadratic discriminant function (MQDF) when the footprint of the classifiers has to be made very small, e.g., less than 2 MB.
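Of the two compression techniques mentioned, scalar quantization is the simpler to sketch: each float32 parameter is mapped to an 8-bit code against a single linear codebook, a 4x size reduction with bounded error. The parameter vector below is random and purely illustrative, and this is a generic uniform quantizer, not necessarily the exact scheme used in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
params = rng.standard_normal(1000).astype(np.float32)  # stand-in for model parameters

# 8-bit uniform scalar quantization over the parameter range.
lo, hi = params.min(), params.max()
scale = (hi - lo) / 255.0
codes = np.round((params - lo) / scale).astype(np.uint8)  # 1 byte vs 4 bytes each
restored = codes.astype(np.float32) * scale + lo

# Quantization error is bounded by one quantization step.
assert np.max(np.abs(restored - params)) <= scale
```

Split vector quantization goes further by grouping parameters into sub-vectors and replacing each with an index into a trained codebook.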
unknown title
Abstract
We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.
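The mean construction in this subspace approach can be sketched as follows: each state is summarized by one low-dimensional vector, and globally shared projection matrices expand it into per-component means. Dimensions and notation here (mu_ji = M_i v_j) follow the SGMM literature but the specific numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d, s_dim, ncomp = 39, 5, 4  # feature dim, state-vector dim, shared components

# Globally shared parameters: one projection matrix M_i per shared component.
M = rng.standard_normal((ncomp, d, s_dim))

# Each phonetic state j stores a single low-dimensional vector v_j;
# its component means are derived as mu_ji = M_i v_j.
v_j = rng.standard_normal(s_dim)
means_j = M @ v_j            # shape (ncomp, d): all component means for state j

assert means_j.shape == (ncomp, d)
```

A state thus costs s_dim numbers for its means instead of ncomp * d, which is what makes the representation compact and data-efficient.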