Modeling Inverse Covariance Matrices by Basis Expansion, 2003
Cited by 31 (9 self)
This paper proposes a new covariance modeling technique for Gaussian Mixture Models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, with $\lambda_k^j \in \mathbb{R}$ and $a_k \in \mathbb{R}^d$. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set $\{a_k\}_{k=1}^{D}$ and the expansion coefficients $\{\lambda_k^j\}$. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from $D = d$ to $D = d(d+1)/2$ one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model, and 30% over a standard MLLT model.
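As a rough illustration of the rank-1 expansion described in the abstract above, the following NumPy sketch builds a precision matrix from a set of basis directions and per-Gaussian coefficients. The function name `emllt_precision`, the row-wise layout of the basis matrix `A`, and the random test values are assumptions made for this example; they are not taken from the paper, and no parameter estimation (the generalized EM step) is attempted here.

```python
import numpy as np

def emllt_precision(A, lam):
    """Build P = sum_k lam[k] * a_k a_k^T, where a_k is the k-th row of A.

    A   : (D, d) array of basis directions (row layout is an assumption)
    lam : (D,) array of expansion coefficients for one Gaussian
    """
    # Scaling row k of A by lam[k] and multiplying by A^T sums the
    # weighted outer products in one matrix product.
    return A.T @ (lam[:, None] * A)

d = 3
rng = np.random.default_rng(0)

# D = d: the basis is a square matrix, so P = A^T diag(lam) A.
A_small = rng.standard_normal((d, d))
lam_small = np.array([1.0, 2.0, 0.5])
P_small = emllt_precision(A_small, lam_small)

# D = d(d+1)/2: as many rank-1 terms as a symmetric d x d matrix
# has free entries (the full-covariance end of the range).
D_full = d * (d + 1) // 2
A_full = rng.standard_normal((D_full, d))
lam_full = rng.uniform(0.5, 1.5, size=D_full)
P_full = emllt_precision(A_full, lam_full)
```

With D = d the result has the form A^T diag(lam) A, which is the structure the abstract associates with the MLLT end of the range; increasing D toward d(d+1)/2 adds rank-1 terms until a general symmetric matrix can be represented.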
Large Vocabulary Conversational Speech Recognition With The Extended Maximum Likelihood Linear Transformation (EMLLT) Model - in Proc. Eurospeech, 2002
Cited by 10 (3 self)
This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, precisions (inverse covariances) and the feature-space under the EMLLT model. This paper shows the first experimental evidence that significant word-error-rate improvements can be achieved with the EMLLT model (in both VTL and VTL+SAT training contexts) over a state-of-the-art diagonal covariance model in a difficult large-vocabulary conversational speech recognition task. The improvements were of the order of 1% absolute in multiple scenarios.
Maximum likelihood training of subspaces for inverse covariance modeling - in Proc. ICASSP, 2003
Cited by 9 (6 self)
Speech recognition systems typically use mixtures of diagonal Gaussians to model the acoustics. Using Gaussians with a more general covariance structure can give improved performance; EMLLT [1] and SPAM [2] models give improvements by restricting the inverse covariance to a linear/affine subspace spanned by rank-one and full-rank matrices, respectively. In this paper we consider training these subspaces to maximize likelihood. For EMLLT, ML training of the subspace results in significant gains over the scheme proposed in [1]. For SPAM, ML training of the subspace slightly improves performance over the method reported in [2]. For the same subspace size, an EMLLT model is more efficient computationally than a SPAM model, while the SPAM model is more accurate. This paper proposes a hybrid method of structuring the inverse covariances that both has good accuracy and is computationally efficient.
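One plausible reading of the efficiency comparison in the abstract above is sketched below: when every Gaussian's precision lies in a shared subspace, the quadratic term x^T P x reduces to a dot product between per-Gaussian coefficients and a small set of features computed once per frame. With a rank-one basis those features are squared projections; with a full-rank symmetric basis each feature is a full quadratic form. The dimensions, random values, and variable names are assumptions for illustration only and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, n_gauss = 4, 6, 5                   # feature dim, subspace size, Gaussians
x = rng.standard_normal(d)                # one acoustic frame (made-up values)
lam = rng.standard_normal((n_gauss, D))   # per-Gaussian subspace coefficients

# Rank-one basis (EMLLT-style): P_j = sum_k lam[j, k] * a_k a_k^T.
# Shared features are the squared projections (a_k^T x)^2.
A = rng.standard_normal((D, d))
emllt_feats = (A @ x) ** 2                # about D*d multiplies per frame
emllt_quad = lam @ emllt_feats            # D multiplies per Gaussian

# Full-rank symmetric basis (SPAM-style): P_j = sum_k lam[j, k] * S_k.
# Shared features are the quadratic forms x^T S_k x.
S = rng.standard_normal((D, d, d))
S = 0.5 * (S + S.transpose(0, 2, 1))      # make each basis matrix symmetric
spam_feats = np.einsum("i,kij,j->k", x, S, x)   # about D*d^2 multiplies per frame
spam_quad = lam @ spam_feats              # D multiplies per Gaussian
```

In both cases the per-Gaussian cost is the same dot product of length D; the difference in this sketch is only in the shared per-frame feature computation.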
Subspace constrained Gaussian mixture models for speech recognition - IEEE Transactions on Speech and Audio Processing, 2005
Cited by 9 (3 self)
A standard approach to automatic speech recognition uses Hidden Markov Models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, MLLT, or EMLLT) as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small-vocabulary, resource-constrained, grammar-based tasks and large-vocabulary, unconstrained-resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
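The statement that a Gaussian can be viewed as an exponential model with linear and quadratic features can be checked numerically. The sketch below writes the same log-density twice: once in the usual mean/precision form, and once as weights applied to the features x_i and x_i x_j plus a log-normalizer. The function names and the example values are assumptions for this illustration; the subspace constraint on the weights that the paper studies is not modeled here.

```python
import numpy as np

def gaussian_logpdf(x, mu, P):
    """Log-density of a Gaussian with mean mu and precision matrix P."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (logdet - d * np.log(2 * np.pi)) - 0.5 * diff @ P @ diff

def exponential_form_logpdf(x, psi, P):
    """Same density as weights times linear and quadratic features of x.

    psi = P @ mu weights the linear features x_i, and P weights the
    quadratic features x_i * x_j; the remaining terms do not depend on x
    and form the log-normalizer.
    """
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(P)
    mu = np.linalg.solve(P, psi)
    log_norm = 0.5 * (logdet - d * np.log(2 * np.pi)) - 0.5 * psi @ mu
    return psi @ x - 0.5 * np.sum(P * np.outer(x, x)) + log_norm

# Made-up example values.
P = np.array([[2.0, 0.5], [0.5, 1.0]])
mu = np.array([1.0, -1.0])
x = np.array([0.3, 0.7])

direct = gaussian_logpdf(x, mu, P)
expo = exponential_form_logpdf(x, P @ mu, P)
```

The two values agree up to floating-point error, since expanding -0.5 (x - mu)^T P (x - mu) gives the linear term (P mu)^T x, the quadratic term -0.5 x^T P x, and a constant.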
Discriminative estimation of subspace precision and mean (SPAM) models - in Proc. Eurospeech, 2003
Cited by 6 (2 self)
The SPAM model was recently proposed as a very general method for modeling Gaussians with constrained means and covariances. It has been shown to yield significant error rate improvements over other methods of constraining covariances such as diagonal, semi-tied covariances, and extended maximum likelihood linear transformations. In this paper we address the problem of discriminative estimation of SPAM model parameters, in an attempt to further improve its performance. We present discriminative estimation under two criteria: maximum mutual information (MMI) and an "error-weighted" training. We show that each of these methods individually results in over 20% relative reduction in word error rate on a digit task over maximum likelihood (ML) estimated SPAM model parameters. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques. The techniques developed in this paper also apply directly to an extension of SPAM called subspace constrained exponential models.
Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition
Cited by 1 (0 self)
In this paper we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common "tied" subspace, as well as for Subspace Precision and Mean (SPAM) models, which impose separate subspace constraints on the precision matrices (i.e., inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize, and yield significant error rate improvements over, previously considered model classes such as diagonal models, models with semi-tied covariances, and EMLLT (extended maximum likelihood linear transformation) models. We show here that MMI and error weighted training each individually result in over 20% relative reduction in word error rate on a digit task over maximum likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.

