Covariance Modelling for Noise-Robust Speech Recognition
Abstract

Cited by 8 (5 self)
Model compensation is a standard way of improving speech recognisers’ robustness to noise. Most model compensation techniques produce diagonal covariances. However, this fails to handle any changes in the feature correlations due to the noise. This paper presents a scheme that allows full-covariance matrices to be estimated. One problem is that full covariance matrix estimation will be more sensitive to approximations; those for the dynamic parameters are known to be crude. In this paper a linear transformation of a window of consecutive frames is used as the basis for dynamic parameter compensation. A second problem is that the resulting full covariance matrices slow down decoding. This is addressed by using predictive linear transforms that decorrelate the feature space, so that the decoder can then use diagonal covariance matrices. On a noise-corrupted Resource Management task, the proposed scheme outperformed the standard VTS compensation scheme.
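The compensation this abstract builds on starts from a mismatch function relating clean speech, noise, and corrupted speech. As a rough sketch, assuming log-mel static features, no phase term, and Gaussian speech and noise (all function and variable names below are invented for illustration, and dynamic parameters are ignored), first-order VTS compensation with a full output covariance looks like:

```python
import numpy as np

def mismatch(x, n):
    # Log-mel-domain mismatch function: y = x + log(1 + exp(n - x)).
    return x + np.log1p(np.exp(n - x))

def vts_compensate(mu_x, sigma_x, mu_n, sigma_n):
    """First-order VTS compensation of a clean-speech Gaussian.

    Returns the corrupted-speech mean and a *full* covariance,
    Sigma_y = G Sigma_x G^T + F Sigma_n F^T, where G = dy/dx and
    F = dy/dn are evaluated at the means (diagonal for this mismatch).
    """
    # For y = x + log(1 + exp(n - x)): dy/dn = sigmoid(n - x), dy/dx = 1 - dy/dn.
    f = 1.0 / (1.0 + np.exp(-(mu_n - mu_x)))   # dy/dn, per dimension
    g = 1.0 - f                                # dy/dx
    mu_y = mismatch(mu_x, mu_n)
    G, F = np.diag(g), np.diag(f)
    sigma_y = G @ sigma_x @ G.T + F @ sigma_n @ F.T
    return mu_y, sigma_y
```

With a full Sigma_x, Sigma_y also comes out full, which is what motivates the predictive decorrelating transforms described in the abstract.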
Model-Based Approaches to Handling Uncertainty
Abstract

Cited by 7 (3 self)
A powerful approach for handling uncertainty in observations is to modify the statistical model of the data to appropriately reflect this uncertainty. For the task of noise-robust speech recognition, this requires modifying an underlying “clean” acoustic model to be representative of speech in a particular target acoustic environment. This chapter describes the underlying concepts of model-based noise compensation for robust speech recognition and how it can be applied to standard systems. The chapter will then consider important practical issues. These include: i) acoustic environment noise parameter estimation; ii) efficient acoustic model compensation and likelihood calculation; and iii) adaptive training to handle multi-style training data. The chapter will conclude by discussing the limitations of the current approaches and research options to address them.
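An alternative to analytic linearisation, which appears later in this listing as data-driven parallel model combination (DPMC), is to compensate a clean model component by sampling. A minimal sketch under an assumed log-mel mismatch function (names hypothetical, not the chapter's own notation):

```python
import numpy as np

def dpmc_compensate(mu_x, sigma_x, mu_n, sigma_n,
                    num_samples=10000, seed=0):
    """DPMC-style compensation: draw clean-speech and noise samples,
    map them through the mismatch function, and fit a Gaussian to the
    resulting corrupted-speech samples."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu_x, sigma_x, num_samples)
    n = rng.multivariate_normal(mu_n, sigma_n, num_samples)
    y = x + np.log1p(np.exp(n - x))      # log-mel-domain mismatch
    return y.mean(axis=0), np.cov(y, rowvar=False)
```

This trades the bias of a Taylor expansion for sampling noise and a much higher compensation cost per Gaussian, which is one of the efficiency issues the chapter discusses.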
Transforming Features to Compensate Speech Recogniser Models for Noise
Abstract

Cited by 6 (4 self)
To make speech recognisers robust to noise, either the features or the models can be compensated. Feature enhancement is often fast; model compensation is often more accurate, because it predicts the corrupted speech distribution. It is therefore able, for example, to take uncertainty about the clean speech into account. This paper re-analyses the recently-proposed predictive linear transformations for noise compensation as minimising the KL divergence between the predicted corrupted speech and the adapted models. New schemes are then introduced which apply observation-dependent transformations in the front-end to adapt the back-end distributions. One applies transforms in exactly the same manner as the popular minimum mean square error (MMSE) feature enhancement scheme, and is as fast. The new method performs better on AURORA 2. Index Terms: speech recognition, noise robustness
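The MMSE feature enhancement scheme that the new method mirrors can be sketched as a posterior-weighted correction under a front-end GMM. This is an illustrative reconstruction assuming diagonal covariances and precomputed per-component corrections (all names invented here):

```python
import numpy as np

def mmse_enhance(y, weights, mus, variances, biases):
    """MMSE-style enhancement with a diagonal-covariance front-end GMM
    over the corrupted speech. Component k supplies a correction
    biases[k] (loosely E[x - y | k]); the output is y plus the
    posterior-weighted sum of corrections."""
    log_post = np.array([
        np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * v) + (y - m) ** 2 / v)
        for w, m, v in zip(weights, mus, variances)])
    post = np.exp(log_post - log_post.max())   # stable softmax
    post /= post.sum()
    return y + post @ np.array(biases)
```

The per-observation cost is one GMM posterior evaluation plus a weighted sum, which is why schemes applied "in the exact same manner" can match its speed.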
Asymptotically exact noise-corrupted speech likelihoods
In Proc. Interspeech, 2010
Abstract

Cited by 6 (5 self)
Model compensation techniques for noise-robust speech recognition approximate the corrupted speech distribution. This paper introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. Though it is too slow to compensate a speech recognition system, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence of individual components. This makes it possible to evaluate the impact of approximations that compensation schemes make, such as the form of the mismatch function. Index Terms: speech recognition, noise robustness
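In one dimension, and assuming the phase-free mismatch exp(y) = exp(x) + exp(n), an "exact in the limit" likelihood can be sketched as plain Monte Carlo over the noise. This is a simplification of the paper's method, with all names invented for illustration:

```python
import numpy as np

def gauss_pdf(v, mu, var):
    return np.exp(-0.5 * (v - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def corrupted_likelihood(y, mu_x, var_x, mu_n, var_n,
                         num_samples=200000, seed=0):
    """Monte Carlo estimate of the corrupted-speech likelihood p(y) in
    one log-spectral dimension. The noise is marginalised by sampling:
    for n < y the clean speech is x = log(exp(y) - exp(n)), and the
    change of variables contributes |dx/dy| = exp(y) / (exp(y) - exp(n))."""
    rng = np.random.default_rng(seed)
    n = rng.normal(mu_n, np.sqrt(var_n), num_samples)
    denom = np.exp(y) - np.exp(n)
    valid = denom > 0                    # n >= y is impossible under this mismatch
    denom_safe = np.where(valid, denom, 1.0)
    x = np.log(denom_safe)
    integrand = gauss_pdf(x, mu_x, var_x) * np.exp(y) / denom_safe
    return float(np.mean(np.where(valid, integrand, 0.0)))
```

As the abstract notes, an estimator like this is far too slow for decoding, but it provides a reference against which compensated Gaussians can be scored.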
Noise Compensation for Subspace Gaussian Mixture Models
Abstract

Cited by 3 (2 self)
Joint uncertainty decoding (JUD) is an effective model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. In this paper, we apply JUD to subspace Gaussian mixture model (SGMM) based acoustic models. The total number of Gaussians in the SGMM acoustic model is usually much larger than for conventional GMMs, which limits the application of approaches that explicitly compensate each Gaussian, such as vector Taylor series (VTS). However, by clustering the Gaussian components into a number of regression classes, JUD-based noise compensation can be successfully applied to SGMM systems. We evaluate the JUD/SGMM technique using the Aurora 4 corpus, and the experimental results indicate that it is more accurate than conventional GMM-based systems using either VTS or JUD noise compensation.
Nonlinear Compensation Using the Gauss-Newton Method for Noise-Robust Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing, 20(8):2191–2206, 2012
Abstract

Cited by 2 (0 self)
Abstract—In this paper, we present the Gauss-Newton method as a unified approach to estimating noise parameters of the prevalent nonlinear compensation models, such as vector Taylor series (VTS), data-driven parallel model combination (DPMC), and the unscented transform (UT), for noise-robust speech recognition. While iterative estimation of noise means in a generalized EM framework has been widely known, we demonstrate that such approaches are variants of the Gauss-Newton method. Furthermore, we propose a novel noise variance estimation algorithm that is consistent with the Gauss-Newton principle. The formulation of the Gauss-Newton method reduces the noise estimation problem to determining the Jacobians of the corrupted speech parameters. For sampling-based compensations, we present two methods, sample Jacobian average (SJA) and cross-covariance (XCOV), to evaluate these Jacobians. The proposed noise estimation algorithm is evaluated for various compensation models on two tasks. The first is to fit a GMM model to artificially corrupted samples, and the second is to perform speech recognition on the Aurora 2 database. The significant performance improvements confirm the efficacy of the Gauss-Newton method in estimating the noise parameters of the nonlinear compensation models. Index Terms—Gauss-Newton method, nonlinear compensation, robust speech recognition, vector Taylor series.
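The Gauss-Newton step itself is generic: given a residual vector r(θ) and its Jacobian J, the update is θ ← θ − (JᵀJ)⁻¹Jᵀr. A minimal sketch of the iteration, applied here to a toy curve-fitting problem rather than to actual noise parameters (names invented):

```python
import numpy as np

def gauss_newton(residual, jacobian, theta0, num_iters=20):
    """Generic Gauss-Newton iteration: theta <- theta - (J^T J)^{-1} J^T r.
    In model-based compensation, theta would be the noise parameters and
    the residual the mismatch between compensated and observed statistics."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_iters):
        r = residual(theta)          # residual vector at current estimate
        J = jacobian(theta)          # Jacobian of the residual w.r.t. theta
        theta = theta - np.linalg.solve(J.T @ J, J.T @ r)
    return theta
```

In the paper's setting, the work reduces to computing those Jacobians of the corrupted-speech parameters, which is exactly what SJA and XCOV supply for sampling-based compensation.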
Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models
Abstract

Cited by 1 (0 self)
Abstract—Joint uncertainty decoding (JUD) is a model-based noise compensation technique for conventional Gaussian Mixture Model (GMM) based speech recognition systems. Unlike vector Taylor series (VTS) compensation, which operates on the individual Gaussian components in an acoustic model, JUD clusters the Gaussian components into a smaller number of classes, sharing the compensation parameters for the set of Gaussians in a given class. This significantly reduces the computational cost. In this paper, we investigate noise compensation for subspace Gaussian mixture model (SGMM) based speech recognition systems using JUD. The total number of Gaussian components in an SGMM is typically very large. Therefore direct compensation of the individual Gaussian components, as performed by VTS, is computationally expensive. In this paper we show that JUD-based noise compensation can be successfully applied to SGMMs in a computationally efficient way. We evaluate the JUD/SGMM technique on the standard Aurora 4 corpus. Our experimental results indicate that the JUD/SGMM system results in lower word error rates compared with a conventional GMM system with either VTS-based or JUD-based noise compensation. Index Terms—subspace Gaussian mixture model, vector Taylor series, joint uncertainty decoding, noise robust ASR, Aurora 4
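A JUD-style per-Gaussian likelihood can be sketched as follows; this assumes diagonal covariances and precomputed regression-class parameters (an affine transform A, b, its log-determinant, and an additive uncertainty bias), with all names illustrative rather than the paper's notation:

```python
import numpy as np

def jud_log_likelihood(y, A, b, log_det_A, mu, var, bias_var):
    """JUD-style log-likelihood for one Gaussian: the observation is
    mapped through its regression class's affine transform (A, b), a
    class-level uncertainty bias is added to the model variance, and
    log|A| corrects for the change of variables."""
    z = A @ y + b
    v = var + bias_var
    return (log_det_A
            - 0.5 * np.sum(np.log(2 * np.pi * v) + (z - mu) ** 2 / v))
```

Because A, b, and the bias are shared across all Gaussians in a class, only a handful of transforms need estimating per noise condition, which is what makes the approach tractable for the very large SGMM codebooks.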
Incremental Adaptation with VTS and Joint Adaptively Trained Systems
In Proc. Interspeech, 2009
Abstract

Cited by 1 (1 self)
Recently, adaptive training schemes using model-based compensation approaches such as VTS and JUD have been proposed. Adaptive training allows the use of multi-environment training data whilst a neutral, “clean” acoustic model is trained. This paper describes and assesses the advantages of using incremental, rather than batch, mode adaptation with these adaptively trained systems. Incremental adaptation reduces the latency during recognition, and has the possibility of reducing the error rate for slowly varying noise. The work is evaluated on a large-scale multi-environment training configuration targeted at in-car speech recognition. Results on in-car collected test data indicate that incremental adaptation is an attractive option when using these adaptively trained systems. Index Terms: adaptive training, incremental adaptation, noise compensation
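One simple way incremental-mode estimation can work is to accumulate noise statistics utterance by utterance, so the noise model used for compensation improves as data arrives instead of waiting for a whole batch. A hedged sketch (this is not the paper's estimator; it assumes speech-free frames, e.g. leading silence, are available, and all names are invented):

```python
import numpy as np

class IncrementalNoiseEstimator:
    """Accumulates sufficient statistics for a diagonal Gaussian noise
    model across utterances; estimate() can be called after every
    utterance to refresh the compensation parameters."""
    def __init__(self, dim):
        self.count = 0
        self.sum = np.zeros(dim)
        self.sum_sq = np.zeros(dim)

    def update(self, silence_frames):
        # silence_frames: (num_frames, dim) array of speech-free frames
        self.count += len(silence_frames)
        self.sum += silence_frames.sum(axis=0)
        self.sum_sq += (silence_frames ** 2).sum(axis=0)

    def estimate(self):
        mu = self.sum / self.count
        return mu, self.sum_sq / self.count - mu ** 2
```

Re-estimating after each utterance is what gives the latency and slowly-varying-noise benefits the abstract describes, at the cost of compensating the models more often.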
Importance Sampling to Compute Likelihoods of Noise-Corrupted Speech
Abstract

Cited by 1 (0 self)
One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques modify a recogniser’s state-conditional distributions so they model the speech in the target environment. Because the interaction between speech and noise is nonlinear, even for Gaussian speech and noise the corrupted speech distribution has no closed form. Thus, model compensation methods approximate it with a parametric distribution, such as a Gaussian or a mixture of Gaussians. The impact of this approximation has never been quantified. This paper therefore introduces a nonparametric method to compute the likelihood of a corrupted speech observation. It uses sampling and, given speech and noise distributions and a mismatch function, is exact in the limit. It therefore gives a theoretical bound for model compensation. Though computing the likelihood is computationally expensive, the novel method enables a performance comparison based on the criterion that model compensation methods aim to minimise: the Kullback-Leibler (KL) divergence to the ideal compensation. It gives the point where the KL divergence is zero. This paper examines the performance of various compensation methods, such as vector Taylor series (VTS) and data-driven parallel model combination (DPMC). It shows that more accurate modelling than Gaussian-for-Gaussian compensation improves the performance of speech recognition.
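The criterion referred to here, the KL divergence to the ideal compensation, has a closed form when both distributions are Gaussian; a sketch for the diagonal-covariance case (names invented):

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """KL divergence KL(N0 || N1) between diagonal-covariance Gaussians:
    0.5 * sum(log(var1/var0) + (var0 + (mu0 - mu1)^2) / var1 - 1).
    In this setting N0 would be the sampled 'ideal' corrupted-speech
    distribution and N1 a compensated model Gaussian."""
    mu0, var0 = np.asarray(mu0), np.asarray(var0)
    mu1, var1 = np.asarray(mu1), np.asarray(var1)
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)
```

The divergence is zero only when the compensated Gaussian matches the reference exactly, which is what makes it a natural yardstick for comparing VTS, DPMC, and related schemes.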
noise-robust speech recognition
, 2010
Abstract
Model compensation techniques for noise-robust speech recognition approximate the corrupted speech distribution. This work introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to compensate a speech recognition system, it enables a more fine-grained assessment of compensation techniques, based on the KL divergences to the ideal compensation for individual components. The KL divergence appears to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that compensation schemes make. For example, this work examines the influence of the assumption that the corrupted speech distribution is Gaussian, and of diagonalising that Gaussian’s covariance. It also assesses the impact of a common approximation to the mismatch function for VTS compensation, namely setting the