Uncertainty decoding for noise robust speech recognition
- in Proc. Interspeech, 2004
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Adaptive training with joint uncertainty decoding for robust recognition of noisy data
- in Proceedings of ICASSP, Volume IV, 2007
Standard noise compensation techniques for automatic speech recognition assume a clean trained acoustic model. What is thought of as “clean” data may still have a variety of speakers, different channels and varying noise conditions. Hence it may be more reasonable to treat such data as multi-condition data for multistyle training. This paper shows that multistyle models benefit from VTS compensation or Joint uncertainty decoding, which reduce the mismatch between training and test conditions. An EM-based noise estimation procedure that produces ML VTS or Joint noise models is also described. Alternatively, adaptive training with Joint uncertainty transforms factors the noise out of the data. The uncertainty variance bias de-weights observations in the training data where the SNR is low. This property allows data with a wide SNR range to be used and produces canonical models that truly represent clean speech, whereas multistyle trained models must account for all acoustic variation associated with different noise conditions. This paper presents Joint adaptive training, including formulae for estimating the transforms and canonical model parameters. Experiments are conducted on the ...
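VTS compensation, as mentioned above, rests on a mismatch function that maps clean speech and noise to corrupted speech. The following is a minimal sketch, assuming a scalar log-spectral feature, additive noise only (no channel term, no cepstral DCT) and a single Gaussian; first-order VTS linearises the mismatch function around the clean and noise means. The function names `mismatch` and `vts_compensate` are hypothetical, and this is not the paper's implementation.

```python
import math

# Assumption: scalar log-spectral feature, additive noise only (no channel
# term, no DCT). The mismatch function is then y = log(exp(x) + exp(n)).

def mismatch(x: float, n: float) -> float:
    """Corrupted log-spectral value for clean speech x and noise n."""
    return x + math.log1p(math.exp(n - x))

def vts_compensate(mu_x: float, var_x: float, mu_n: float, var_n: float):
    """First-order VTS: linearise the mismatch around (mu_x, mu_n).

    Returns the compensated mean, the compensated variance and dy/dx.
    """
    mu_y = mismatch(mu_x, mu_n)
    j = 1.0 / (1.0 + math.exp(mu_n - mu_x))  # dy/dx; dy/dn = 1 - j
    var_y = j * j * var_x + (1.0 - j) ** 2 * var_n
    return mu_y, var_y, j

# High SNR: the compensated Gaussian stays close to the clean one.
print(vts_compensate(10.0, 1.0, 0.0, 0.5))
# Low SNR: it moves towards the noise distribution.
print(vts_compensate(0.0, 1.0, 10.0, 0.5))
```

At high SNR the Jacobian is close to 1 and the clean statistics pass through almost unchanged; at low SNR the Jacobian approaches 0 and the compensated Gaussian is dominated by the noise model.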
Issues with uncertainty decoding for noise robust speech recognition
- Speech Communication, 2008
Interest is growing in a class of robustness algorithms that exploit the notion of uncertainty introduced by environmental noise. Most of these techniques share the property that the uncertainty of an observation due to noise is propagated to the recogniser, resulting in increased model variances. Using appropriate approximations, efficient implementations may be obtained, with the goal of achieving near model-based performance without the associated computational cost. Unfortunately, uncertainty decoding forms that compute the uncertainty in the front-end and pass it to the decoder may suffer from a theoretical problem in low signal-to-noise ratio conditions. This report discusses how this fundamental issue arises, and demonstrates it through two schemes: SPLICE with uncertainty and front-end Joint uncertainty decoding. A method to mitigate this in the Joint form is presented, as well as how SPLICE implicitly addresses it. However, it is shown that a model-based Joint uncertainty decoding approach does not suffer from this limitation, unlike these front-end forms, and is also computationally competitive. The issues described and the performance of the various schemes are examined on two artificially corrupted corpora: the AURORA 2.0 digit recognition database and the thousand-word Resource Management task.
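The variance-inflation property described above ("increased model variances") can be illustrated with a toy scalar example. This sketch assumes an uncertainty bias `var_bias` that is simply given and added to each component variance. How that bias is computed (per frame in the front-end, or per regression class in the model-based Joint form) is exactly where the abstract locates the low-SNR problem, and it is not modelled here. The names `uncertainty_loglik` and `discrimination` and the toy means are hypothetical.

```python
import math

def log_gauss(y: float, mean: float, var: float) -> float:
    """Scalar Gaussian log-density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def uncertainty_loglik(y: float, mean: float, var: float, var_bias: float) -> float:
    """Decode with the model variance inflated by an observation-uncertainty
    bias: p(y | m) ~ N(y; mean, var + var_bias)."""
    return log_gauss(y, mean, var + var_bias)

def discrimination(y: float, var_bias: float) -> float:
    """Log-likelihood gap between two toy components (means 0 and 2, unit variance)."""
    return (uncertainty_loglik(y, 0.0, 1.0, var_bias)
            - uncertainty_loglik(y, 2.0, 1.0, var_bias))

print(discrimination(0.0, 0.0))   # reliable frame: large gap between components
print(discrimination(0.0, 99.0))  # highly uncertain frame: the gap shrinks
```

A large bias flattens the likelihoods across components, so a low-SNR frame contributes little to the choice between them; this is the de-weighting behaviour, not the front-end issue itself.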
Discriminative classifiers with adaptive kernels for noise robust speech recognition
- Comput. Speech Lang., 2010
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and of the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS-compensated models. Key words: speech recognition, noise robustness, support vector machines, generative kernels
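Generative kernels map a variable-length sequence to a fixed-length vector using derivatives of a generative model's log-likelihood. The following is a minimal sketch. It assumes a 1-D GMM with fixed weights and variances in place of the HMMs used for speech, and a first-order score space (average log-likelihood plus the derivatives with respect to each component mean); the paper's exact score space may differ. A noise-specific kernel would follow by passing noise-compensated (e.g. VTS-compensated) GMM parameters as `gmm`. SVM training is not shown, and `score_features` and `generative_kernel` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Gmm = List[Tuple[float, float, float]]  # (weight, mean, variance) per component

def _log_gauss(o: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (o - mean) ** 2 / var)

def score_features(obs: Sequence[float], gmm: Gmm) -> List[float]:
    """Map a sequence to [avg log-likelihood, d/d mean_1, ..., d/d mean_K],
    each term averaged over the T frames."""
    T = len(obs)
    total_ll = 0.0
    grads = [0.0] * len(gmm)
    for o in obs:
        logs = [math.log(w) + _log_gauss(o, m, v) for (w, m, v) in gmm]
        top = max(logs)
        lse = top + math.log(sum(math.exp(l - top) for l in logs))
        total_ll += lse
        for k, (w, m, v) in enumerate(gmm):
            gamma = math.exp(logs[k] - lse)  # component posterior for frame o
            grads[k] += gamma * (o - m) / v
    return [total_ll / T] + [g / T for g in grads]

def generative_kernel(a: Sequence[float], b: Sequence[float], gmm: Gmm) -> float:
    """Linear kernel between two sequences in the score space."""
    fa, fb = score_features(a, gmm), score_features(b, gmm)
    return sum(x * y for x, y in zip(fa, fb))

gmm = [(0.5, -1.0, 1.0), (0.5, 1.0, 2.0)]
print(generative_kernel([0.1, 0.4], [2.0, -1.0, 0.5], gmm))
```

Sequences of different lengths map to vectors of the same length (one more than the number of components), which is what lets a standard SVM operate on them.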
Extended VTS for Noise-Robust Speech Recognition
Model compensation is a standard way of improving speech recognisers’ robustness to noise. Currently popular schemes are based on vector Taylor series (VTS) compensation. They often use the continuous-time approximation to compensate dynamic parameters. In this paper, the accuracy of dynamic parameter compensation is improved by representing the dynamic features as a linear transformation of a window of static features. A modified version of VTS compensation is applied to the distribution of the window of static features and, importantly, their correlations. These compensated distributions are then transformed to standard static and dynamic distributions. The proposed scheme outperformed the standard VTS scheme by about 10% relative. Index Terms: speech recognition, acoustic noise, robustness
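Representing dynamic features as a linear transformation of a window of static features means that, once a compensated mean and full covariance for the window are available, the static and delta statistics follow from a matrix product. The sketch below assumes scalar statics, the standard delta regression formula with half-width `K`, and window statistics that have already been compensated (that VTS step is not shown). The helper names are hypothetical. The example shows why inter-frame correlations matter.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def delta_transform(K: int) -> Matrix:
    """2 x (2K+1) matrix mapping a window [x_{t-K}, ..., x_{t+K}] of scalar
    statics to [static, delta], delta = sum_k k (x_{t+k} - x_{t-k}) / (2 sum_k k^2)."""
    width = 2 * K + 1
    norm = 2 * sum(k * k for k in range(1, K + 1))
    static = [0.0] * width
    static[K] = 1.0
    delta = [0.0] * width
    for k in range(1, K + 1):
        delta[K + k] = k / norm
        delta[K - k] = -k / norm
    return [static, delta]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def compensate_window(D: Matrix, mu_w: List[float], sigma_w: Matrix) -> Tuple[List[float], Matrix]:
    """Map window statistics to static/delta statistics: D mu and D Sigma D^T."""
    mu = [sum(d * m for d, m in zip(row, mu_w)) for row in D]
    sigma = matmul(matmul(D, sigma_w), transpose(D))
    return mu, sigma

D = delta_transform(1)
corr = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.9], [0.8, 0.9, 1.0]]
print(compensate_window(D, [0.0, 0.0, 0.0], corr))
```

With `K = 1` and uncorrelated unit-variance frames the delta variance is 0.5; with covariance 0.8 between the outer frames it drops to 0.1, a difference that an estimate ignoring inter-frame correlations would miss.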
Transforming Features to Compensate Speech Recogniser Models for Noise
To make speech recognisers robust to noise, either the features or the models can be compensated. Feature enhancement is often fast; model compensation is often more accurate, because it predicts the corrupted speech distribution. It is therefore able, for example, to take uncertainty about the clean speech into account. This paper re-analyses the recently proposed predictive linear transformations for noise compensation as minimising the KL divergence between the predicted corrupted speech and the adapted models. New schemes are then introduced which apply observation-dependent transformations in the front-end to adapt the back-end distributions. One applies transforms in exactly the same manner as the popular minimum mean square error (MMSE) feature enhancement scheme, and is equally fast. The new method performs better on AURORA 2. Index Terms: speech recognition, noise robustness
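The KL-divergence view of predictive linear transforms can be made concrete in a deliberately simplified case. Assume scalar Gaussians and a single bias `b` shared by several components, with each adapted model N(mu_x + b, var_x). Setting to zero the derivative of the occupancy-weighted KL from each predicted corrupted distribution to its adapted model gives a closed form. This is an illustrative assumption: the paper's transforms, and possibly the direction of the divergence, may differ, and `objective` and `shared_bias` are hypothetical names.

```python
import math
from typing import List, Tuple

# (occupancy weight, clean mean, clean var, predicted corrupted mean, predicted corrupted var)
Component = Tuple[float, float, float, float, float]

def kl_gauss(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def objective(comps: List[Component], b: float) -> float:
    """Occupancy-weighted KL from each predicted distribution to its
    bias-adapted model N(mu_x + b, var_x)."""
    return sum(w * kl_gauss(my, vy, mx + b, vx) for w, mx, vx, my, vy in comps)

def shared_bias(comps: List[Component]) -> float:
    """Closed-form minimiser of objective(), from d/db = 0."""
    num = sum(w * (my - mx) / vx for w, mx, vx, my, vy in comps)
    den = sum(w / vx for w, mx, vx, my, vy in comps)
    return num / den

comps = [(1.0, 0.0, 1.0, 1.0, 1.5), (1.0, 5.0, 1.0, 8.0, 2.0)]
b = shared_bias(comps)
print(b, objective(comps, b))
```

Because one bias serves several components, it cannot match every predicted distribution exactly. The KL objective chooses a variance- and occupancy-weighted compromise between their mean shifts.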
Covariance Modelling for Noise-Robust Speech Recognition
Model compensation is a standard way of improving speech recognisers’ robustness to noise. Most model compensation techniques produce diagonal covariances. However, this fails to handle any changes in the feature correlations due to the noise. This paper presents a scheme that allows full covariance matrices to be estimated. One problem is that full covariance matrix estimation is more sensitive to approximations, and those for the dynamic parameters are known to be crude. In this paper, a linear transformation of a window of consecutive frames is used as the basis for dynamic parameter compensation. A second problem is that the resulting full covariance matrices slow down decoding. This is addressed by using predictive linear transforms that decorrelate the feature space, so that the decoder can then use diagonal covariance matrices. On a noise-corrupted Resource Management task, the proposed scheme outperformed the standard VTS compensation scheme.
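The decorrelating step can be illustrated for a single 2-D Gaussian. Rotating the feature space onto the eigenvectors of the full covariance yields a diagonal covariance, and because a rotation has unit determinant, the likelihood is unchanged. This is only a sketch under that assumption. The predictive linear transforms in the paper are estimated from predicted statistics and shared across components, which is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def decorrelate_2x2(a: float, c: float, d: float) -> Tuple[List[List[float]], Tuple[float, float]]:
    """Rotation R whose rows are eigenvectors of S = [[a, c], [c, d]],
    plus the diagonal of R S R^T (the decorrelated variances)."""
    theta = 0.5 * math.atan2(2.0 * c, a - d)
    cs, sn = math.cos(theta), math.sin(theta)
    R = [[cs, sn], [-sn, cs]]
    var1 = cs * cs * a + 2.0 * cs * sn * c + sn * sn * d
    var2 = sn * sn * a - 2.0 * cs * sn * c + cs * cs * d
    return R, (var1, var2)

def full_loglik(o: Sequence[float], mu: Sequence[float], a: float, c: float, d: float) -> float:
    """Log N(o; mu, [[a, c], [c, d]]) using the closed-form 2x2 inverse."""
    det = a * d - c * c
    dx, dy = o[0] - mu[0], o[1] - mu[1]
    quad = (d * dx * dx - 2.0 * c * dx * dy + a * dy * dy) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)

def diag_loglik_rotated(o, mu, R, variances) -> float:
    """Diagonal-covariance log-likelihood in the rotated space."""
    total = 0.0
    for row, v in zip(R, variances):
        z = row[0] * (o[0] - mu[0]) + row[1] * (o[1] - mu[1])
        total += -0.5 * (math.log(2.0 * math.pi * v) + z * z / v)
    return total  # |det R| = 1, so no Jacobian term is needed

R, variances = decorrelate_2x2(2.0, 0.8, 1.0)
print(full_loglik((1.0, -0.5), (0.2, 0.1), 2.0, 0.8, 1.0),
      diag_loglik_rotated((1.0, -0.5), (0.2, 0.1), R, variances))
```

The two printed values agree, so the decoder only needs per-dimension variances once the features are transformed.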
Discriminative Classifiers with Generative Kernels for Noise Robust ASR
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise. Generative kernels, defined using generative models, allow SVMs to handle sequence data. By compensating the generative models for the noise conditions, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. Initial experiments using an idealised version of model-based compensation were run on the AURORA 2.0 continuous digit task. The proposed scheme yielded large gains in performance over the compensated models.
Structured log linear models for noise robust speech recognition
- IEEE Signal Processing Letters, 2010
The use of discriminative models for structured classification tasks, such as automatic speech recognition, is becoming increasingly popular. The major contribution of this work is a large margin structured log-linear model for noise robust continuous ASR. An important aspect of log-linear models is the form of the features. The features used in this structured log-linear model are derived from generative kernels. This provides an elegant way of combining generative and discriminative models to handle time-varying data. Additionally, since the features are based on the generative models, model-based compensation can easily be performed for noise robustness. Furthermore, the designed joint feature space can be decomposed at the arc level. This allows efficient decoding and training with lattices, which is important for any larger vocabulary extensions. Previous work in this area is extended in two important directions. First, instead of using conditional maximum likelihood (CML) training, which is commonly used for discriminative models, this paper describes efficient large margin training for sentence-level log-linear models based on lattices. Depending on the nature of the joint feature space and labels, this form of model is proved to be closely related to structured SVMs and multiclass SVMs. Second, efficient lattice-based classification of continuous data is also performed incorporating a joint feature space. This novel model combines generative kernels, discriminative models, efficient lattice-based large margin training and model-based noise compensation. It is evaluated on a noise-corrupted continuous digit task: AURORA 2.0. Results on AURORA 2 demonstrate that modelling the structure information yields significant improvements.
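The log-linear model above scores a hypothesis w by alpha · phi(O, w) and normalises over competing hypotheses, while large margin training penalises any competitor whose score comes within its loss of the reference. The following sketch uses several assumptions for illustration. It enumerates a tiny explicit hypothesis list with made-up feature vectors instead of lattices and generative-kernel features, and it uses a simple positional word-mismatch count as a stand-in loss. All names are hypothetical.

```python
import math
from typing import Dict, Sequence

def score(alpha: Sequence[float], phi: Sequence[float]) -> float:
    """Linear score alpha . phi(O, w) of one hypothesis."""
    return sum(a * f for a, f in zip(alpha, phi))

def posteriors(alpha, feats: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Log-linear posteriors P(w | O) over an explicit hypothesis list."""
    scores = {w: score(alpha, phi) for w, phi in feats.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {w: math.exp(s - top) / z for w, s in scores.items()}

def word_loss(ref: str, hyp: str) -> int:
    """Stand-in loss: positional word mismatches plus length difference
    (not a true Levenshtein alignment)."""
    r, h = ref.split(), hyp.split()
    return sum(1 for x, y in zip(r, h) if x != y) + abs(len(r) - len(h))

def margin_hinge(alpha, feats: Dict[str, Sequence[float]], ref: str) -> float:
    """Margin-rescaled hinge: max_w [loss(ref, w) - (score(ref) - score(w))]_+."""
    s_ref = score(alpha, feats[ref])
    return max(0.0, max(word_loss(ref, w) - (s_ref - score(alpha, phi))
                        for w, phi in feats.items()))

# Hypothetical joint features for three competing hypotheses.
feats = {"one two": [1.0, 0.0], "one three": [0.0, 1.0], "two": [0.5, 0.5]}
print(posteriors([3.0, 0.0], feats))
print(margin_hinge([3.0, 0.0], feats, "one two"))
```

With `alpha = [3.0, 0.0]` the reference already has the highest posterior, yet the hinge is still positive because "two" is not separated from it by a margin equal to its loss of 2.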
Structured Support Vector Machines for Noise Robust Continuous Speech Recognition
The use of discriminative models is an interesting alternative to generative models for speech recognition. This paper examines one form of these models, structured support vector machines (SVMs), for noise robust speech recognition. One important aspect of structured SVMs is the form of the joint feature space. In this work, features based on generative models are used, which allows model-based compensation schemes to be applied to yield robust joint features. However, these features require the segmentation of frames into words, or subwords, to be specified. In previous work this segmentation was obtained using generative models. Here the segmentations are refined using the parameters of the structured SVM. A Viterbi-like scheme for obtaining “optimal” segmentations, and modifications to the training algorithm to allow them to be used efficiently, are described. The performance of the approach is evaluated on a noise-corrupted continuous digit task: AURORA 2. Index Terms: speech recognition, structural SVMs, optimal alignment, large margin, log linear model
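A Viterbi-like search for the best segmentation of T frames into N words can be written as dynamic programming over word boundaries. In this sketch, `seg_score(n, s, t)` is a hypothetical callable that returns the score of word n spanning frames [s, t). In the structured SVM that score would come from the model parameters and joint features, which are not modelled here. The toy scorer below is purely illustrative.

```python
from typing import Callable, List, Tuple

def best_segmentation(num_frames: int, num_words: int,
                      seg_score: Callable[[int, int, int], float],
                      min_len: int = 1) -> Tuple[float, List[int]]:
    """Return (best total score, word boundaries [0, b_1, ..., num_frames])."""
    if num_frames < num_words * min_len:
        raise ValueError("not enough frames for the requested number of words")
    neg = float("-inf")
    # best[n][t]: best score with the first n words covering frames [0, t)
    best = [[neg] * (num_frames + 1) for _ in range(num_words + 1)]
    back = [[-1] * (num_frames + 1) for _ in range(num_words + 1)]
    best[0][0] = 0.0
    for n in range(1, num_words + 1):
        for t in range(n * min_len, num_frames + 1):
            for s in range((n - 1) * min_len, t - min_len + 1):
                if best[n - 1][s] == neg:
                    continue
                cand = best[n - 1][s] + seg_score(n - 1, s, t)
                if cand > best[n][t]:
                    best[n][t] = cand
                    back[n][t] = s
    # Trace the word boundaries back from the final frame.
    bounds = [num_frames]
    t = num_frames
    for n in range(num_words, 0, -1):
        t = back[n][t]
        bounds.append(t)
    return best[num_words][num_frames], bounds[::-1]

# Toy usage (hypothetical scores): +1 per frame whose label matches the word, -1 otherwise.
frames = "aaabb"
words = ["a", "b"]

def toy_score(n: int, s: int, t: int) -> float:
    return sum(1.0 if frames[i] == words[n] else -1.0 for i in range(s, t))

print(best_segmentation(len(frames), len(words), toy_score))  # (5.0, [0, 3, 5])
```

The search costs O(N T^2) segment evaluations, which is acceptable for a sketch. A practical system would restrict boundaries, for example to a lattice.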

