Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
- IEEE Trans. Speech Audio Processing, 2005
Cited by 18 (0 self)
Abstract—This paper presents a new technique for dynamic, frame-by-frame compensation of the Gaussian variances in the hidden Markov model (HMM), exploiting the feature variance or uncertainty estimated during the speech feature enhancement process, to improve noise-robust speech recognition. The new technique provides an alternative to the Bayesian predictive classification decision rule by carrying out an integration over the feature space instead of over the model-parameter space, offering a much simpler system implementation, lower computational cost, and dynamic compensation capabilities at the frame level. The feature enhancement variances are computed using a probabilistic and parametric model of speech distortion, without requiring any stereo training data. The derived dynamic compensation of the Gaussian variances in the HMM recognizer amounts simply to enlarging the HMM Gaussian variances by the feature enhancement variances. Experimental evaluation using the full Aurora2 test data sets demonstrates a significant digit error rate reduction, averaged over all noisy and signal-to-noise-ratio conditions, compared with the baseline that did not exploit the enhancement variance information. When the true enhancement variances are used, a further dramatic error rate reduction is observed, indicating the strong potential of the new technique and the strong need for high accuracy in estimating the variances associated with feature enhancement. All the results, using either the true variances of the enhanced features or the estimated ones, show that the greatest contribution to the recognizer’s performance improvement comes from the use of the uncertainty for the static features, followed by the delta features, and the least from the delta–delta features.
Index Terms—Dynamic variance compensation, hidden Markov model (HMM) variance, noise-robust automatic speech recognition (ASR), parametric environment model, speech feature enhancement, uncertainty in feature enhancement.
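The variance rule stated in this abstract can be illustrated with a short sketch. It assumes diagonal-covariance Gaussians and treats the enhanced feature as Gaussian with mean x̂ and per-dimension enhancement variance S; integrating the HMM Gaussian N(x; μ, Σ) over that feature distribution gives N(x̂; μ, Σ + S), i.e. the HMM variance enlarged by the enhancement variance. The function names are illustrative, not taken from the paper, and the sketch does not cover how the enhancement variances themselves are estimated.

```python
import math
from typing import Sequence


def diag_gaussian_loglik(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log N(x; mean, diag(var)) for a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def compensated_loglik(x_hat: Sequence[float], enh_var: Sequence[float],
                       mean: Sequence[float], var: Sequence[float]) -> float:
    """Score an enhanced frame x_hat against an HMM Gaussian whose variance is
    enlarged, for this frame only, by the (assumed) enhancement variance enh_var.
    This equals the integral of N(x; mean, var) * N(x; x_hat, enh_var) over x."""
    enlarged = [v + s for v, s in zip(var, enh_var)]
    return diag_gaussian_loglik(x_hat, mean, enlarged)
```

With a zero enhancement variance the score reduces to the ordinary Gaussian log-likelihood; for a frame far from the mean in some dimension, a large enhancement variance in that dimension softens the penalty, which is what makes the compensation dynamic at the frame level.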
Multimodal speaker identification using an adaptive classifier cascade based on modality reliability
- IEEE Transactions on Multimedia, 2005
Cited by 7 (2 self)
We present a multimodal open-set speaker identification system that integrates information coming from audio, face and lip motion modalities. For fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each modality combination. A novel reliability measure that genuinely fits the open-set speaker identification problem is also proposed to assess the accept or reject decisions of a classifier. A formal framework based on the probability of correct decision is developed for analytical comparison of the proposed adaptive rule with other classifier combination rules. The proposed adaptive rule is more robust in the presence of unreliable modalities, and outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure is effective in assessing classifier decisions. Experimental results that support this assertion are provided.
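The cascade idea in this abstract can be sketched as follows, under assumptions that are ours rather than the paper's: each modality combination carries a per-test reliability score used only to order the cascade, and each classifier returns a decision (a speaker ID, or None for an open-set reject) plus a confidence used to decide whether to stop. `Stage`, `adaptive_cascade`, the scores and the threshold are all hypothetical stand-ins for the paper's reliability measure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical classifier output: (speaker_id or None for "reject", confidence).
Decision = Tuple[Optional[str], float]


@dataclass
class Stage:
    name: str                          # modality combination, e.g. "audio+face"
    reliability: float                 # assumed per-test reliability of the combination
    classify: Callable[[], Decision]   # runs the classifier for this combination


def adaptive_cascade(stages: List[Stage], threshold: float) -> Tuple[str, Optional[str]]:
    """Visit modality combinations from most to least reliable and return the
    first decision whose confidence clears the threshold. If none does, fall
    back to the decision of the most reliable combination."""
    if not stages:
        raise ValueError("need at least one stage")
    ordered = sorted(stages, key=lambda s: s.reliability, reverse=True)
    first: Optional[Tuple[str, Optional[str]]] = None
    for stage in ordered:
        speaker, confidence = stage.classify()
        if first is None:
            first = (stage.name, speaker)
        if confidence >= threshold:
            return stage.name, speaker
    return first
```

For example, with stages ("audio", reliability 0.3), ("audio+face+lip", 0.8) and ("face", 0.6), the cascade tries the three-modality classifier first and only moves on to the face-only and audio-only classifiers when the earlier decisions are not confident enough; the ordering can change from one test sample to the next as the reliabilities change.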
Modelling, estimating and compensating low-bit rate coding distortion in speech recognition
- IEEE Trans. Speech Audio Processing, 2002
Cited by 2 (2 self)
A solution to the problem of speech recognition with signals distorted by low-bit rate coders is presented in this paper. A model for the coding-decoding distortion, an HMM compensation method that incorporates this model, and an EM-based adaptation algorithm to estimate this distortion are proposed. Medium-vocabulary, continuous-speech, speaker-independent recognition experiments with the 8 kbps G.729 (CS-ACELP), 13 kbps RPE-LTP (GSM), 5.3 kbps G.723.1, 4.8 kbps FS-1016 and 32 kbps G.726 (ADPCM) coders show that the approach described in this paper dramatically reduces the effect of the coding distortion and, in some cases, gives a word accuracy higher than that of the baseline system with uncoded speech. Finally, the EM estimation algorithm requires only one adapting utterance, and the approach described is certainly …
The evolution and popularity of cellular and TCP/IP networks have created the problem of improving the recognition accuracy for speech distorted by low-bit rate coders. The distortion introduced by coding schemes is difficult to model in speech recognizers and remains an open problem that cannot be solved by applying conventional noise-cancelling techniques [1] such as spectral subtraction [2], cepstral mean subtraction [3] and RASTA …
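The abstract does not spell out the distortion model, so the following is only a simplified stand-in for the EM-based estimation it mentions: a single, feature-independent additive bias b in the cepstral domain (y_t = x_t + b), with clean speech modeled by a diagonal-covariance GMM, estimated by EM from one utterance. Compensation then amounts to shifting each Gaussian mean by b. The paper's own distortion model and compensation may differ, and all names here are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss(y: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))


def estimate_global_bias(frames: List[List[float]], weights: List[float],
                         means: List[List[float]], variances: List[List[float]],
                         iterations: int = 10) -> List[float]:
    """EM estimate of one additive bias vector shared by all frames and Gaussians."""
    dim = len(means[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: component posteriors under the currently shifted means.
            logs = [math.log(w) + log_gauss(y, [m + b for m, b in zip(mu, bias)], var)
                    for w, mu, var in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(lp - top) for lp in logs]
            total = sum(probs)
            for p, mu, var in zip(probs, means, variances):
                g = p / total
                for d in range(dim):
                    # Accumulate precision-weighted residuals for the M-step.
                    num[d] += g * (y[d] - mu[d]) / var[d]
                    den[d] += g / var[d]
        # M-step: closed-form maximum-likelihood bias given the posteriors.
        bias = [n / dd for n, dd in zip(num, den)]
    return bias
```

Because a single bias is shared by all Gaussians, even a short adaptation utterance provides enough frames to estimate it; the compensated model uses the means μ_k + b.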
The stochastic weighted Viterbi algorithm: A framework to compensate additive noise and low-bit rate coding distortion
A solution to the problem of speech recognition with signals corrupted by additive noise and distorted by low-bit rate coders is presented in this paper. The additive noise and the coding distortion are cancelled according to the following scheme: first, the pdf of the clean coded-decoded speech is estimated with an additive noise model; second, the pdf of the clean uncoded signal is estimated with a coding distortion model; and finally, the HMM is compensated by using the expected value of the observation pdf in the context of the stochastic weighted Viterbi (SWV) algorithm. The approach leads to reductions in word error rate as high as 50% or 60%.
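The weighted Viterbi decoding that SWV builds on can be sketched as below. We assume the form in which each frame's observation probability is raised to a reliability weight, b_j(o_t) ** w_t, which becomes a multiplication in the log domain; the stochastic part of SWV (using the expected value of the observation pdf) is not reproduced here. `weighted_viterbi` and its argument layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def weighted_viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emis: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[float, List[int]]:
    """Viterbi decoding with per-frame reliability weights.

    log_emis[t][j] is log b_j(o_t); frame t contributes weights[t] * log b_j(o_t),
    i.e. b_j(o_t) ** w_t, so a weight of 0 ignores that frame's observation
    (log-probabilities must be finite for that to hold).
    Returns the best log-score and the best state sequence.
    """
    n = len(log_init)
    delta = [log_init[j] + weights[0] * log_emis[0][j] for j in range(n)]
    backptr: List[List[int]] = []
    for t in range(1, len(log_emis)):
        prev, delta, ptrs = delta, [], []
        for j in range(n):
            # Best predecessor state for state j at frame t.
            i_best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptrs.append(i_best)
            delta.append(prev[i_best] + log_trans[i_best][j] + weights[t] * log_emis[t][j])
        backptr.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    best = delta[state]
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return best, path
```

With all weights equal to 1 this is ordinary Viterbi decoding; lowering the weight of a frame judged unreliable (for instance, heavily noise-corrupted) lets the transition probabilities and the neighbouring frames dominate the decision at that point.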
Feature-dependent compensation in speech recognition
Several mismatch conditions can be modeled as an additive bias. This bias is usually considered independent of the observation vectors, although this approximation is not always accurate. In this paper, the dependence of the bias on the observation vectors is taken into account in the context of compensating the GSM coding distortion in speech recognition. However, the results presented here can easily be generalized to other types of mismatch. The coding-decoding distortion is modeled here as feature-dependent. This model is used to derive an Expectation-Maximization (EM) estimation algorithm for the coding-decoding distortion that is able to cancel the effect of the GSM coder with as few as one adapting utterance. Finally, the feature-dependent adaptation can give a word error rate (WER) 26% lower than that of the feature-independent model.
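One simple way to make an additive bias depend on the observation, used here purely as an illustrative assumption, is to give each Gaussian of a clean-speech GMM its own bias b_k, so that a frame's distortion depends on the region of feature space it falls in. The EM sketch below estimates those per-component biases; the paper's actual feature-dependent parameterization may differ, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss(y: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))


def estimate_component_biases(frames: List[List[float]], weights: List[float],
                              means: List[List[float]], variances: List[List[float]],
                              iterations: int = 10) -> List[List[float]]:
    """EM estimate of one additive bias per Gaussian (assumed model:
    y_t = x_t + b_k when x_t is generated by component k)."""
    n_comp, dim = len(means), len(means[0])
    biases = [[0.0] * dim for _ in range(n_comp)]
    for _ in range(iterations):
        occ = [0.0] * n_comp
        num = [[0.0] * dim for _ in range(n_comp)]
        for y in frames:
            # E-step: posteriors under the currently compensated means mu_k + b_k.
            logs = [math.log(w) + log_gauss(y, [m + b for m, b in zip(mu, bk)], var)
                    for w, mu, bk, var in zip(weights, means, biases, variances)]
            top = max(logs)
            probs = [math.exp(lp - top) for lp in logs]
            total = sum(probs)
            for k, p in enumerate(probs):
                g = p / total
                occ[k] += g
                for d in range(dim):
                    num[k][d] += g * (y[d] - means[k][d])
        # M-step: each bias is the occupancy-weighted mean residual of its component;
        # components that explain (almost) no frames keep their previous bias.
        biases = [[num[k][d] / occ[k] for d in range(dim)] if occ[k] > 1e-10 else biases[k]
                  for k in range(n_comp)]
    return biases
```

Compared with a single shared bias, this gives each region of feature space its own correction at the cost of more parameters to estimate from the adaptation data.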

