Acoustical and Environmental Robustness in Automatic Speech Recognition
, 1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 1996
Cited by 86 (14 self)
Ananth Sankar and Chin-Hui Lee, Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974
Recently there has been much interest in the problem of improving the performance of automatic speech recognition (ASR) systems in adverse environments. When there is a mismatch between the training and testing environments, ASR systems suffer a degradation in performance. The goal of robust speech recognition is to remove the effect of this mismatch so as to bring the recognition performance as close as possible to the matched conditions. In speech recognition, the speech is usually modeled by a set of hidden Markov models (HMMs) X. During recognition the observed utterance Y is decoded using these models. Due to the mismatch between training and testing conditions, this often results in a degradation in performance compared to the matched conditions. The mismatch b...
Survey of the State of the Art in Human Language Technology
, 1995
Cited by 47 (0 self)
Contents
1 Spoken Language Input (Ron Cole & Victor Zue, chapter editors)
1.1 Overview (Victor Zue & Ron Cole)
1.2 Speech Recognition (Victor Zue, Ron Cole, & Wayne Ward)
1.3 Signal Representation (Melvyn J. Hunt)
1.4 Robust Speech Recognition (Richard M. Stern)
1.5 HMM Methods in Speech Recognition (Renato De Mori & Fabio Brugnara)
1.6 Language Representation (Salim Roukos)
1.7 Speaker Recognition ...
The challenge of spoken language systems: Research directions for the nineties
- IEEE Transactions on Speech and Audio Processing
, 1995
Cited by 34 (5 self)
Footnote: This article is based on a February 1992 workshop sponsored by the National Science
Environmental Adaptation for Robust Speech Recognition
, 1994
Cited by 17 (0 self)
Acknowledgments
Chapter 1 Introduction
1.1. Approaches to Overcoming Environmental Variability
1.1.1. Re-Training
1.1.2. Multi-Style Training
1.1.3. Environmental Compensation Using Dynamic Adaptation
1.2. Towards Environment-Independent Recognition
1.2.1. Sources of Environmental Variability
1.2.2. Performance Evaluation
1.3. Dissertation Outline
Chapter 2 Overview of Environmental Robustness in Speech Recognition
2.1. Sources of Degradation...
Statistical methods for the enhancement of noisy speech
- PROC. IEEE INT. WORKSHOP ON ACOUSTIC ECHO AND NOISE CONTROL (IWAENC2003)
, 2003
Cited by 7 (0 self)
With the advent and wide dissemination of mobile communications, speech processing systems must be made robust with respect to environmental noise. In fact, the performance of speech coders or speech recognition systems is degraded when the input signal contains a significant level of noise. As a result, speech quality, speech intelligibility, or recognition rate requirements cannot be met. Improvements are obtained when the speech processing system is combined with a speech enhancement preprocessor. In this paper we will outline algorithms for noise reduction which are based on statistics and optimal estimation techniques. The focus will be on estimation procedures for the spectral coefficients of the clean speech signal and on the estimation of the power spectral density of the background noise.
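To make the two estimation problems named in this abstract concrete, here is a minimal sketch of a speech enhancement preprocessor in the power-spectral domain. It is not the paper's algorithm: it assumes the first few frames contain noise only, estimates the noise power spectral density by averaging them, and applies a plain Wiener-type gain with a floor. The function names, the `gain_floor` parameter, and the toy input are all illustrative assumptions.

```python
def estimate_noise_psd(noisy_power, n_noise_frames):
    """Average the first n_noise_frames periodograms, bin by bin.

    Assumption: those leading frames contain background noise only.
    """
    n_bins = len(noisy_power[0])
    head = noisy_power[:n_noise_frames]
    return [sum(frame[k] for frame in head) / len(head) for k in range(n_bins)]


def wiener_gains(noisy_power, noise_psd, gain_floor=0.1):
    """Per-frame, per-bin Wiener-type gain from a power-subtraction estimate."""
    gains = []
    for frame in noisy_power:
        row = []
        for p, n in zip(frame, noise_psd):
            speech = max(p - n, 0.0)  # crude clean-speech power estimate
            g = speech / (speech + n) if speech + n > 0 else gain_floor
            row.append(max(g, gain_floor))  # floor limits musical noise
        gains.append(row)
    return gains


# Toy input: two noise-only frames, then one frame with speech in bin 1.
power = [[1.0, 1.0], [1.0, 1.0], [1.0, 9.0]]
noise = estimate_noise_psd(power, n_noise_frames=2)
gains = wiener_gains(power, noise)
```

In the toy input the noise-only bin is attenuated down to the floor while the speech-dominated bin is passed almost unchanged. Each gain would be multiplied onto the corresponding noisy spectral coefficient before resynthesis.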
Speech Enhancement Using a MMSE Short Time Spectral Amplitude Estimator with Laplacian Speech Modeling
- Submitted to IEEE Trans. Speech, Audio Proc
Cited by 5 (1 self)
This paper focuses on optimal estimators of the magnitude spectrum for speech enhancement. We present an analytical solution for estimating in the MMSE sense the magnitude spectrum when the clean speech DFT coefficients are modeled by a Laplacian distribution and the noise DFT coefficients are modeled by a Gaussian distribution. Furthermore, we derive the MMSE estimator under speech presence uncertainty and a Laplacian model. Results indicated that the Laplacian based MMSE estimator yielded less residual noise in the enhanced speech than the traditional Gaussian-based MMSE estimator.
Spectral Estimation And Normalisation For Robust Speech Recognition
Cited by 2 (0 self)
Speech recognition in adverse conditions remains a difficult and challenging problem. It has already been shown [1] that normalisation of the dynamic range (SNR) of the frequency channels in a mel-scale triangular filterbank (MFCC) [2] improves the robustness against both additive and convolutional noise. Nevertheless, because the method is based on a masking technique, the improvement is small for SNR values below the target (normalised) SNR. One solution is to enhance the filterbank energies before the masking technique is applied. For this purpose we developed a Non-linear Spectral Estimator (NSE) for speech recognition that operates on the log filterbank energies. NSE enhances these filterbank energies and makes SNR-normalisation effective even at very low SNRs. Experimental results are given on the NOISEX-92 [3] database. Better recognition performance is seen even at 0 dB SNR.
Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models
, 2006
Cited by 1 (0 self)
In this paper, we develop and evaluate speech enhancement algorithms, which are based on supergaussian generalized autoregressive conditional heteroscedasticity (GARCH) models in the short-time Fourier transform (STFT) domain. We consider three different statistical models, two fidelity criteria, and two approaches for the estimation of the variances of the STFT coefficients. The statistical model is either Gaussian, Gamma or Laplacian; the fidelity criteria include minimum mean-squared error (MMSE) of the STFT coefficients and MMSE of the log-spectral amplitude (LSA); the spectral variance is estimated based on either the proposed GARCH models or the decision-directed method of Ephraim and Malah. We show that estimating the variance by the GARCH modeling method yields lower log-spectral distortion and higher perceptual evaluation of speech quality scores (PESQ, ITU-T P.862) than by using the decision-directed method, whether the presumed statistical model is Gaussian, Gamma or Laplacian, and whether the fidelity criterion is MMSE of the STFT coefficients or MMSE of the LSA. Furthermore, while a Gaussian model is inferior to the supergaussian models when using the decision-directed method, the Gaussian model is superior when using the GARCH modeling method.
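For readers unfamiliar with the baseline named in this abstract, the decision-directed method of Ephraim and Malah can be sketched for a single frequency bin as follows. This shows only the well-known a priori SNR recursion, not the paper's GARCH estimator. The noise variance is assumed known, a Wiener gain stands in for the MMSE amplitude estimators discussed in the abstract, and the function name and the `xi_min` floor are illustrative assumptions.

```python
def decision_directed_snr(noisy_power, noise_psd, alpha=0.98, xi_min=1e-3):
    """A priori SNR track for one frequency bin across frames.

    noisy_power: |Y(l)|^2 for each frame l.
    noise_psd:   noise variance for this bin (assumed known and constant).
    alpha:       weight on the previous frame's clean-speech estimate.
    """
    xi_track = []
    prev_clean_power = 0.0  # |A_hat(l-1)|^2, zero before the first frame
    for y2 in noisy_power:
        gamma = y2 / noise_psd  # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_psd
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)  # lower bound keeps the estimate positive
        gain = xi / (1.0 + xi)  # Wiener gain, a stand-in for the MMSE gains
        prev_clean_power = (gain ** 2) * y2  # feeds the next frame
        xi_track.append(xi)
    return xi_track


# Toy bin: one noise-level frame followed by two frames containing speech.
track = decision_directed_snr([1.0, 5.0, 5.0], noise_psd=1.0, alpha=0.5)
```

The recursion blends the previous frame's enhanced power with the current frame's excess over the noise level, so the estimate lags behind speech onsets. A small `alpha` is used in the toy call only to make that lag easy to see; values close to 1 are typical in practice.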

