Acoustical and Environmental Robustness in Automatic Speech Recognition, 1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
Assessing Local Noise Level Estimation Methods - Speech Communication, 1999
Cited by 20 (0 self)
In this paper, we assess two well-known methods for the local estimation of noise level in frequency subbands and compare them to a new one based on following the lower envelope of the signal energy. Moreover, we introduce, for all three approaches, a new pre-processing algorithm expected to better follow fast modulations of the noise energy. The periodicity of speech is used to update the noise level estimate during voiced parts of speech (without explicit detection of voiced portions). The evaluation is performed on four different kinds of noise (both artificial and real) added to clean speech. The best approach is used for spectral subtraction in a speech recognition experiment and compared to more classical noise-robust features (J-RASTA).
Environmental Adaptation for Robust Speech Recognition, 1994
Cited by 17 (0 self)
Acknowledgments
Chapter 1 Introduction
1.1. Approaches to Overcoming Environmental Variability
1.1.1. Re-Training
1.1.2. Multi-Style Training
1.1.3. Environmental Compensation Using Dynamic Adaptation
1.2. Towards Environment-Independent Recognition
1.2.1. Sources of Environmental Variability
1.2.2. Performance Evaluation
1.3. Dissertation Outline
Chapter 2 Overview of Environmental Robustness in Speech Recognition
2.1. Sources of Degradation...
Unsupervised spectral subtraction for noise-robust ASR - In Proceedings of ASRU 2005, 2005
Cited by 8 (3 self)
This paper proposes a simple, computationally efficient 2-mixture model approach to discriminate between speech and background noise at the magnitude spectrogram level. It is directly derived from observations on real data, and can be used in a fully unsupervised manner, with the EM algorithm. In this paper, the 2-mixture model is used in an “Unsupervised Spectral Subtraction” scheme that can be applied as a pre-processing step for any acoustic feature extraction scheme, such as MFCCs or PLP. The goal is to improve noise-robustness of the acoustic features. Experimental results on both OGI Numbers 95 and Aurora 2 tasks yielded a major improvement on all noise conditions, while retaining a similar performance on clean conditions.
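As a rough illustration of the idea in this abstract, the sketch below fits a two-component mixture to per-cell spectrogram values with EM and then scores how likely a cell is to be speech. The abstract does not name the component distributions, so Gaussians are used here purely as a stand-in assumption; the function names, the initialisation, and the toy values are all hypothetical and not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (stand-in component, an assumption)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_mixture(values, iterations=50, min_var=1e-6):
    """EM for a 1-D two-component Gaussian mixture.

    Component 0 is initialised at the smallest value (noise-like),
    component 1 at the largest value (speech-like).
    Returns (weights, means, variances).
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            p = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(2)]
            total = sum(p) or 1e-300
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(2):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # leave an empty component unchanged
            weights[c] = nc / len(values)
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / nc
            variances[c] = max(var, min_var)
    return weights, means, variances

def speech_posterior(x, weights, means, variances):
    """Posterior probability that value x belongs to component 1."""
    p = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(2)]
    return p[1] / (sum(p) or 1e-300)

# Toy per-cell values: five noise-like cells near 0, three speech-like cells near 3.
cells = [0.0, 0.1, -0.1, 0.05, -0.05, 3.0, 3.1, 2.9]
weights, means, variances = fit_two_mixture(cells)
```

A posterior such as `speech_posterior(x, weights, means, variances)` could then serve as a soft mask before feature extraction; how the paper actually turns the mixture into a subtraction rule is not stated in the abstract.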
Noise Estimation Without Explicit Speech, Non-speech Detection: a Comparison of Mean, Median and Modal Based Approaches - in Proc. Eurospeech, 2001
Cited by 7 (5 self)
Automatic speech recognition performance tends to be degraded in noisy conditions. Spectral subtraction is a simple, popular approach to noise compensation. In conventional spectral subtraction [1, 2], noise statistics are updated during speech gaps and subtracted from the corrupted signal during speech intervals. Some means of explicit speech/non-speech detection is therefore essential. Recent proposals have avoided the problem of speech/non-speech detection [3, 4, 5, 6, 7] by continually updating noise estimates whether speech is present or not. In this paper, we evaluate two such approaches to noise estimation and compare their performance with standard noise estimation in hand-labelled speech gaps. Experimental results are reported with the conventional spectral subtraction framework on a 1500-speaker database. Results confirm that noise estimation approaches which do not rely on explicit speech/non-speech detection compare favourably with conventional noise estimation approaches.
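The contrast drawn in this abstract can be sketched in a few lines: a conventional noise estimate averaged over labelled speech gaps, next to an estimate taken as the per-bin median over all frames, which needs no speech/non-speech labels. This is a minimal sketch under stated assumptions, not the paper's implementation: the whole-utterance median, the magnitude-domain subtraction, the `floor` value, and all names are illustrative choices.

```python
from statistics import mean, median

def noise_from_gaps(frames, is_speech):
    """Conventional estimate: per-bin mean over frames labelled non-speech."""
    gaps = [f for f, s in zip(frames, is_speech) if not s]
    return [mean(f[k] for f in gaps) for k in range(len(frames[0]))]

def noise_from_median(frames):
    """Label-free estimate: per-bin median over all frames."""
    return [median(f[k] for f in frames) for k in range(len(frames[0]))]

def spectral_subtract(frames, noise, floor=0.01):
    """Subtract the noise estimate per bin, flooring at floor * noisy magnitude."""
    return [[max(x - n, floor * x) for x, n in zip(f, noise)] for f in frames]

# Toy magnitude spectrogram: 5 frames x 2 bins.
# Noise sits near 1.0; bin 0 carries a speech burst in frames 2 and 3.
frames = [[1.0, 1.0], [1.1, 0.9], [5.0, 1.0], [6.0, 1.1], [0.9, 1.0]]
labels = [False, False, True, True, False]  # hand labels, used only by the gap method

gap_noise = noise_from_gaps(frames, labels)
median_noise = noise_from_median(frames)
cleaned = spectral_subtract(frames, median_noise)
```

On this toy input the two estimates land close to each other even though the median never sees the labels, which is the kind of behaviour the abstract reports as comparing favourably.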
Statistical methods for the enhancement of noisy speech - Proc. IEEE Int. Workshop on Acoustic Echo and Noise Control (IWAENC2003), 2003
Cited by 7 (0 self)
With the advent and wide dissemination of mobile communications, speech processing systems must be made robust with respect to environmental noise. In fact, the performance of speech coders or speech recognition systems is degraded when the input signal contains a significant level of noise. As a result, speech quality, speech intelligibility, or recognition rate requirements cannot be met. Improvements are obtained when the speech processing system is combined with a speech enhancement preprocessor. In this paper we will outline algorithms for noise reduction which are based on statistics and optimal estimation techniques. The focus will be on estimation procedures for the spectral coefficients of the clean speech signal and on the estimation of the power spectral density of the background noise.
Networks for Speech Enhancement - in Handbook of Neural Networks for Speech Processing, Shigeru Katagiri, Ed., 1998
Cited by 2 (0 self)
Introduction. 1.1. Background. Speech enhancement is motivated by the need to improve the performance of voice communications systems in noisy conditions. Applications range from front ends for speech recognition systems to enhancement of telecommunications in aviation, military, teleconferencing, and cellular environments. The goal is either to improve the perceived quality of the speech or to increase its intelligibility. Improving quality can be important for reducing listener fatigue in high-stress and high-noise environments. In the recording industry, improving the quality of recorded speech may be desirable even if the noise level is low to begin with. It is also a key way for telecommunications companies to increase customer satisfaction. Intelligibility can be measured in terms of speech recognition performance. Enhancement preprocessing techniques, however, have not proven successful at im...
Efficient Realtime Noise Estimation Without Explicit Speech, Non-speech Detection: An Assessment on the AURORA Corpus - in Proc. Int. Conf. DSP, 2002
Cited by 2 (2 self)
This paper addresses the problem of noise estimation for speech enhancement and automatic speech recognition. In the context of mobile telephony, there is a requirement for low-resource algorithms which must run in real time. This paper describes the implementation of a recently published approach, termed quantile-based noise estimation, integrated within a conventional spectral subtraction framework. The novelty lies in the efficiency of the noise estimation process. Assessment is carried out on the AURORA corpus and demonstrates significant improvements in efficiency. Automatic speech recognition results show an average relative improvement of 26% over the baseline.
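The general idea behind quantile-based noise estimation can be sketched as follows: for each frequency bin, sort the magnitudes observed over time and take the value at a chosen quantile as the noise level. This naive sort-based version is only an assumption-laden illustration; it does not reproduce the efficiency improvements the paper is about, and the function name, the default `q`, and the indexing rule are hypothetical.

```python
def quantile_noise_estimate(frames, q=0.5):
    """Per-bin q-quantile of the magnitudes over all frames (naive, sort-based).

    frames: list of frames, each a list of per-bin magnitudes.
    q:      quantile in [0, 1]; 0.5 gives the median for an odd frame count.
    """
    num_frames = len(frames)
    num_bins = len(frames[0])
    index = min(int(q * num_frames), num_frames - 1)
    return [sorted(f[k] for f in frames)[index] for k in range(num_bins)]

# Toy magnitude spectrogram: 5 frames x 2 bins, speech burst in bin 0.
frames = [[1.0, 1.0], [1.1, 0.9], [5.0, 1.0], [6.0, 1.1], [0.9, 1.0]]
noise = quantile_noise_estimate(frames, q=0.5)
```

Because speech occupies a given bin only part of the time, a low-to-middle quantile tends to fall on noise-only frames, which is why no explicit speech/non-speech decision is required.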
Robust Automatic Speech Recognition With Unreliable Data, 1999
Cited by 2 (0 self)
Theoretical and practical issues of some of the problems in robust automatic speech recognition (ASR), and some of the techniques that address them, are presented in this report. The robustness of ASR in real-life (as opposed to laboratory) conditions is paramount to the widespread deployment of speech-enabled products. The report reviews techniques used so far for robust ASR, ranging from simple spectrum subtraction to various types of model adaptation. A possible connection of robust ASR with computational auditory scene analysis (CASA), methods for local signal-to-noise ratio (SNR) estimation, and classification/scoring with on-line adapted statistical models is discussed. The main focus is on techniques that would allow CASA and local SNR estimates (used as methods for speech/non-speech separation) to be incorporated into the presently prevailing stochastic pattern matching paradigms: hidden Markov models (HMM) and artificial neural networks (ANN). Th...
Spectral Estimation And Normalisation For Robust Speech Recognition
Cited by 2 (0 self)
Speech recognition in adverse conditions remains a difficult and challenging problem. It has already been shown [1] that normalising the dynamic range (SNR) of the frequency channels in a mel-scale triangular filterbank (MFCC) [2] improves robustness against both additive and convolutional noise. Nevertheless, because the method is based on a masking technique, the improvement is small when the SNR is lower than the target (normalised) SNR. One solution is to enhance the filterbank energies before the masking technique is applied. For this purpose we developed a Non-linear Spectral Estimator (NSE) for speech recognition that operates on the log filterbank energies. NSE enhances these filterbank energies and makes SNR normalisation effective even at very low SNRs. Experimental results are given on the NOISEX-92 [3] database. Better recognition performance is seen even at 0 dB SNR.

