Results 1-10 of 71
Acoustical and Environmental Robustness in Automatic Speech Recognition, 1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
Speech Analysis, 1998
Cited by 134 (0 self)
Contents
1 Introduction
1.1 What is Speech Analysis?
1.1.1 So what is an acoustic vector?
1.2 Why Speech Analysis?
1.3 The problems of speech analysis
1.4 Standard references for this course
2 Background
2.1 Sampling theory
2.1.1 Sampling frequency
2.1.2 Sampling resolution
2.2 Linear filters
2.2.1 Finite Impulse Response filters
2.2.2 Infinite Impulse Response filters
2.3 The source filter model of speech
3 Filter bank Analysis
3.1 Spectrograms ...
Support vector machines for speech recognition - Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Multi-Channel Speech Enhancement In A Car Environment Using Wiener Filtering And Spectral Subtraction - Proc. ICASSP 97, 1997
Cited by 22 (2 self)
This paper presents a multichannel algorithm for speech enhancement for hands-free telephone systems in cars. This new algorithm takes advantage of the special noise characteristics in fast-driving cars. The incoherence of the noise allows the use of adaptive Wiener filtering at frequencies above a theoretically determined frequency. Below this frequency, a smoothed spectral subtraction (SSS) is used to improve noise suppression. The algorithm yields better noise reduction, with significantly less distortion and artificial noise, than spectral subtraction or Wiener filtering alone. 1. INTRODUCTION The handset equipment for telephones in cars is a restriction and a potential risk for the driver. Only hands-free devices can overcome this problem. Two different approaches for hands-free devices can be pursued. The first one uses only one microphone [1, 2], whereas the second one is a multichannel approach [3, 4]. The most often used single-sensor method is spectral s...
New Methods For Adaptive Noise Suppression - In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1995
Cited by 21 (6 self)
We propose three new adaptive noise suppression algorithms for enhancing noise-corrupted speech: smoothed spectral subtraction (SSS), vector quantization of line spectral frequencies (VQ-LSF), and modified Wiener filtering (MWF). SSS is an improved version of the well-known spectral subtraction algorithm, while the other two methods are based on generalized Wiener filtering. We have compared these three algorithms with each other and with spectral subtraction on both simulated noise and actual car noise. All three proposed methods perform substantially better than spectral subtraction, primarily because of the absence of any musical noise artifacts in the processed speech. Listening tests showed a preference for MWF and SSS over VQ-LSF. Also, MWF provides a much higher mean opinion score (MOS) than does spectral subtraction. Finally, VQ-LSF provides a relatively good spectral match to the clean speech, and may, therefore, be better suited for speech recognition. ICASSP-95, Detroit, May 19...
Assessing Local Noise Level Estimation Methods - Speech Communication, 1999
Cited by 20 (0 self)
In this paper, we assess two well-known methods for the local estimation of noise level in frequency subbands and compare them with a new one based on tracking the lower envelope of the signal energy. Moreover, we introduce, for all three approaches, a new pre-processing algorithm expected to better follow fast modulations of the noise energy. The periodicity of speech is used to update the noise level estimate during voiced parts of speech (without explicit detection of voiced portions). This evaluation is performed on four different kinds of noise (both artificial and real) added to clean speech. The best approach is used for spectral subtraction in a speech recognition experiment and compared to more classical noise-robust features (J-RASTA).
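As a rough illustration of the idea in this abstract, the following sketch tracks a per-band noise floor as a running minimum of the power and then applies power spectral subtraction. It is a generic lower-envelope tracker written for illustration only, not the estimator evaluated in the paper; the function names, the window length `win`, and the spectral floor `beta` are all assumptions.

```python
import numpy as np

def track_noise_floor(power, win=5):
    """Running minimum of per-band power over the last `win` frames.

    power: array of shape (frames, bands). This is a generic
    lower-envelope tracker (an assumption), not the paper's method.
    """
    power = np.asarray(power, dtype=float)
    floor = np.empty_like(power)
    for t in range(power.shape[0]):
        start = max(0, t - win + 1)
        floor[t] = power[start:t + 1].min(axis=0)
    return floor

def spectral_subtract(power, noise, beta=0.01):
    """Power spectral subtraction, clamped below at beta * power."""
    power = np.asarray(power, dtype=float)
    return np.maximum(power - noise, beta * power)

# Toy input: noise power 1.0 in two bands, with a louder burst in frames 2-3.
power = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [5.0, 3.0],
                  [6.0, 4.0],
                  [1.0, 1.0]])
noise = track_noise_floor(power, win=5)
clean = spectral_subtract(power, noise)
```

In this toy input the running minimum stays at the noise level throughout the burst, so the burst frames keep their excess power while noise-only frames are reduced to the spectral floor.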
Speech enhancement based on wavelet thresholding the multitaper spectrum - IEEE Trans. Speech Audio Processing, 2004
Cited by 16 (5 self)
It is well known that the “musical noise” encountered in most frequency domain speech enhancement algorithms is partially due to the large variance estimates of the spectra. To address this issue, we propose in this paper the use of low-variance spectral estimators based on wavelet thresholding the multitaper spectra for speech enhancement. A short-time spectral amplitude estimator is derived which incorporates the wavelet-thresholded multitaper spectra. Listening tests showed that the use of multitaper spectrum estimation combined with wavelet thresholding suppressed the musical noise and yielded better quality than the subspace and MMSE algorithms.
An Adaptive KLT Approach for Speech Enhancement - IEEE Trans. Speech Audio Processing, 1999
Cited by 14 (0 self)
An adaptive Karhunen-Loeve Transform (KLT) tracking-based algorithm is proposed for enhancement of speech degraded by colored additive interference. This algorithm decomposes noisy speech into its components along the axes of a KLT-based vector space of the clean speech. It is observed that the noise energy is disparately distributed along each eigenvector. These energies are obtained from noise samples gathered from silence intervals between speech samples. To obtain these silence intervals, we propose an efficient voice activity detector based on the output of the principal component eigenfilter (the greatest eigenvalue of the speech KLT). Enhancement is performed by modifying each KLT component according to its noise and clean speech energies. The criterion is to minimize the produced distortion when the residual noise power is limited to a specific level. At the end, the inverse KLT is performed and an estimate of the clean signal is synthesized. Our listening tests indicated that 71% of our subjects preferred the enha...
Efficient voice activity detection algorithms using long-term speech information - Speech Communication, 2004
A generalized subspace approach for enhancing speech corrupted by colored noise - IEEE Trans. Speech Audio Processing, 2003
Cited by 12 (5 self)
A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. A nonunitary transform, based on the simultaneous diagonalization of the clean speech and noise covariance matrices, is used to project the noisy signal onto a signal-plus-noise subspace and a noise subspace. The clean signal is estimated by nulling the signal components in the noise subspace and retaining the components in the signal subspace. The applied transform has built-in prewhitening and can therefore be used in general for colored noise. The proposed approach is shown to be a generalization of the approach proposed by Ephraim and Van Trees for white noise. Two estimators were derived based on the nonunitary transform, one based on time-domain constraints and one based on spectral domain constraints. Objective and subjective measures demonstrated improvements over other subspace-based methods when tested with TIMIT sentences corrupted with speech-shaped noise and multi-talker babble.
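The simultaneous diagonalization mentioned in this abstract can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's estimators: the function names, the Wiener-like gain `lam / (lam + mu)`, and the toy covariance matrices are hypothetical. The noise covariance `Rn` is assumed to be positive definite so that its Cholesky factor exists.

```python
import numpy as np

def simultaneous_diagonalize(Rx, Rn):
    """Return (lam, V) with V.T @ Rn @ V = I and V.T @ Rx @ V = diag(lam)."""
    L = np.linalg.cholesky(Rn)        # Rn = L @ L.T
    Linv = np.linalg.inv(L)
    M = Linv @ Rx @ Linv.T            # speech covariance after prewhitening
    M = (M + M.T) / 2                 # remove rounding asymmetry
    lam, U = np.linalg.eigh(M)        # eigenvalues in ascending order
    V = Linv.T @ U                    # nonunitary in general
    return lam, V

def subspace_filter(Rx, Rn, mu=1.0, tol=1e-10):
    """Filter matrix that nulls the noise subspace (lam <= tol).

    The gain lam / (lam + mu) in the signal subspace is an assumed,
    Wiener-like choice for illustration.
    """
    lam, V = simultaneous_diagonalize(Rx, Rn)
    gains = np.where(lam > tol, lam / (lam + mu), 0.0)
    return np.linalg.inv(V).T @ np.diag(gains) @ V.T

# Toy example: rank-1 "speech" covariance and a colored noise covariance.
a = np.array([1.0, 2.0, 0.0])
Rx = np.outer(a, a)
Rn = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
lam, V = simultaneous_diagonalize(Rx, Rn)
H = subspace_filter(Rx, Rn, mu=1.0)
```

Because `V` whitens the noise while diagonalizing the speech covariance, no separate prewhitening step is needed. In the toy example only one generalized eigenvalue is nonzero, so `H` has rank one and scales the signal direction `a` by the single gain.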

