MAXIMUM LIKELIHOOD ESTIMATION OF A REVERBERATION MODEL FOR ROBUST DISTANT-TALKING SPEECH RECOGNITION
Cited by 2 (2 self)
We propose a novel approach for estimating a reverberation model for a robust recognizer according to [1], which is designed to allow distant-talking automatic speech recognition (ASR) in reverberant environments. Based on a few calibration utterances with known transcriptions recorded in the target environment, a maximum likelihood estimator is used to find the means and variances of the reverberation model. In contrast to [1] and to HMM training on artificially reverberated training data (e.g. [2]), measurements of room impulse responses become unnecessary, and the effort for training is greatly reduced. Simulations of a connected digit recognition task show that, in highly reverberant environments, the reverberation models estimated by the proposed approach achieve significantly higher recognition rates than HMMs trained on reverberant data.
A simplified decoding method for a robust distant-talking ASR concept based on feature-domain dereverberation
- Proc. International Workshop for Acoustic Echo and Noise Control (IWAENC), 2008
Cited by 2 (2 self)
A simplified decoding method for the concept of REverberation MOdeling for Speech recognition (REMOS) [1] is proposed. In order to achieve robust distant-talking Automatic Speech Recognition (ASR), the REMOS concept uses a combination of clean-speech HMMs and a reverberation model to perform feature-domain dereverberation during decoding. The simplified decoding/dereverberation method proposed in this contribution significantly reduces the computational complexity of the concept without a major performance reduction. Index Terms: Dereverberation, robust ASR, reverberation model, feature-domain processing.
A COMBINED APPROACH FOR ESTIMATING A FEATURE-DOMAIN REVERBERATION MODEL IN NON-DIFFUSE ENVIRONMENTS
Cited by 2 (2 self)
A combined approach for estimating a feature-domain reverberation model suitable for the robust distant-talking automatic speech recognition concept REMOS (REverberation MOdeling for Speech recognition) [1] is proposed. Based on a few calibration utterances recorded in the target environment, the combined approach employs ML estimation and blind estimation of the reverberation time to determine a two-slope reverberation model. Since measurements of room impulse responses become unnecessary, the effort for training is greatly reduced compared to [1] and compared to training HMMs on artificially reverberated data. Connected digit recognition experiments show that the proposed reverberation models in connection with the REMOS concept significantly outperform HMM-based recognizers trained on reverberant data. Index Terms: Dereverberation, blind estimation, reverberation model, reverberation time, robust ASR.
NEW RESULTS FOR FEATURE-DOMAIN REVERBERATION MODELING
Cited by 2 (2 self)
To achieve robust distant-talking automatic speech recognition in reverberant environments, the effect of reverberation on the speech feature sequences has to be modeled as accurately as possible. A convolution in the feature domain has been proposed recently in [1, 2, 3, 4] to capture the dispersion of the feature vectors caused by reverberation. These publications use a fixed representation of the acoustic path between speaker and microphone or an elementary statistical reverberation model based on simplifying assumptions. In this contribution, we propose a Monte-Carlo approach that allows for an explicit determination of the joint probability density function of a feature-domain reverberation model. Index Terms: Robust speech recognition, distant-talking speech recognition, reverberation modeling, feature-domain processing, Monte-Carlo method.
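As a rough illustration of the feature-domain convolution mentioned in this abstract, the Python sketch below applies a fixed, per-channel representation of the acoustic path to a clean feature sequence. The function name `reverberate_features`, the list-of-lists layout, and the tap weights are assumptions made for illustration only; they are not taken from the cited publications, and the Monte-Carlo estimation of the joint density is not reproduced here.

```python
def reverberate_features(clean, taps):
    """Convolve each feature channel with a per-channel tap sequence.

    clean: list of frames, each a list of nonnegative channel values
           (assumed here to be melspectral-style energies).
    taps:  list of tap frames, each a list of per-channel weights
           (a hypothetical fixed representation of the acoustic path).
    Returns a list of frames with the same shape as `clean`.
    """
    num_frames = len(clean)
    num_channels = len(clean[0]) if clean else 0
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for k in range(num_frames):
        # Frame k collects contributions from frames k, k-1, ..., k-(M-1),
        # truncated at the start of the sequence.
        for m in range(min(len(taps), k + 1)):
            for c in range(num_channels):
                out[k][c] += taps[m][c] * clean[k - m][c]
    return out
```

Feeding a single nonzero frame through this function returns the tap sequence itself, which makes the dispersion visible: energy from one frame is smeared over the following frames.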
BLIND ESTIMATION OF A FEATURE-DOMAIN REVERBERATION MODEL IN NON-DIFFUSE ENVIRONMENTS WITH VARIANCE ADJUSTMENT
Cited by 1 (1 self)
Blind estimation of a two-slope feature-domain reverberation model is proposed. The reverberation model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by a matrix-valued IID Gaussian random process, its statistical properties are completely captured by its mean and variance matrices. The suggested solution for the estimation of the model includes two novel features based on the study of simulated rooms: 1) a solution for blindly determining a two-slope decay model from a single-slope estimate; 2) a variance mask to improve the estimation of the variance matrix. Using the proposed solution, the reverberation model can be estimated during recognition without pre-training and without calibration utterances with known transcriptions. Connected digit recognition experiments using [3] show that the reverberation models estimated by the proposed approach significantly outperform HMM-based recognizers trained on reverberant data in most environments.
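To make the notion of a two-slope decay concrete, the following Python sketch builds a piecewise-linear decay curve in dB over frame index. It relies only on the definition of reverberation time (a 60 dB energy drop within T60 seconds). The parametrization by two reverberation times and a knee frame, and the names `two_slope_decay_db` and `db_to_power`, are assumptions for illustration; the abstract does not specify how the paper derives the second slope from the single-slope estimate, nor how the variance mask is built.

```python
def two_slope_decay_db(num_taps, frame_shift_s, t60_early_s, t60_late_s, knee_tap):
    """Return a two-slope decay curve in dB, one value per frame index.

    frame_shift_s: frame shift in seconds.
    t60_early_s, t60_late_s: hypothetical reverberation times governing
        the decay before and after the knee.
    knee_tap: frame index at which the slope changes (an assumed parameter).
    """
    if frame_shift_s <= 0 or t60_early_s <= 0 or t60_late_s <= 0:
        raise ValueError("frame shift and reverberation times must be positive")
    # By definition of T60, energy decays by 60 dB per T60 seconds.
    early_rate = 60.0 * frame_shift_s / t60_early_s  # dB per frame
    late_rate = 60.0 * frame_shift_s / t60_late_s    # dB per frame
    decay = []
    for m in range(num_taps):
        early_frames = min(m, knee_tap)      # frames spent on the first slope
        late_frames = max(m - knee_tap, 0)   # frames spent on the second slope
        decay.append(-(early_rate * early_frames + late_rate * late_frames))
    return decay


def db_to_power(level_db):
    """Convert a level in dB to a linear power ratio."""
    return 10.0 ** (level_db / 10.0)
```

Mapping the curve through `db_to_power` gives linear-domain weights that could serve as a stand-in for the mean of such a model in a toy experiment; setting both reverberation times equal reduces the curve to a single slope.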
Model-Based Dereverberation of Speech in the Mel-Spectral Domain
A model-based dereverberation approach for robust distant-talking speech recognition employing the powerful acoustic model of the recognizer to describe the clean speech feature sequence is discussed. The clean speech model is combined with a statistical reverberation model describing the acoustic path between speaker and microphone directly in the mel-spectral domain. Dereverberation is performed during recognition by determining the most likely contributions of the combined model’s components to the current reverberant feature vector. The advantages of processing feature-domain representations of speech rather than using time- or frequency-domain speech representations are the dimension reduction and the possibility to obtain robust reverberation models valid for arbitrary speaker and microphone positions in the recording room. In this contribution, we emphasize that the criterion used for the dereverberation operation is equivalent to maximum a posteriori estimation. Connected-digit recognition experiments confirm the superior performance of the novel concept.
ADAPTING HMMS OF DISTANT-TALKING ASR SYSTEMS USING FEATURE-DOMAIN REVERBERATION MODELS
To capture the dispersive effect of reverberation in Hidden Markov Model (HMM)-based distant-talking speech recognition systems, adapting the means of the current HMM state based on the means of the preceding states has been suggested in [1]. In this contribution, we propose to incorporate the reverberation models of [2] into the adaptation approach to describe the effect of reverberation with higher accuracy. Connected-digit recognition experiments in three different rooms confirm that the suggested more accurate reverberation representation leads to a significant performance increase in all investigated environments.
MODEL-BASED DEREVERBERATION IN THE LOGMELSPEC DOMAIN FOR ROBUST DISTANT-TALKING SPEECH RECOGNITION
The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in [1] for melspectral features, is extended in this contribution to logarithmic melspectral (logmelspec) features. Based on a combined acoustic model consisting of a hidden Markov model network and a reverberation model, REMOS determines clean-speech and reverberation estimates during recognition by an inner optimization operation. A reformulation of this inner optimization problem for logmelspec features, allowing an efficient solution by nonlinear optimization algorithms, is derived in this paper so that an efficient implementation of REMOS for logmelspec features becomes possible. Connected digit recognition experiments show that the proposed REMOS implementation significantly outperforms reverberantly trained HMMs in highly reverberant environments. Index Terms: Reverberation, model-based dereverberation, acoustic modeling, distant-talking ASR, robust ASR.

