## BLIND ESTIMATION OF A FEATURE-DOMAIN REVERBERATION MODEL IN NON-DIFFUSE ENVIRONMENTS WITH VARIANCE ADJUSTMENT

Citations: | 1 - 1 self |

### BibTeX

@MISC{Wen_blindestimation,

author = {Jimi Y. C. Wen and Armin Sehr and Patrick A. Naylor and Walter Kellermann},

title = {BLIND ESTIMATION OF A FEATURE-DOMAIN REVERBERATION MODEL IN NON-DIFFUSE ENVIRONMENTS WITH VARIANCE ADJUSTMENT},

year = {}

}

### Abstract

Blind estimation of a two-slope feature-domain reverberation model is proposed. The reverberation model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by a matrix-valued IID Gaussian random process, its statistical properties are completely captured by its mean and variance matrices. The suggested solution for the estimation of the model includes two novel features based on the study of simulated rooms: 1) a solution for blindly determining a two-slope decay model from a single-slope estimate; 2) a variance mask to improve the estimation of the variance matrix. Using the proposed solution, the reverberation model can be estimated during recognition without pre-training or calibration utterances with known transcription. Connected digit recognition experiments using [3] show that the reverberation models estimated by the proposed approach significantly outperform HMM-based recognizers trained on reverberant data in most environments.
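
As a rough illustration of what a two-slope decay looks like, the sketch below builds a piecewise-exponential energy envelope over feature frames: a faster decay up to a switch frame, then a slower one, joined continuously. The function name, the per-frame decay parametrization, and all numeric values are hypothetical choices made for this example. The abstract does not specify them, and this is not the paper's adjustment method.

```python
import math


def two_slope_envelope(num_frames, early_decay, late_decay, switch_frame):
    """Piecewise-exponential energy envelope, continuous at switch_frame.

    All parameter names and the per-frame parametrization are illustrative
    assumptions, not taken from the paper.
    """
    envelope = []
    for k in range(num_frames):
        if k < switch_frame:
            # Early segment: direct sound and early reflections (faster decay).
            log_energy = -2.0 * early_decay * k
        else:
            # Late segment: continue from the switch point with a slower decay.
            log_energy = (-2.0 * early_decay * switch_frame
                          - 2.0 * late_decay * (k - switch_frame))
        envelope.append(math.exp(log_energy))
    return envelope


# Hypothetical values: the early decay is steeper than the late decay.
env = two_slope_envelope(num_frames=8, early_decay=0.5, late_decay=0.1,
                         switch_frame=3)
```

In a model of the kind described above, an envelope like this would shape the mean matrix, while a separate variance matrix describes the spread around it.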

### Citations

392 |
Image method for efficiently simulating small-room acoustics
- Allen, Berkley
- 1979
Citation Context ... Figure 2: Simulation setup of different room acoustics parameters using the image method of [14]. $\hat{\sigma}^2_H \doteq \varsigma^2_H \otimes \tilde{\sigma}^2_H = \varsigma^2_H \otimes m^2_H \overset{!}{\approx} \sigma^2_H$ (7), where $\otimes$ denotes the Hadamard product. Taking the natural logarithm of each matrix element, we get $\ln \hat{\sigma}^2_H = \ln \varsigma^2_H + \ln m^2_H$ (8). To determ... |

47 |
New method of measuring reverberation time
- Schroeder
- 1965
Citation Context ...ophone distance. Non-perfectly diffuse sound fields exhibit a faster decay for the early segment corresponding to the direct sound and early reflections, and a slower decay for the late reverberation [11, 12]. Therefore, a two-slope RVM extended from Polack’s time-domain model [10] is used in [8] to capture the non-diffuse RIRs as depicted in Fig. 1(b). In the early segment of the two-slope model, extend... |

39 | Recognizing reverberant speech with RASTA-PLP - Kingsbury, Morgan - 1997 |

10 | Distant-talking continuous speech recognition based on a novel reverberation model
- Sehr, Zeller, et al.
- 2006
Citation Context ...ration model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by a matrix-valued IID Gaussian random process, its statistical properties are completely captured by its mean and variance matrices. The suggested soluti... |

9 |
La transmission de l’énergie sonore dans les salles. Dissertation. Université du
- Polack
- 1988
Citation Context ...or the estimation of the room decay $\hat{\alpha}$ according to $\hat{\alpha} = \sum_{r=0}^{4} \gamma_r \left(\sigma^2_{X-}\right)^r$ is used. The parameters ($\gamma_r$) of the mapping function are obtained in [9] by using Polack’s statistical reverberation model [10] and two speech fragments consisting of one male and one female sentence. 2.3 Late Decay Adjustment RIRs obtained in real-world rooms are not ‘diffuse’ since ‘diffuse’ RIRs require an infinite source-... |

7 |
A new HMM adaptation approach for the case of a hands-free speech input in reverberant rooms
- Hirsch, Finster
- 2006
Citation Context ...ration model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by a matrix-valued IID Gaussian random process, its statistical properties are completely captured by its mean and variance matrices. The suggested soluti... |

7 | Model adaptation by state splitting of HMM for long reverberation
- Raut, Nishimoto, et al.
- 2005
Citation Context ...ess, its statistical properties are completely captured by its mean and variance matrices. While a set of known RIRs in [3], simultaneous recordings of close-talking and distant-talking microphones in [6], and calibration utterances with known transcriptions in [1, 7, 8] are required for estimating the reverberation representation, the proposed approach can estimate the RVM blindly during recognition.... |

6 |
Model adaptation for long convolutional distortion by maximum likelihood state filtering approach
- Raut, Nishimoto, et al.
Citation Context ...ration model is suitable for robust distant-talking automatic speech recognition approaches which use a convolution in the feature domain to characterize the reverberant feature vector sequence, e.g. [1, 2, 3]. Since the model describes the reverberation by a matrix-valued IID Gaussian random process, its statistical properties are completely captured by its mean and variance matrices. The suggested soluti... |

5 |
The harming part of room acoustics in automatic speech recognition
- Petrick, Lohde, et al.
- 2007
Citation Context ...tch between the input utterances and the acoustic model of the recognizer, usually trained on close-talking speech. Therefore, the performance of ASR systems is significantly reduced by reverberation [4, 5] if no countermeasures are taken. In the time domain, reverberant speech can be described by a convolution of clean speech with the Room Impulse Response (RIR) characterizing the acoustic path from th... |

3 |
Blind Estimation of Reverberation Parameters for Non-Diffuse Rooms
- Kendrick, Li, et al.
Citation Context ...ophone distance. Non-perfectly diffuse sound fields exhibit a faster decay for the early segment corresponding to the direct sound and early reflections, and a slower decay for the late reverberation [11, 12]. Therefore, a two-slope RVM extended from Polack’s time-domain model [10] is used in [8] to capture the non-diffuse RIRs as depicted in Fig. 1(b). In the early segment of the two-slope model, extend... |

2 | Maximum likelihood estimation of a reverberation model for robust distant-talking speech recognition
- Sehr, Zheng, et al.
- 2007
Citation Context ...s mean and variance matrices. While a set of known RIRs in [3], simultaneous recordings of close-talking and distant-talking microphones in [6], and calibration utterances with known transcriptions in [1, 7, 8] are required for estimating the reverberation representation, the proposed approach can estimate the RVM blindly during recognition. Thus, the flexibility of the robust distant-talking ASR approaches... |

2 | A combined approach for estimating a feature-domain reverberation model in nondiffuse environments
- Sehr, Wen, et al.
- 2008
Citation Context ...s mean and variance matrices. While a set of known RIRs in [3], simultaneous recordings of close-talking and distant-talking microphones in [6], and calibration utterances with known transcriptions in [1, 7, 8] are required for estimating the reverberation representation, the proposed approach can estimate the RVM blindly during recognition. Thus, the flexibility of the robust distant-talking ASR approaches... |

2 | Blind estimation of reverberation time based on the distribution of signal decay rates
- Wen, Habets, et al.
- 2008
Citation Context ...been proposed in this paper. The proposed approach determines the mean and the variance matrices of a matrix-valued IID Gaussian random process. Blind estimates of the reverberation time according to [9] are used to determine single-slope decay estimates. Using the proposed adjustment method, the single-slope estimates are transformed to an early and a late decay to produce the mean matrix of a two-sl... |

2 |
Speech and Audio Processing in Adverse Environments, chapter Towards Robust Distant-Talking Automatic Speech Recognition in Reverberant Environments
- Sehr, Kellermann
- 2008
Citation Context ...ics, we assume an average reverberation time of T60 = 0.6 s and an average source-mic distance of d = 2.5 m to select the optimum values of the parameters ρ and ϕ. Connected digit recognition tests in [15] indicate that overestimation of the reverberation by the RVM is less detrimental than underestimation. Since larger values of ϖ cause the adjusted late decay to be slower, we select ϖ = 3 correspondi... |
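
The variance-mask relation quoted in the citation context for [14] (Eqs. 7 and 8) can be illustrated with a small numeric sketch: a mask is applied to a matrix by the element-wise (Hadamard) product, and taking the logarithm of each element turns that product into a sum. The helper names and the 2x2 values below are hypothetical. Reading the second factor as the element-wise squared mean matrix is an assumption made here, since the excerpt does not define it.

```python
import math


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def elementwise_log(m):
    """Natural logarithm of every matrix element (all entries must be > 0)."""
    return [[math.log(x) for x in row] for row in m]


# Hypothetical 2x2 values, for illustration only (not from the paper).
mask = [[1.5, 0.8], [2.0, 1.0]]          # stands in for the variance mask
mean_sq = [[0.40, 0.10], [0.05, 0.02]]   # stands in for the squared means

# Eq. (7), left part: masked variance estimate via the Hadamard product.
var_hat = hadamard(mask, mean_sq)

# Eq. (8): the element-wise log of the product equals the sum of the logs.
log_sum = [[lm + lv for lm, lv in zip(row_m, row_v)]
           for row_m, row_v in zip(elementwise_log(mask),
                                   elementwise_log(mean_sq))]
log_var_hat = elementwise_log(var_hat)
```

Working in the log domain is convenient because the mask becomes an additive offset on each matrix element.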