Results 1 -
3 of
3
Noise Adaptive Speech Recognition in Time-Varying Noise Based on Sequential Kullback Proximal Algorithm
- in ICASSP, 2002
, 2002
"... We present a noise adaptive speech recognition approach, where time-varying noise parameter estimation and Viterbi process are combined together. The Viterbi process provides approximated joint likelihood of active partial paths and observation sequence given the noise parameter sequence estimated t ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
We present a noise adaptive speech recognition approach, where time-varying noise parameter estimation and Viterbi process are combined together. The Viterbi process provides approximated joint likelihood of active partial paths and observation sequence given the noise parameter sequence estimated till previous frame. The joint likelihood after normalization provides approximation to the posterior probabilities of state sequences for an EM-type recursive process based on sequential Kullback proximal algorithm to estimate the current noise parameter. The combined process can easily be applied to perform continuous speech recognition in presence of non-stationary noise. Experiments were conducted in simulated and real non-stationary noises. Results showed that the noise adaptive system provides significant improvements in word accuracy as compared to the baseline system (without noise compensation) and the normal noise compensation system (which assumes the noise to be stationary). 1.
Knowledge-Based Speech Enhancement
, 2005
"... Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communications anywhere has become a reality. The freedom ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communications anywhere has become a reality. The freedom and flexibility offered by mobile technology brings with it new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as in cars, cafeterias, subway stations, etc., in hearing aids, and to improve the performance of speech recognition systems. In this thesis, which consists of four research articles, we discuss both single and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-byframe optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions there-of, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived. While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.
Thesis for the degree of Doctor of Philosophy Knowledge-Based Speech Enhancement
"... Abstract Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communications anywhere has become a reality. The ..."
Abstract
- Add to MetaCart
Abstract Speech is a fundamental means of human communication. In the last several decades, much effort has been devoted to the efficient transmission and storage of speech signals. With advances in technology making mobile communication ubiquitous, communications anywhere has become a reality. The freedom and flexibility offered by mobile technology brings with it new challenges, one of which is robustness to acoustic background noise. Speech enhancement systems form a vital front-end for mobile telephony in noisy environments such as in cars, cafeterias, subway stations, etc., in hearing aids, and to improve the performance of speech recognition systems. In this thesis, which consists of four research articles, we discuss both single and multi-microphone approaches to speech enhancement. The main contribution of this thesis is a framework to exploit available prior knowledge about both speech and noise. The physiology of speech production places a constraint on the possible shapes of the speech spectral envelope, and this information is captured using codebooks of speech linear predictive (LP) coefficients obtained from a large training database. Similarly, information about commonly occurring noise types is captured using a set of noise codebooks, which can be combined with sound environment classification to treat different environments differently. In paper A, we introduce maximum-likelihood estimation of the speech and noise LP parameters using the codebooks. The codebooks capture only the spectral shape. The speech and noise gain factors are obtained through a frame-byframe optimization, providing good performance in practical nonstationary noise environments. The estimated parameters are subsequently used in a Wiener filter. Paper B describes Bayesian minimum mean squared error estimation of the speech and noise LP parameters and functions there-of, while retaining the instantaneous gain computation. Both memoryless and memory-based estimators are derived. While papers A and B describe single-channel techniques, paper C describes a multi-channel Bayesian speech enhancement approach, where, in addition to temporal processing, the spatial diversity provided by multiple microphones is also exploited. In paper D, we introduce a multi-channel noise reduction technique motivated by blind source separation (BSS) concepts. In contrast to standard BSS approaches, we use the knowledge that one of the signals is speech and that the other is noise, and exploit their different characteristics.