Results 1 - 8 of 8
Environmental Conditions and Acoustic Transduction in Hands-Free Speech Recognition
- Speech Communication, 1998
Cited by 22 (4 self)
Hands-free interaction is key to increasing the flexibility of present applications and to developing new speech recognition applications in which the user cannot be encumbered by hand-held or head-mounted microphones. When the microphone is far from the speaker, the transduced signal is affected by degradations of different kinds, which are often unpredictable. Special microphones and multimicrophone acquisition systems are one way of reducing some environmental noise effects. Robust processing and adaptation techniques can further be used to compensate for the different kinds of variability that may be present in the recognizer input. The purpose of this paper is to revisit some of the assumptions about the different sources of this variability and to discuss both special transducer systems and the compensation/adaptation techniques that can be adopted. In particular, the paper will refer to the use of multimicrophone systems to overc...
Training of HMM with Filtered Speech Material for Hands-Free Recognition
- in Proc. ICASSP, 1999
Cited by 16 (5 self)
This paper addresses the problem of hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed using either a clean speech database or a filtered version of the same database. Filtering consists of a convolution with the acoustic impulse response between speaker and microphone, to reproduce the reverberation effect. Background noise is added to provide the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates MLLR adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% WRR in a connected digit recognition task when the talker is 1.5 m from the array.
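The filtering step described in this abstract (convolution with an impulse response, then noise added at a chosen SNR) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy impulse response, the synthetic noise, and the choice to measure SNR against the reverberant signal rather than the clean one are all assumptions made for illustration.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate `clean`, then add `noise` scaled to the target SNR.

    Assumption: SNR is defined relative to the reverberant signal.
    """
    reverberant = convolve(clean, impulse_response)
    if len(noise) < len(reverberant):
        raise ValueError("noise must be at least as long as the reverberant signal")
    noise = noise[:len(reverberant)]
    # Choose the gain so that power(reverberant) / power(gain * noise)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberant, noise)]


# Toy usage with hypothetical data: a sinusoid standing in for speech,
# a three-tap impulse response, and Gaussian noise.
rng = random.Random(0)
toy_clean = [math.sin(0.1 * n) for n in range(200)]
toy_ir = [1.0, 0.0, 0.5]
toy_noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
toy_noisy = contaminate(toy_clean, toy_ir, toy_noise, snr_db=10.0)
```

A real system would use measured room impulse responses and recorded background noise, and would likely use an FFT-based convolution for speed; the direct form is used here only to keep the sketch dependency-free.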
Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array
Cited by 8 (0 self)
Particle Filtering Methods for Acoustic Source . . .
- 2004
Cited by 6 (0 self)
The task of acoustic source tracking plays an important role in many practical speech acquisition systems. This research presents an extensive study of sequential Monte Carlo methods applied to the source localisation problem, based on the signals received at an array of microphones. A general framework for acoustic source localisation using particle filtering is proposed, and four different algorithms that fit within this framework are subsequently developed. To assess the performance of these new methods, statistical simulations are carried out using both synthetic and real-life samples of audio data. The simulation results demonstrate the superiority of an approach based on sequential estimation. The resulting particle filters are shown to drastically outperform traditional acoustic source localisation methods. Further
Explicit Speech Modeling for Distant-Talker Signal Acquisition
- 1998
Cited by 1 (0 self)
There are a variety of applications requiring speech acquisition in challenging environments for which it is neither possible nor desirable to have a talker or talkers physically linked to an input device. With these scenarios come new challenges specific to the distant-talker environment which are not adequately addressed by methods shown to be effective in close-talking conditions. These include channel effects such as reverberations, poor signal-to-noise ratios, and interfering sources. This paper develops techniques which explicitly incorporate the nature of the speech signal (i.e. statistical non-stationarity, method of production, pitch, voicing, and formant structures, and source radiator models) into a multi-channel context. A new processing paradigm is presented which combines the advantages of spatial filtering, non-traditional (e.g. nonlinear) processing methods, and specific knowledge of the desired time-series attributes. Detailed examples highlighting the merits of this p...
Improving Reverberant VTS for Hands-free Robust Speech Recognition
Cited by 1 (1 self)
Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme, RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant, noise-corrupted AURORA4 task. Both yielded large gains over the VTS baseline system, with RVTSJ outperforming the previous RVTS scheme.
Speech Enhancement Using Nonlinear Microphone Array Under Nonstationary Noise Conditions
- IEICE Trans. Fundamentals E82-E, 1999
This paper describes a spatial spectral subtraction method that uses a complementary beamforming microphone array to enhance noisy speech signals for speech recognition. Complementary beamforming is based on two types of beamformers designed to obtain directivity patterns that are complementary to each other. In this paper, it is shown that nonlinear subtraction processing with complementary beamforming can result in a kind of spectral subtraction that does not need speech pause detection. To evaluate its effectiveness, speech enhancement experiments and speech recognition experiments are performed based on computer simulations under both stationary and nonstationary noise conditions. In comparison with the optimized conventional delay-and-sum array, it is shown that: (1) the proposed array performs more than 20% better in word recognition rates when white Gaussian noise is used, (2) the proposed array improves the word recognition rate by ab...
Blind Speech Enhancement with Independent Component Analysis and Spectral Subtraction
- 2010
A hands-free speech recognition system and a hands-free telecommunication system are essential for realizing an intuitive, unconstrained, and stress-free human-machine interface. In an actual acoustic environment, however, not only the user’s speech but also interfering source signals, such as background noise and interfering speech, are present. Such interference disturbs high-quality speech recognition and telecommunication. Therefore, a source extraction method is needed to realize high-quality hands-free systems. In particular, blind source extraction methods have attracted attention. Since blind source extraction does not require any supervision, it can be applied to a wide range of applications. Independent component analysis (ICA) is a successful candidate among blind source extraction methods. There have been many studies on ICA, and they have provided strong evidence that ICA can blindly extract source signals from noisy observations. However, almost all studies on ICA treat only the limited case in which all sound sources are point sources, like speech. Such an acoustic condition is very

