Acoustical and Environmental Robustness in Automatic Speech Recognition
1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
Speech Recognition in Noisy Environments
Ph.D. Dissertation, ECE Department, CMU, 1996
Cited by 72 (3 self)
Table of contents (truncated):
Acknowledgments
Chapter 1 Introduction
  1.1. Thesis goals
  1.2. Dissertation Outline
Chapter 2 The SPHINX-II Recognition System
  2.1. An Overview of the SPHINX-II System
    2.1.1. Signal Processing
    2.1.2. Hidden Markov Models
    2.1.3. Recognition Unit
    2.1.4. Training
    2.1.5. Recognition
  2.2. Experimental Tasks and Corpora ...
Acoustic Event Localization Using A Crosspower-Spectrum Phase Based Technique
in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994
Cited by 46 (9 self)
Linear microphone arrays can be employed for acoustic event localization in a noisy environment using time delay estimation. Three techniques that allow delay estimation are investigated, namely Normalized Cross Correlation, LMS Adaptive Filters, and Crosspower-Spectrum Phase. They are combined with a bidimensional representation, the Coherence Measure, in order to emphasize information that can be exploited for estimating the position of both non-moving and moving acoustic sources. To compare the given techniques, different acoustic sources were considered that generated events in different positions in space. Expressing performance in terms of accuracy of the wavefront direction angle, experiments showed that the Crosspower-Spectrum Phase based technique outperforms the other two. This technique also provided very promising preliminary results in terms of source position estimation.
1. INTRODUCTION In the last decade, some research effort has been devoted to microphone array processing tec...
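As a rough illustration of the crosspower-spectrum phase idea named in this abstract, the sketch below estimates the delay between two channels by keeping only the phase of their cross-power spectrum and locating the peak of its inverse transform. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `dft` and `csp_delay`, the 64-sample frame, the circular shift, and the synthetic noise source are all hypothetical choices made for the demo, and a naive transform is used so that the block needs only the standard library.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def csp_delay(x1, x2, eps=1e-12):
    """Return the delay, in samples, of x2 relative to x1 (positive: x2 lags).

    The cross-power spectrum is normalised to unit magnitude so that only its
    phase remains; the inverse transform then peaks at the relative delay.
    """
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(spec1, spec2)]
    whitened = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Indices above n/2 correspond to negative (circular) lags.
    return peak if peak <= n // 2 else peak - n


# Synthetic demo (assumed setup): white noise and a circularly shifted copy.
random.seed(0)
n, true_delay = 64, 5
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - true_delay) % n] for t in range(n)]
estimated = csp_delay(x1, x2)
```

In a real array the delay between two microphones would then be converted to a wavefront direction angle using the microphone spacing and the speed of sound; that step is omitted here because the abstract does not give the array geometry.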
A Practical Methodology for Speech Source Localization With Microphone Arrays
1996
Cited by 44 (3 self)
Electronically steerable arrays of microphones have a variety of uses in speech data acquisition systems. Applications include teleconferencing, speech recognition and speaker identification, sound capture in adverse environments, and biomedical devices for the hearing impaired. An array of microphones has a number of advantages over a single-microphone system. It may be electronically aimed to provide a high-quality signal from a desired source location while simultaneously attenuating interfering talkers and ambient noise, does not necessitate local placement of transducers or encumber the talker with a hand-held or head-mounted microphone, and does not require physical movement to alter its direction of reception. Additionally, it has capabilities that a single microphone does not; namely automatic detection, localization, and tracking of active talkers in its receptive area. This paper addresses the specific application of source localization algorithms for estimating the position ...
A Framework for Speech Source Localization Using Sensor Arrays
1995
Cited by 42 (5 self)
Electronically steerable arrays of microphones have a variety of uses in speech data acquisition systems. Applications include teleconferencing, speech recognition and speaker identification, sound capture in adverse environments, and biomedical devices for the hearing impaired. An array of microphones has a number of advantages over a single-microphone system. It may be electronically aimed to provide a high-quality signal from a desired source location while simultaneously attenuating interfering talkers and ambient noise, does not necessitate local placement of transducers or encumber the talker with a hand-held or head-mounted microphone, and does not require physical movement to alter its direction of reception. Additionally, it has capabilities that a single microphone does not; namely automatic detection, localization, and tracking of active talkers in its receptive area. A fundamental requirement of sensor array systems is the ability to locate and track a speech source. An accurate fix on the primary talker, as well as knowledge of any interfering talkers or coherent noise sources, is necessary to effectively steer the array. Source location data may also be used for purposes other than beamforming; e.g. aiming a camera in a video-conferencing system. In addition to high accuracy, the location estimator must be
On The Origins Of Speech Intelligibility In The Real World
ESCA Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-à-Mousson, 1997
Cited by 30 (9 self)
Current-generation speech recognition systems seek to identify words via analysis of their underlying phonological constituents. Although this stratagem works well for carefully enunciated speech emanating from a pristine acoustic environment, it has fared less well for recognizing speech spoken under more realistic conditions, such as (1) moderate to high levels of background noise, (2) moderately reverberant acoustic environments, and (3) spontaneous, informal conversation. Under such "real-world" conditions the acoustic properties of speech make it difficult to partition the acoustic stream into readily definable phonological units, thus rendering the process of word recognition highly vulnerable to departures from "canonical" patterns. Analysis of informal, spontaneous speech indicates that the stability of linguistic representation is more likely to reside on the syllabic and phrasal levels than on the phonological. In consequence, attempts to represent words merely as sequences of ...
A DSP Implementation of Source Location Using Microphone Arrays
Proceedings of the SPIE, 1996
Cited by 28 (5 self)
The design, implementation, and performance of a low-cost, real-time DSP system for source location are discussed. The system consists of an 8-element electret microphone array connected to a Signalogic DSP daughterboard hosted by a PC. The system determines the location of a speaker in the audience in an irregularly shaped auditorium. The auditorium presents a non-ideal acoustical environment; some of the walls are acoustically treated, but there still exists significant reverberation and a large amount of low-frequency noise from fans in the ceiling. The source location algorithm is implemented as a two-step process. The first step determines the time delay of arrival (TDOA) for select microphone pairs. A modified version of the Cross-Power Spectrum Phase Method is used to compute TDOAs and is implemented on the DSP daughterboard. The second step uses the computed TDOAs in a least mean squares gradient descent search algorithm implemented on the PC to compute a location esti...
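The second step described in this abstract, turning pairwise TDOAs into a position by a least-squares gradient descent search, can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the system's actual algorithm: the function name `locate`, the four-microphone square layout, the 2-D geometry, the 343 m/s speed of sound, the step size, and the iteration count are all hypothetical, and the TDOAs are noise-free values generated from a known source position.

```python
import math


def locate(mics, pairs, range_diffs, start, step=0.02, iters=500):
    """Gradient descent on the summed squared range-difference error (2-D).

    range_diffs[k] is the measured TDOA for pairs[k] multiplied by the speed
    of sound, i.e. dist(source, mic_i) - dist(source, mic_j) in metres.
    """
    x, y = start
    for _ in range(iters):
        gx = gy = 0.0
        for (i, j), d in zip(pairs, range_diffs):
            dxi, dyi = x - mics[i][0], y - mics[i][1]
            dxj, dyj = x - mics[j][0], y - mics[j][1]
            ri, rj = math.hypot(dxi, dyi), math.hypot(dxj, dyj)
            err = (ri - rj) - d  # predicted minus measured range difference
            gx += 2.0 * err * (dxi / ri - dxj / rj)
            gy += 2.0 * err * (dyi / ri - dyj / rj)
        x -= step * gx
        y -= step * gy
    return x, y


# Assumed demo geometry: four microphones at the corners of a 4 m square.
SPEED_OF_SOUND = 343.0  # m/s, assumed
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
source = (1.0, 2.5)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Noise-free TDOAs in seconds, as the first step would ideally deliver them.
tdoas = [(dist(source, mics[i]) - dist(source, mics[j])) / SPEED_OF_SOUND
         for i, j in pairs]
range_diffs = [t * SPEED_OF_SOUND for t in tdoas]
estimate = locate(mics, pairs, range_diffs, start=(2.0, 2.0))
```

Starting the search at the centre of the array is a convenience for the demo; with noisy TDOAs or a source outside the array, the error surface can be less well behaved and the step size would need tuning.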
Multi-Microphone Correlation-Based Processing for Robust Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996
Cited by 27 (3 self)
Table of contents (truncated):
Acknowledgments
Chapter 1. Introduction
  1.1. The Cross-Condition Problem
  1.2. Thesis Statement
  1.3. Thesis Overview
Chapter 2. Background
  2.1. Delay-and-Sum Beamforming
    2.1.1. Application of Delay-and-Sum Processing to Speech Recognition
  2.2. Traditional Adaptive Arrays
    2.2.1. Adaptive Noise Cancelling
    2.2.2. Application of Traditional Adaptive Methods to Speech Recognition
  2.3. Cross-Correlation Based Arrays
    2.3.1. Phenomena ...
A Practical Time-Delay Estimator for Localizing Speech Sources with a Microphone Array
1995
Cited by 24 (10 self)
A frequency-domain-based delay estimator is described, designed specifically for speech signals in a microphone-array environment. It is shown to be capable of obtaining precision delay estimates over a wide range of SNR conditions and is simple enough computationally to make it practical for real-time systems. A location algorithm based upon the delay estimator is then developed. With this algorithm it is possible to localize talker positions to a region only a few centimeters in diameter (not very different from the size of the source), and to track a moving source. Experimental results using data from a real 16-element array are presented to indicate the true performance of the algorithms.

