Robust automatic speech recognition with missing and unreliable acoustic data
- Speech Communication
, 2001
Multi Stream Speech Recognition
, 1996
Cited by 113 (16 self)
In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply (as they usually are in state-of-the-art systems). However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams. This paper introduces the basic framework of a statistical structure that can accommodate multiple (asynchronous) observation streams (possibly exhibiting different frame rates). This approach will then be applied to the particular case of multi-band speech recognition and will be shown to yield significantly better noise robustness. (IDIAP-RR 96-07)
A New ASR Approach Based On Independent Processing And Recombination Of Partial Frequency Bands
, 1996
Cited by 106 (14 self)
In the framework of hidden Markov models (HMM) or hybrid HMM/Artificial Neural Network (ANN) systems, we present a new approach towards automatic speech recognition (ASR). The general idea is to split the whole frequency band (represented in terms of critical bands) into a few sub-bands on which different recognizers are independently applied and then recombined at a certain speech unit level to yield global scores and a global recognition decision. The preliminary results presented in this paper show that such an approach, even using quite simple recombination strategies, can yield at least comparable performance on clean speech while providing better robustness in the case of noisy speech.
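As a rough illustration of the recombination step this abstract describes, the sketch below merges per-sub-band log-likelihoods into one global score per class using a weighted sum. This is only one simple strategy chosen for illustration; the function names, the weights, and the toy scores are hypothetical and are not taken from the paper.

```python
def recombine_subband_scores(subband_loglikes, weights=None):
    """Combine per-sub-band log-likelihoods into one global score per class.

    subband_loglikes: list of dicts, one per sub-band, mapping a class
    label (e.g. a word or phone) to that band's log-likelihood.
    weights: optional per-band weights (hypothetical; equal by default).
    """
    n_bands = len(subband_loglikes)
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    if len(weights) != n_bands:
        raise ValueError("need exactly one weight per sub-band")

    labels = subband_loglikes[0].keys()
    # A weighted sum of log-likelihoods is a weighted product of likelihoods.
    return {
        label: sum(w * band[label] for w, band in zip(weights, subband_loglikes))
        for label in labels
    }


def recognize(subband_loglikes, weights=None):
    """Return the class label with the highest recombined score."""
    scores = recombine_subband_scores(subband_loglikes, weights)
    return max(scores, key=scores.get)


# Toy example (made-up numbers): three sub-bands, the third corrupted by noise.
bands = [
    {"yes": -1.0, "no": -3.0},
    {"yes": -1.5, "no": -2.5},
    {"yes": -6.0, "no": -0.5},  # the noisy band favours the wrong class
]
equal_weight_choice = recognize(bands)
downweighted_choice = recognize(bands, weights=[0.45, 0.45, 0.10])
```

In this toy case the corrupted band dominates under equal weights, while down-weighting it lets the two clean bands decide. How such weights would be estimated in practice is left open here.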
Towards ASR on Partially Corrupted Speech
, 1996
Cited by 54 (9 self)
A new highly parallel approach to automatic recognition of speech, inspired by Fletcher's early research on the Articulation Index, and based on independent probability estimates in several sub-bands of the available speech spectrum, is presented. The approach is especially suitable for situations when part of the spectrum of speech is corrupted. In such cases, it can yield an order-of-magnitude improvement in the error rate over a conventional full-band recognizer.
Sub-Band Based Recognition Of Noisy Speech
- in Proc. of ICASSP'97, (Munich
, 1997
Cited by 49 (4 self)
A new approach to automatic speech recognition based on independent class-conditional probability estimates in several frequency sub-bands is presented. The approach is shown to be especially applicable to environments which cause partial corruption of the frequency spectrum of the signal. Some of the issues involved in the implementation of the approach are also addressed. 1. INTRODUCTION When a speech signal is partly degraded, e.g. by frequency-selective noise, some part of the speech spectrum may still carry valid information. A typical signal representation used in automatic speech recognition (ASR) consists of a series of feature vectors, each vector representing the entire short-term frequency spectrum at a given time instant. Even one or a few corrupted elements in the feature vector lead to severe degradation of the recognition performance. Earlier work by Fletcher on the articulation index [1] (review in [2]) suggests that the human auditory mechanism decodes the linguistic mes...
Understanding Speech Understanding: Towards A Unified Theory Of Speech Perception
, 1996
Cited by 36 (6 self)
Ever since Helmholtz, the perceptual basis of speech has been associated with the energy distribution across frequency. However, there is now accumulating evidence that speech understanding does not require a detailed spectral portraiture of the signal. As a consequence, a new theoretical perspective, focused on time, is beginning to emerge. This framework emphasizes the temporal evolution of coarse spectral patterns as the primary carrier of information within the speech signal, and provides an efficient and effective means of shielding linguistic information against the potentially hostile forces of the natural soundscape, such as reverberation and background acoustic interference. The auditory system may extract this relational information through computation of the low-frequency modulation spectrum in the auditory cortex, and this representation provides a principled basis for segmentation of the speech signal into syllabic units. Because of the systematic relationship between the syllable and higher-level lexicogrammatical organization, it is possible, in principle, to gain direct access to the lexicon and grammar through such an auditory analysis of speech.
Traps -- Classifiers Of Temporal Patterns
- in Proceedings of 5th International Conference on Spoken Language Processing, ICSLP 98
, 1998
Cited by 33 (8 self)
The work proposes a radically different set of features for ASR where TempoRAl Patterns of spectral energies are used in place of the conventional spectral patterns. The approach has several inherent advantages, among them robustness to stationary or slowly varying disturbances.
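To make the idea of a temporal pattern concrete, the sketch below cuts out the trajectory of one band's log energy around a given frame, which is the kind of vector a temporal-pattern classifier would take as input. The function name, the `half_width` parameter, the edge clamping, and the mean subtraction are assumptions made for this illustration and are not details confirmed by the abstract.

```python
def extract_temporal_pattern(band_log_energies, center, half_width):
    """Return the mean-removed trajectory of one band around a frame.

    band_log_energies: log energies of a single frequency band, one per frame.
    center: index of the frame the pattern is centred on.
    half_width: frames taken on each side (hypothetical parameter).
    """
    n = len(band_log_energies)
    # Clamp indices at the edges so every frame gets a full-length pattern.
    window = [
        band_log_energies[min(max(t, 0), n - 1)]
        for t in range(center - half_width, center + half_width + 1)
    ]
    mean = sum(window) / len(window)
    # Removing the window mean cancels any constant offset in the log domain,
    # one simple way to discount a stationary disturbance such as a fixed gain.
    return [e - mean for e in window]


# Toy example (made-up numbers): a fixed +5.0 offset leaves the pattern unchanged.
energies = [1.0, 2.0, 4.0, 3.0, 0.0]
shifted = [e + 5.0 for e in energies]
pattern = extract_temporal_pattern(energies, center=2, half_width=2)
pattern_shifted = extract_temporal_pattern(shifted, center=2, half_width=2)
```

Here `pattern` and `pattern_shifted` agree, which illustrates in a small way why trajectories over time can be less sensitive to a constant disturbance than a single spectral frame.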
Temporal patterns (TRAPS) in ASR of noisy speech
- in Proc. ICASSP
, 1999
Cited by 29 (3 self)
International Computer Science Institute. In this paper we study a new approach to processing temporal information for automatic speech recognition (ASR). Specifically, we study the use of rather long-time TempoRAl Patterns (TRAPs) of spectral energies in place of the conventional spectral patterns for ASR. The proposed Neural TRAPs are found to yield a significant amount of complementary information to that of the conventional spectral feature based ASR system. A combination of these two ASR systems is shown to result in improved robustness to several types of additive and convolutive environmental degradations.
Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions
- in Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387)
, 1998
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition
- EURASIP J. Appl. Signal Processing
, 2002
Cited by 21 (0 self)
When trying to overcome the significant performance drops of ASR systems in the presence of noise, one road to follow is the integration of the information present in the lip movements of the speaker. Comparisons showed that integration of audio and video data on the decision level yields the best recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this article we develop a weighting process adaptive to various background noise situations. Firstly

