Speaking In Shorthand -- A Syllable-Centric Perspective For Understanding Pronunciation Variation
, 1998
Cited by 93 (12 self)
Current-generation automatic speech recognition (ASR) systems model spoken discourse as a linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if modified, could potentially improve recognition performance. Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is systematic at the level of the syllable. Syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic stress also plays an important role in pronunciation. The governing mechanism is likely to involve the informationa...
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
Cited by 14 (3 self)
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
Dynamic Classifier Combination In Hybrid Speech Recognition Systems Using Utterance-Level Confidence Values
- PROCEEDINGS ICASSP-99
, 1999
Cited by 12 (3 self)
A recent development in the hybrid HMM/ANN speech recognition paradigm is the use of several subword classifiers, each of which provides different information about the speech signal. Although the combining methods have obtained promising results, the strategies so far proposed have been relatively simple. In most cases frame-level subword unit probabilities are combined using an unweighted product or sum rule. In this paper, we argue and empirically demonstrate that the classifier combination approach can benefit from a dynamically weighted combination rule, where the weights are derived from higher-than-frame-level confidence values.
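The combination rules named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the log-linear form of the weighted product, and the normalization of confidences into weights are all assumptions made for illustration. Each classifier is assumed to emit a posterior distribution over subword classes for one frame, and each utterance-level confidence is assumed to be a non-negative number.

```python
import math

def sum_rule(streams):
    """Unweighted sum rule: average the per-class posteriors across classifiers."""
    n = len(streams)
    return [sum(p[k] for p in streams) / n for k in range(len(streams[0]))]

def product_rule(streams, weights=None, floor=1e-12):
    """Product rule computed in the log domain, renormalized to sum to 1.

    weights=None gives the unweighted product. Passing weights gives a
    weighted (log-linear) product, which is an assumed form of the
    dynamically weighted rule, not necessarily the one used in the paper.
    """
    if weights is None:
        weights = [1.0] * len(streams)
    n_classes = len(streams[0])
    log_scores = [
        sum(w * math.log(max(p[k], floor)) for w, p in zip(weights, streams))
        for k in range(n_classes)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - peak) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]

def confidence_weights(confidences):
    """Hypothetical helper: scale utterance-level confidences to sum to 1."""
    total = sum(confidences)
    if total <= 0:
        return [1.0 / len(confidences)] * len(confidences)
    return [c / total for c in confidences]

# One frame, two classifiers, three subword classes (made-up numbers).
frame = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
weights = confidence_weights([0.9, 0.1])  # the same weights apply to every frame of the utterance
combined = product_rule(frame, weights)
```

Because the weights come from a whole utterance rather than a single frame, they would be computed once per utterance and reused for each of its frames; a classifier with low confidence then contributes little to the combined score.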
Perceptually Inspired Signal-processing Strategies for Robust Speech Recognition in Reverberant Environments
, 1998
Cited by 12 (0 self)
Natural, hands-free interaction with computers is currently one of the great unfulfilled promises of automatic speech recognition (ASR), in part because ASR systems cannot reliably recognize speech under everyday, reverberant conditions that pose no problems for most human listeners. The specific properties of the auditory representation of speech likely contribute to reliable human speech recognition under such conditions. This dissertation explores the use of perceptually inspired signal-processing strategies -- critical-band-like frequency analysis, an emphasis of slow changes in the spectral structure of the speech signal, adaptation, integration of phonetic information over syllabic durations, and use of multiple signal representations for...
Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environments for Multi-stream Automatic Speech Recognition
, 2000
Cited by 8 (0 self)
Automatic Speech Recognition (ASR) still poses a problem to researchers. In particular, most ASR systems have not been able to fully handle adverse acoustic environments. Although a large number of modifications have resulted in increased levels of performance robustness, ASR systems still fall short of human recognition ability in a large number of environments. A possible shortcoming of the typical ASR system is the reliance on a single stream of front-end acoustic features and acoustic modeling feature probabilities. A single front-end feature extraction algorithm may not be capable of maintaining robustness to arbitrary acoustic environments. Acoustic modeling will also degrade due to distributional changes caused by the acoustic environment. This thesis explores the parallel use of multiple front-end and acoustic modeling elements to improve upon this shortcoming. Each ASR acoustic modeling component is trained to estimate class posterior probabilities in a particular acoustic environment. In addition to discriminative training of the probability estimator, existing feature extraction algorithms are modified in such a way as to improve class discrimination in the training environment. More specifically, Linear Discriminant Analysis provides a mechanism for obtaining discriminant temporal basis functions that can replace components of the existing algorithms that were designed in either an empirical or intuitive manner. Probability streams are generate...
NON-STATIONARY MULTI-CHANNEL (MULTI-STREAM) PROCESSING TOWARDS ROBUST AND ADAPTIVE ASR
Cited by 8 (1 self)
In this paper, we discuss the rationale behind multi-channel processing as applied to multi-stream automatic speech recognition (ASR). In this framework, we will develop different mathematical models and discuss some interesting relationships with psycho-acoustic evidence. In the case of multi-channel processing, it is assumed that the speech signal is processed by different "experts", each expert focusing on a different characteristic of the signal, and that the different channels are combined at some (temporal) stage to yield a global recognition output. Although we believe that the discussion below is valid for numerous multi-channel problems (e.g., audio and visual streams, in the case of audio-visual ASR), the present paper will mainly discuss the possible combination strategies (with application to multi-band ASR) and their relationships with different mathematical models. Finally, we will show that the proposed approaches could provide us with a new paradigm for noise robust and adaptive ASR.
Towards robust and adaptive speech recognition models
- IDIAP Research Report No. IDIAP-PR
, 2003
Cited by 3 (1 self)
In this paper, we discuss a family of new Automatic Speech Recognition (ASR) approaches, which somewhat deviate from the usual ASR approaches but which have recently been shown to be more robust to nonstationary noise, without requiring specific adaptation or “multi-style” training. More specifically, we will motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent) “experts”, each expert focusing on a different characteristic of the signal, and that the different stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we will finally introduce a new approach, referred to as HMM2, where the HMM emission probabilities are estimated via state-specific, feature-based HMMs responsible for merging the stream information and modeling their possible correlation. Key words: robust speech recognition, hidden Markov models, subband processing, multistream processing.
Data-Driven Rasta Filters In Reverberation
, 1999
Cited by 3 (1 self)
In this work we test the performance of RASTA-style modulation filters derived under reverberant conditions. The modulation filters are constructed through linear discriminant analysis of log critical band energies in a manner described by van Vuuren and Hermansky. In previous work we had observed the properties of the resultant filters under a number of acoustic conditions that were artificially applied to the training speech. Here, we present automatic speech recognition results that compare the performance of these filters under some training and testing reverberant conditions. We also test the effectiveness and robustness of a multi-stream combination using probability streams trained under different reverberant environments. The experiments reveal some performance improvement in severe reverberation. 1. INTRODUCTION Robustness to reverberant acoustic conditions is a challenging problem in automatic speech recognition (ASR). The effects of reverberation and temporal smearing have...
Using Multiple Time Scales in the Framework of Multi-Stream Speech Recognition
, 2000
Cited by 3 (1 self)
In this paper, we present a new approach to incorporating multiple time scale information as independent streams in multi-stream processing. To illustrate the procedure, we take two different sets of multiple time scale features. In the first system, these are features extracted over variable sized windows of three and five times the original window size. In the second system, we take as separate input streams the commonly used difference features, i.e. the first and second order derivatives of the instantaneous features. In the same way, any other kinds of multiple time scale features could be employed. The approach is embedded in the recently introduced "full combination" approach to multi-stream processing, in which the phoneme probabilities from all possible combinations of streams are combined in a weighted sum. As an extension of this approach we have found that replacing the sum of probabilities by their product, in the same "all wise" context, can result in higher robustness. Capt...
Robust Automatic Speech Recognition With Unreliable Data
, 1999
Cited by 2 (0 self)
Theoretical and practical issues of some of the problems in robust automatic speech recognition (ASR) and some of the techniques that address them are presented in this report. The problem of the robustness of the ASR in real-life (as opposed to laboratory) conditions is paramount to the widespread deployment of speech enabled products. The report reviews techniques used so far for robust ASR, ranging from simple spectrum subtraction to various types of model adaptation. A possible connection of robust ASR with the computational auditory scene analysis (CASA), methods for local Signal-to-Noise Ratio (SNR) estimation and classification/scoring with on-line adapted statistical models is discussed. The main focus is on the techniques that would allow for incorporation of CASA and local SNR estimates (used as methods for speech/non-speech separation) into the present prevailing stochastic pattern matching paradigms: Hidden Markov models (HMM) and artificial neural networks (ANN). Th...

