Linguistic Dissection of Switchboard-Corpus Automatic Speech Recognition Systems
- In ISCA-ITRW workshop, ASR-2000, Paris
Cited by 28 (9 self)
A diagnostic evaluation of eight Switchboard-corpus recognition systems was conducted in order to ascertain whether word-error patterns are attributable to a specific set of linguistic factors. Each recognition system’s output was converted to a common format and scored relative to a reference transcript derived from phonetically hand-labeled data. This reference material was analyzed with respect to ca. forty acoustic, linguistic and speaker characteristics, which, in turn, were correlated with recognition-error patterns via decision trees and other forms of statistical analysis. The most consistent factors associated with superior recognition performance pertain to accurate classification of phonetic segments and articulatory-acoustic features. Other factors correlated with word recognition are syllable structure, prosodic stress and speaking rate (in terms of syllables per second).
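The abstract above does not say how the systems were scored against the reference transcript. As a minimal sketch, assuming the common word error rate convention (word-level edit distance divided by the number of reference words), scoring could look like the following. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' tooling.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Assumption: transcripts are plain strings tokenized on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.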
Discriminative speaker adaptation with conditional maximum likelihood linear regression
- In Eurospeech, 2001
Cited by 18 (2 self)
We present a simplified derivation of the extended Baum-Welch procedure, which shows that it can be used for Maximum Mutual Information (MMI) estimation of a large class of continuous emission density hidden Markov models (HMMs). We use the extended Baum-Welch procedure for discriminative estimation of MLLR-type speaker adaptation transformations. The resulting adaptation procedure, termed Conditional Maximum Likelihood Linear Regression (CMLLR), is used successfully for supervised and unsupervised adaptation tasks on the Switchboard corpus, yielding an improvement over MLLR. The interaction of unsupervised CMLLR with segmental minimum Bayes risk lattice voting procedures is also explored, showing that the two procedures are complementary.
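For orientation, an MLLR-type transformation is usually an affine map applied to the Gaussian mean vectors of the HMM, so that each mean m becomes A·m + b. The sketch below only applies a given transform; it does not estimate A and b (neither the maximum-likelihood nor the discriminative estimation described in the abstract). The function names and the list-of-lists matrix layout are assumptions made for illustration.

```python
def apply_affine_transform(mean, matrix, bias):
    """Return matrix @ mean + bias for one Gaussian mean vector."""
    if len(matrix) != len(bias) or any(len(row) != len(mean) for row in matrix):
        raise ValueError("matrix, bias and mean have incompatible shapes")
    return [
        sum(a * m for a, m in zip(row, mean)) + b
        for row, b in zip(matrix, bias)
    ]


def adapt_means(means, matrix, bias):
    """Apply the same speaker transform to every mean in a regression class."""
    return [apply_affine_transform(mean, matrix, bias) for mean in means]
```

Sharing one transform across many Gaussians is what lets this kind of adaptation work with small amounts of speaker data, since only A and b need to be estimated.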
Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
- 2002
Cited by 18 (2 self)
Linear transforms have been used extensively for training and adaptation of HMM-based ASR systems. Recently, procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that employ linear transforms for feature normalization and for speaker adaptive training. We integrate these discriminative linear transforms into MMI estimation of HMM parameters for improvement of large vocabulary conversational speech recognition systems.
Connectionist Language Modeling For Large Vocabulary Continuous Speech Recognition
- In International Conference on Acoustics, Speech and Signal Processing, 2002
Cited by 16 (1 self)
This paper describes ongoing work on a new approach for language modeling for large vocabulary continuous speech recognition. Almost all state-of-the-art systems use statistical n-gram language models estimated on text corpora. One principal problem with such language models is the fact that many of the n-grams are never observed even in very large training corpora, and therefore it is common to back off to a lower-order model. In this paper we propose to address this problem by carrying out the estimation task in a continuous space, enabling a smooth interpolation of the probabilities. A neural network is used to learn the projection of the words onto a continuous space and to estimate the n-gram probabilities. The connectionist language model is being evaluated on the DARPA HUB5 conversational telephone speech recognition task and preliminary results show consistent improvements in both perplexity and word error rate.
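The idea of projecting words onto a continuous space and estimating n-gram probabilities with a neural network can be sketched as a forward pass: look up an embedding for each context word, concatenate them, pass the result through a hidden layer, and normalize the output scores with a softmax. The class below is a toy illustration with random, untrained weights. The layer sizes, the tanh hidden layer, and all names are assumptions, not the architecture reported in the paper.

```python
import math
import random


class TinyNeuralLM:
    """Toy feed-forward n-gram model with untrained, randomly initialized weights."""

    def __init__(self, vocab, context_size=2, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {word: i for i, word in enumerate(self.vocab)}
        self.context_size = context_size

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One continuous vector per vocabulary word (the "projection").
        self.embeddings = matrix(len(self.vocab), embed_dim)
        self.w_hidden = matrix(hidden_dim, context_size * embed_dim)
        self.w_output = matrix(len(self.vocab), hidden_dim)

    def next_word_probs(self, context):
        """Return P(word | context) for every vocabulary word."""
        if len(context) != self.context_size:
            raise ValueError("context must contain exactly context_size words")
        # Concatenate the embeddings of the context words.
        x = []
        for word in context:
            x.extend(self.embeddings[self.index[word]])
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w_hidden]
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_output]
        # Softmax, shifted by the maximum score for numerical stability.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return {word: e / total for word, e in zip(self.vocab, exps)}
```

Because every word receives a nonzero probability from the softmax, an unseen n-gram does not need an explicit back-off step; in a trained model, words with similar embeddings would yield similar predictions.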
Hidden Model Sequence Models for Automatic Speech Recognition
- 2001
Cited by 4 (0 self)
Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the pronunciation model operates on a phoneme level and is derived independently of the underlying models. In contrast, this work is aimed at improving pronunciation modelling on a sub-phone level in a combined framework. The modelling of pronunciation variation is assumed to be of special importance for recognition of spontaneous speech.

