Results 1 - 9 of 9
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
, 2001
Cited by 32 (6 self)
The aim of this work is to build up a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion, together with two optimization methods: gradient descent (GD) and the extended Baum (EB) algorithm. A tree search-based restricted recognition method using word graphs is presented to reduce the computational complexity of large-vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition on the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings on both the TI digit string corpus of American English digits and the SieTill corpus of German digits recorded over telephone lines. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large-vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. Best results were obtained using a unigram language model for MMI training. No significant co...
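As an illustration of the two criteria compared above, the following sketch evaluates them for a single utterance from precomputed log scores log p(X|W) + log P(W). It uses common textbook forms (MMI as the log posterior of the spoken sequence; MCE as a sigmoid-smoothed misclassification measure). The function names and the smoothing parameters `eta` and `rho` are assumptions; this is not the paper's unified formulation or its word-graph machinery.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mmi_criterion(correct_score, competing_scores):
    """Log posterior of the spoken word sequence (to be maximized).

    correct_score: log p(X|W_r) + log P(W_r) for the spoken sequence.
    competing_scores: the same quantity for every hypothesis in the
        competing set; for MMI this set includes the spoken sequence.
    """
    return correct_score - log_sum_exp(competing_scores)

def mce_loss(correct_score, rival_scores, eta=1.0, rho=1.0):
    """Sigmoid-smoothed misclassification measure (to be minimized).

    rival_scores: scores of incorrect hypotheses only.
    eta: assumed exponent weighting the rivals (large eta ~ best rival only).
    rho: assumed slope of the sigmoid smoothing.
    """
    # Soft maximum over rivals, normalized by the number of rivals.
    rival_term = (log_sum_exp([eta * s for s in rival_scores])
                  - math.log(len(rival_scores))) / eta
    # d > 0 means the rivals outscore the spoken sequence.
    d = rival_term - correct_score
    return 1.0 / (1.0 + math.exp(-rho * d))
```

With a spoken-sequence score of -10 and rival scores of -12 and -11, the MMI value is -log(1 + e^-2 + e^-1), about -0.41, and the MCE loss is below 0.5 because the correct sequence wins.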
Learning of variability for invariant statistical pattern recognition
- In ECML 2001, 12th European Conference on Machine Learning
, 2001
Acoustic Front-End Optimization for Large Vocabulary Speech Recognition
- Proc. EUROSPEECH
Cited by 10 (7 self)
In this paper we describe experiments with the acoustic front-end of our large vocabulary speech recognition system. In particular, two aspects are studied: 1) linear transforms for feature extraction and 2) the modelling of the emission probabilities. Experiments are reported on a 5000-word task of the ARPA Wall Street Journal database.
Comparison Of Optimization Methods For Discriminative Training Criteria
- In Proc. EUROSPEECH’97
, 1997
Cited by 8 (4 self)
In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities, we found special expressions for the step sizes in GPD, leading to reestimation formulae very similar to those derived for the EBW algorithm. Results were produced on both the TI digit string and the SieTill corpora for continuously spoken American English and German digit strings. The results for the two techniques do not show significant differences. These experimental results support the strong link between EBW and GPD expected from the analytic comparison.
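The link between EBW and gradient-based updates can be seen for the mean of a single one-dimensional Gaussian. In this minimal sketch (hypothetical function names; the accumulated numerator and denominator statistics are assumed to be given), a plain gradient-ascent step on the MMI criterion with step size var / (num_occ - den_occ + D) reproduces the EBW mean update exactly. This step size is chosen for illustration and is not claimed to be the paper's GPD expression.

```python
def ebw_mean_update(num_occ, num_sum, den_occ, den_sum, mu, D):
    """Extended Baum-Welch mean reestimation for one 1-D Gaussian.

    num_occ, num_sum: occupancy sum_t gamma_t and first-order statistic
        sum_t gamma_t * x_t accumulated on the correct (numerator) model.
    den_occ, den_sum: the same statistics on the competing (denominator) model.
    D: EBW smoothing constant, assumed large enough that the
        denominator num_occ - den_occ + D is positive.
    """
    return (num_sum - den_sum + D * mu) / (num_occ - den_occ + D)

def gd_mean_update(num_occ, num_sum, den_occ, den_sum, mu, var, step):
    """One gradient-ascent step on the MMI criterion w.r.t. the mean.

    The gradient of sum_t gamma_t * log N(x_t; mu, var) w.r.t. mu is
    sum_t gamma_t * (x_t - mu) / var, taken as numerator minus denominator.
    """
    grad = (num_sum - den_sum - (num_occ - den_occ) * mu) / var
    return mu + step * grad
```

For example, with num_occ = 10, num_sum = 25, den_occ = 8, den_sum = 18, mu = 2, var = 1.5 and D = 5, both functions return 17/7 (about 2.4286) when `step = 1.5 / 7`. Substituting that step size into the gradient update and simplifying gives the EBW formula term by term.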
Statistical Modelling in Continuous Speech Recognition (CSR)
- In Conference on Uncertainty in Artificial Intelligence
, 2001
Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
Independent component analysis applied to feature extraction for robust automatic speech recognition
- Electronics Letters
, 2000
Cited by 3 (1 self)
In this article we explore Independent Component Analysis (ICA) as a statistical technique for deriving suitable data-driven representational bases for the projection of spectrum and cepstrum in the context of Automatic Speech Recognition (ASR). Based on the close link between the independent mechanisms of speech variability and the concept of statistical independence, we derive a new feature transformation that yields a consistent improvement in recognition performance. Introduction: The feature extraction stage of current ASR systems converts the input speech waveform into a series of low-dimensional vectors, each summarizing a short segment of the acoustic speech input, in order to minimize the computational demands of the Hidden Markov Model (HMM) classifier. The resulting feature vector produced by subsequent transformations is fed into a final decorrelation stage that permits the
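For context on the "final decorrelation stage": in a standard MFCC front end this stage is a fixed DCT of the log filter-bank energies, that is, a projection onto a predetermined basis. The sketch below shows that baseline projection (function names are hypothetical). The ICA approach described above would instead estimate the basis vectors from data; that estimation is not implemented here.

```python
import math

def dct_basis(n_filters, n_ceps):
    """Rows are unnormalized DCT-II basis vectors over n_filters channels."""
    return [[math.cos(math.pi * k * (j + 0.5) / n_filters)
             for j in range(n_filters)]
            for k in range(n_ceps)]

def project(frame, basis):
    """Project one feature frame onto each basis vector (matrix-vector product)."""
    return [sum(b * x for b, x in zip(row, frame)) for row in basis]

def cepstrum(filterbank_energies, n_ceps):
    """Log compression followed by the fixed DCT decorrelation stage."""
    # Floor the energies to avoid log(0) on silent channels.
    log_e = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    return project(log_e, dct_basis(len(log_e), n_ceps))
```

A data-driven variant would keep `project` unchanged and replace `dct_basis` with a basis learned from training frames.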
Acoustic Modeling in the Philips Hub-4 Continuous-Speech Recognition System
, 1998
Cited by 2 (0 self)
In this paper we describe some characteristics of the acoustic modeling used in the Philips continuous-speech recognition system for the DARPA Hub-4 1997 evaluation that are related to robustness issues. We aimed at a conceptually simple system: we trained two model sets on 70 hours of the Hub-4 training data, one for within-word and one for cross-word decoding. These model sets were used for both genders and all environmental conditions. To make this possible, channel normalization (mean and variance normalization) and speaker normalization (vocal tract length normalization, realized by an appropriate shift of the center frequencies of the mel filter bank) were applied, as well as adaptation techniques. MLLR-based unsupervised batch adaptation on clusters of segments was conducted after both a first within-word decoding pass and a cross-word decoding pass. The training strategy and the effects of the various normalization and adaptation techniques will be discussed in the paper. ...
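The two normalizations named above can be sketched as follows: per-dimension mean and variance normalization over a segment, and mel filter center frequencies shifted by a VTLN warping factor. The plain linear scaling `alpha * f` clipped at `f_max` is an assumption for illustration (practical systems often use piecewise-linear warping), and all names are hypothetical rather than taken from the Philips system.

```python
import math

def mean_var_normalize(frames):
    """Per-dimension mean and variance normalization over one segment."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
            for d in range(dim)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]

def mel(f_hz):
    """Hz to mel (common 2595 * log10(1 + f / 700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def inv_mel(m):
    """Mel to Hz, inverse of mel()."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warped_centers(n_filters, f_max, alpha):
    """Mel filter center frequencies in Hz, scaled by a VTLN warping factor.

    Centers are equally spaced on the mel scale between 0 and f_max.
    Linear scaling by alpha, clipped at f_max, is an illustrative assumption.
    """
    step = mel(f_max) / (n_filters + 1)
    centers = [inv_mel(step * (i + 1)) for i in range(n_filters)]
    return [min(alpha * c, f_max) for c in centers]
```

A speaker-specific `alpha` slightly above or below 1.0 stretches or compresses the filter bank; `alpha = 1.0` leaves the unwarped centers unchanged.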
Adaption in Statistical Pattern Recognition Using . . .
- IEEE Trans. on Pattern Anal. and Machine Intell.
, 2004
We integrate the tangent method into a statistical framework for classification, both analytically and practically. The resulting consistent framework for adaptation allows us to efficiently estimate the tangent vectors representing the variability. The framework improves classification results on two real-world pattern recognition tasks from the domains of handwritten character recognition and automatic speech recognition.
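The tangent method builds on tangent distance: a prototype is allowed to move along its tangent vectors before the distance is measured. The following minimal sketch computes the squared one-sided tangent distance min_a ||x - (mu + T a)||^2 by orthonormalizing the tangents and removing the residual's components along them. It does not show the paper's statistical estimation of the tangent vectors; those are assumed to be given, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt; drops vectors that are (nearly) linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

def tangent_distance(x, mu, tangents):
    """Squared one-sided tangent distance min_a ||x - (mu + T a)||^2.

    The residual x - mu is projected onto the orthogonal complement of the
    span of the tangent vectors, so variation along them is not penalized.
    """
    d = [xi - mi for xi, mi in zip(x, mu)]
    dist = dot(d, d)
    for b in orthonormalize(tangents):
        dist -= dot(d, b) ** 2
    return dist
```

For x = (1, 2, 0) and prototype mu = (0, 0, 0), the squared Euclidean distance is 5; with the single tangent vector (1, 0, 0) the tangent distance drops to 4, because the offset along that direction is treated as admissible variability.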
Extending Features for Automatic Speech Recognition by Means of Auditory Modelling
When investigating the benefit of auditory modelling for automatic speech recognition applications, typically different features or auditory simulation models are compared. In this work, we pursue the combination of several auditory-model-based feature extraction schemes, as well as their further combination with standard MFCC features. For this purpose, a regularization of the common heteroscedastic discriminant analysis (HDA) is introduced to summarize relevant information in feature spaces of lower dimension with uncorrelated individual features. Besides standard auditory-model-based features, new features are also included that rely on delay computing networks to extract relevant information from the shape of the cochlear travelling-wave delay trajectory. In an empirical study, statistically significant improvements are shown by combining standard MFCCs with the different features extracted from the auditory simulation model. The effect of different degrees of regularization is investigated for this task.
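The abstract does not spell out the regularization used. As a hedged illustration of the general idea, the sketch below concatenates synchronous feature streams frame by frame and shrinks an estimated covariance toward its diagonal with a weight `lam`, which is one common way of regularizing covariance estimates. The shrinkage form and all names are assumptions; the HDA projection itself is not implemented.

```python
def concat_streams(*streams):
    """Frame-wise concatenation of synchronous feature streams."""
    return [sum((list(frame) for frame in frame_group), [])
            for frame_group in zip(*streams)]

def covariance(frames):
    """Maximum-likelihood (divide by n) covariance of a list of frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
             for j in range(dim)] for i in range(dim)]

def shrink_to_diagonal(cov, lam):
    """Return (1 - lam) * cov + lam * diag(cov), with lam in [0, 1].

    Diagonal entries are unchanged; off-diagonal entries are scaled by
    (1 - lam), so lam = 1 yields a purely diagonal covariance.
    """
    size = len(cov)
    return [[cov[i][j] if i == j else (1.0 - lam) * cov[i][j]
             for j in range(size)] for i in range(size)]
```

Varying `lam` between 0 and 1 moves from the unregularized estimate to a diagonal one, loosely analogous to the "different degrees of regularization" studied above.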

