## Discriminative Training For Continuous Speech Recognition (1995)

Venue: Proc. 1995 Europ. Conf. on Speech Communication and Technology

Citations: 14 (0 self)

### BibTeX

@INPROCEEDINGS{Ruske95discriminativetraining,

author = {W. Reichl and G. Ruske},

title = {Discriminative Training For Continuous Speech Recognition},

booktitle = {Proc. 1995 Europ. Conf. on Speech Communication and Technology},

year = {1995},

pages = {537--540}

}

### Abstract

Discriminative training techniques for Hidden-Markov Models were recently proposed and successfully applied to automatic speech recognition. In this paper a discussion of the Minimum Classification Error and the Maximum Mutual Information objectives is presented. An extended reestimation formula is used for the HMM parameter update under both objective functions. The discriminative training methods were applied in speaker-independent phoneme recognition experiments and improved the phoneme recognition rates for both techniques.

1. INTRODUCTION. Recently, discriminative training techniques for Hidden-Markov Models (HMM) have been used successfully for automatic speech recognition. They provide better performance than Maximum Likelihood Estimation (MLE), since training concentrates on the estimation of class boundaries rather than on the parameters of assumed model distributions [1,12]. Although MLE and discriminative training are theoretically equivalent (if su...

### Citations

203 | An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition - Levinson, Rabiner, et al. - 1983
Citation Context: ...ues Usually HMM learning is based on the Maximum Likelihood principle, optimizing the likelihood of the observation by a very efficient parameter reestimation technique, the Baum-Welch (BW) algorithm [3,7]. The optimization of HMM parameters according to discriminative criteria may be carried out with standard optimization techniques, such as steepest descent or conjugate gradients [1,3,5,8,9,13,14]. ...

203 | Discriminative Learning for Minimum Error Classification - Juang, Katagiri - 1992
Citation Context: ...inty about the message, given the observed signal. Another discriminative objective function is the Minimum Classification Error (MCE), which approximates the misclassification rate of the classifier [3,4,8,13,14]. The optimization of this error function is generally carried out by the Generalized Probabilistic Descent (GPD) algorithm, a gradient-descent-based optimization, and results in a classifier with min...
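
The MCE/GPD procedure quoted above (a sigmoid-smoothed misclassification measure minimized by gradient descent) can be sketched in a few lines; the function names, the smoothing constants `eta` and `gamma`, and the learning rate are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mce_loss(g_correct, g_competitors, eta=1.0, gamma=1.0):
    """Sigmoid-smoothed misclassification measure (standard MCE form):
    d contrasts the correct-class discriminant with a soft-max over the
    competitor discriminants; the loss l = sigmoid(gamma * d) is a
    differentiable proxy for the 0/1 classification error."""
    competing = np.log(np.mean(np.exp(eta * np.asarray(g_competitors)))) / eta
    d = -g_correct + competing
    return 1.0 / (1.0 + np.exp(-gamma * d))

def gpd_step(params, grads, epsilon=0.01):
    """One Generalized Probabilistic Descent step: plain gradient descent
    on the smoothed loss, applied to a dict of parameters."""
    return {k: params[k] - epsilon * grads[k] for k in params}
```

The loss approaches 0 when the correct-class score dominates its competitors and 1 when a competitor dominates, which is why minimizing it approximates minimizing the classification error rate.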

105 | An inequality for rational functions with applications to some statistical problems - Gopalakrishnan, Kanevsky, et al. - 1991

44 | Hidden Markov Models, Maximum Mutual Information, and the Speech Recognition Problem - Normandin - 1991
Citation Context: ...ques provide better performance if these requirements are not met [1,12]. A popular alternative to MLE is the Maximum Mutual Information (MMI) between the acoustic observation and the decoded symbols [1,5,9,11,12]. This criterion attempts to minimize the uncertainty about the message, given the observed signal. Another discriminative objective function is the Minimum Classification Error (MCE), which approxima...
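
The MMI criterion described here maximizes the log posterior of the correct transcription, reducing the uncertainty about the message given the observation. A minimal per-utterance sketch (the log-domain formulation and all names are my assumptions, not the paper's code):

```python
import numpy as np

def mmi_objective(log_lik, priors, correct):
    """Per-utterance MMI criterion: log p(X|c)P(c) - log sum_c' p(X|c')P(c'),
    i.e. the log posterior of the correct class index `correct`, computed
    in the log domain for numerical stability."""
    joint = np.asarray(log_lik, dtype=float) + np.log(np.asarray(priors, dtype=float))
    return joint[correct] - np.logaddexp.reduce(joint)
```

Maximizing this objective drives the posterior of the correct transcription toward 1, unlike MLE, which only raises the numerator term.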

43 | MMI training for continuous phoneme recognition on the TIMIT database - Kapadia, Valtchev, et al. - 1993
Citation Context: ...nted, which was extended in [11,12] to continuous observation densities. In speech recognition experiments this extended BW algorithm showed improved convergence compared to gradient descent training [9,11,12]. Recently the theoretical conditions for objective functions, optimized by (12), were relaxed to general analytic functions [10] (e.g. L(c;X)). The reestimation formula for parameters λ_i is very simila...

43 | High-Performance Connected Digit Recognition Using Maximum Mutual Information Estimation - Normandin
Citation Context: ...equired to meet the Lagrange conditions for the constraints [3,7,9,14]. In [6] an improved BW algorithm for the training of rational functions R(X) (e.g. I(c;X)) was presented, which was extended in [11,12] to continuous observation densities. In speech recognition experiments this extended BW algorithm showed improved convergence compared to gradient descent training [9,11,12]. Recently the theoretical...

36 | Segmental GPD training of HMM based speech recognizer - Chou, Juang, et al. - 1992
Citation Context: ...inty about the message, given the observed signal. Another discriminative objective function is the Minimum Classification Error (MCE), which approximates the misclassification rate of the classifier [3,4,8,13,14]. The optimization of this error function is generally carried out by the Generalized Probabilistic Descent (GPD) algorithm, a gradient-descent-based optimization, and results in a classifier with min...

36 | Minimum error rate training based on N-best string models - Chou, Lee, et al. - 1993
Citation Context: ...inty about the message, given the observed signal. Another discriminative objective function is the Minimum Classification Error (MCE), which approximates the misclassification rate of the classifier [3,4,8,13,14]. The optimization of this error function is generally carried out by the Generalized Probabilistic Descent (GPD) algorithm, a gradient-descent-based optimization, and results in a classifier with min...

26 | An inequality with applications to statistical prediction for functions of Markov processes and to a model of ecology - Baum, Eagon - 1967
Citation Context: ...ons for objective functions, optimized by (12), were relaxed to general analytic functions [10] (e.g. L(c;X)). The reestimation formula for parameters λ_i is very similar to the BW growth-transformation [2]:

$$\bar{\lambda}_i = T_D(\lambda_i) = \frac{\lambda_i \left( \frac{\partial R(X)}{\partial \lambda_i} + D \right)}{\sum_i \lambda_i \left( \frac{\partial R(X)}{\partial \lambda_i} + D \right)} \qquad (12)$$

Update formulas for special parameters, such as Gaussian means, are printed in [11,12]. The growth-transformation...
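
The growth transformation of eq. (12) can be sketched directly for a probability vector; the function name and the choice of the constant D are illustrative assumptions (D must be large enough that every numerator stays positive):

```python
import numpy as np

def growth_transform(lam, grad_R, D):
    """Extended Baum-Welch growth transformation in the style of eq. (12):
    lam_bar_i = lam_i * (dR/dlam_i + D) / sum_j lam_j * (dR/dlam_j + D).
    For sufficiently large D the update keeps lam a valid probability
    vector while increasing the objective R."""
    lam = np.asarray(lam, dtype=float)
    grad_R = np.asarray(grad_R, dtype=float)
    num = lam * (grad_R + D)
    return num / num.sum()
```

Components whose gradient is above average gain probability mass; as D grows the step shrinks toward the identity, trading convergence speed for the guarantee of growth.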

24 | Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm - Chow - 1990
Citation Context: ...ques provide better performance if these requirements are not met [1,12]. A popular alternative to MLE is the Maximum Mutual Information (MMI) between the acoustic observation and the decoded symbols [1,5,9,11,12]. This criterion attempts to minimize the uncertainty about the message, given the observed signal. Another discriminative objective function is the Minimum Classification Error (MCE), which approxima...

8 | A generalization of the Baum algorithm to functions on non-linear manifolds - Kanevsky - 1995
Citation Context: ...mpared by a uniform formalism. The optimization of the objective functions is carried out by a gradient descent method for the MCE [1,3,4,5,8,9,13,14] or an extended Baum-Welch (BW) algorithm for MMI [9,10,11,12]. In the paper an extended BW algorithm for the MCE criterion is presented, which is This work was funded by the German Federal Ministry for Research and Technology (BMFT) in the framework of the Verb...

5 | A New Model-Discriminant Training Algorithm For Hybrid NN-HMM Systems - Reichl, Ruske - 1994

5 | A hybrid RBF-HMM system for continuous speech recognition - Reichl, Ruske - 1995

3 | Maximum Mutual Information Estimation - Bahl, deSouza - 1986
Citation Context: ...ovide better performance compared to Maximum Likelihood Estimation (MLE), since the training is concentrated on the estimation of class boundaries and not on parameters of assumed model distributions [1,12]. Although MLE and discriminative training are theoretically equivalent (if sufficient classifier parameters and enough training data exist and if Gaussian mixture assumptions are appropriate) discrim...