## Comparison Of Optimization Methods For Discriminative Training Criteria (1997)

Venue: In Proc. EUROSPEECH’97

Citations: 10 (4 self)

### BibTeX

@INPROCEEDINGS{Schlüter97comparisonof,

author = {R. Schlüter and W. Macherey and S. Kanthak and H. Ney and L. Welling},

title = {Comparison Of Optimization Methods For Discriminative Training Criteria},

booktitle = {IN PROC. EUROSPEECH’97},

year = {1997},

pages = {15--18},

publisher = {}

}

### Abstract

In this work we compare two parameter optimization techniques for discriminative training using the MMI criterion: the extended Baum-Welch (EBW) algorithm and the generalized probabilistic descent (GPD) method. Using Gaussian emission densities, we found special expressions for the step sizes in GPD, leading to reestimation formulae very similar to those derived for the EBW algorithm. Results were produced for both the TI digitstring and the SieTill corpus for continuously spoken American English and German digitstrings. The results for both techniques do not show significant differences. These experimental results support the strong link between EBW and GPD as expected from the analytic comparison.
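To illustrate the link the abstract describes, here is a minimal sketch of the commonly cited EBW update for a single one-dimensional Gaussian density, next to a gradient step on the mean whose step size is chosen so that the two updates coincide. The function names (`discriminative_sums`, `ebw_update`, `gradient_step_mean`), the scalar setting, the toy data, and the step size `var / (Gamma_s(1) + D)` are assumptions made for illustration. The paper's own GPD step-size expressions are not reproduced on this page.

```python
import numpy as np


def discriminative_sums(x, gamma_num, gamma_den):
    """Return Gamma_s(1), Gamma_s(x), Gamma_s(x^2) for one state s.

    gamma_num: occupation probabilities given the spoken word sequence.
    gamma_den: occupation probabilities of the general (competing) model.
    """
    w = gamma_num - gamma_den
    return w.sum(), (w * x).sum(), (w * x ** 2).sum()


def ebw_update(x, gamma_num, gamma_den, mu, var, D):
    """One EBW re-estimation step for a 1-D Gaussian with constant D."""
    g1, gx, gx2 = discriminative_sums(x, gamma_num, gamma_den)
    denom = g1 + D
    new_mu = (gx + D * mu) / denom
    new_var = (gx2 + D * (var + mu ** 2)) / denom - new_mu ** 2
    return new_mu, new_var


def gradient_step_mean(x, gamma_num, gamma_den, mu, var, D):
    """Gradient ascent step on the mean with an assumed step size.

    The gradient of the MMI criterion w.r.t. the mean is
    (Gamma_s(x) - Gamma_s(1) * mu) / var; the step size
    var / (Gamma_s(1) + D) is an illustrative choice.
    """
    g1, gx, _ = discriminative_sums(x, gamma_num, gamma_den)
    grad = (gx - g1 * mu) / var
    step = var / (g1 + D)
    return mu + step * grad


# Toy data (hypothetical values, not taken from the paper).
x = np.array([0.5, 1.0, 2.0, 3.5])
gamma_num = np.array([0.9, 0.8, 0.7, 0.6])
gamma_den = np.array([0.4, 0.5, 0.6, 0.2])

mu_ebw, var_ebw = ebw_update(x, gamma_num, gamma_den, mu=1.0, var=2.0, D=5.0)
mu_gd = gradient_step_mean(x, gamma_num, gamma_den, mu=1.0, var=2.0, D=5.0)
print(mu_ebw, var_ebw, mu_gd)
```

With this step size the gradient step on the mean reproduces the EBW mean update exactly, and a larger `D` corresponds to a smaller step. With `gamma_den` set to zero and `D = 0`, the update reduces to the ordinary weighted maximum likelihood estimate.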

### Citations

225 |
An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology
- Baum, Eagon
- 1967
Citation Context ...$\hat{\mu}_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x) + D_s \mu_s}{\Gamma_s(1) + D_s}$, $\hat{\sigma}^2_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x^2) + D_s(\sigma_s^2 + \mu_s^2)}{\Gamma_s(1) + D_s} - \hat{\mu}_s^2$. Although there do exist proofs of convergence for both GPD [4] and EBW [3, 7], the step sizes needed to guarantee convergence are impractical by leading to very slow convergence [12]. In practice, faster convergence is achieved in the EBW case, if the iteration constants $D_s$ ar...

183 |
A Database for Speaker-Independent Digit Recognition
- Leonard
- 1984
Citation Context ...or calculation of the FB probabilities is performed using the Viterbi approximation [10]. 3. RESULTS Experiments were done for the recognition of continuous digitstrings using both the TI digitstring [9] corpus for American English digits and the SieTill [6] corpus for telephone line recorded German digits. In Table 1 some information on corpus statistics is summarized. Table 1. Corpus statistics for...

100 | An inequality for rational functions with applications to some statistical estimation problems
- Gopalakrishnan, Kanevsky, et al.
- 1991
Citation Context ...$\hat{\mu}_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x) + D_s \mu_s}{\Gamma_s(1) + D_s}$, $\hat{\sigma}^2_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x^2) + D_s(\sigma_s^2 + \mu_s^2)}{\Gamma_s(1) + D_s} - \hat{\mu}_s^2$. Although there do exist proofs of convergence for both GPD [4] and EBW [3, 7], the step sizes needed to guarantee convergence are impractical by leading to very slow convergence [12]. In practice, faster convergence is achieved in the EBW case, if the iteration constants $D_s$ ar...

41 |
MMI training for continuous phoneme recognition on the TIMIT database
- Kapadia, Valtchev, et al.
Citation Context ...ng link between EBW and GPD as expected from the analytic comparison. 1. INTRODUCTION In an increasing number of applications discriminative training criteria such as Maximum Mutual Information (MMI) [8, 12, 15] and Minimum Classification Error (MCE) [2, 5, 15] have been used. In MCE training, an approximation for the error rate on the training data is optimized, whereas MMI optimizes the a posteriori probab...

36 |
Segmental GPD training of HMM based speech recognizer
- Chou, Juang, et al.
- 1992
Citation Context ...l EBW: $\hat{\mu}_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x) + D_s \mu_s}{\Gamma_s(1) + D_s}$, $\hat{\sigma}^2_{s,(\mathrm{EBW})} = \frac{\Gamma_s(x^2) + D_s(\sigma_s^2 + \mu_s^2)}{\Gamma_s(1) + D_s} - \hat{\mu}_s^2$. Although there do exist proofs of convergence for both GPD [4] and EBW [3, 7], the step sizes needed to guarantee convergence are impractical by leading to very slow convergence [12]. In practice, faster convergence is achieved in the EBW case, if the iteration ...

35 |
Minimum error rate training based on the N-best string models
- Chou, Lee, et al.
- 1993
Citation Context ...alytic comparison. 1. INTRODUCTION In an increasing number of applications discriminative training criteria such as Maximum Mutual Information (MMI) [8, 12, 15] and Minimum Classification Error (MCE) [2, 5, 15] have been used. In MCE training, an approximation for the error rate on the training data is optimized, whereas MMI optimizes the a posteriori probability of the training utterances and hence the cla...

28 |
Maximum Mutual Information Estimation of Hidden Markov Models, in: Automatic Speech and ...
- Normandin
- 1996
Citation Context ...ng link between EBW and GPD as expected from the analytic comparison. 1. INTRODUCTION In an increasing number of applications discriminative training criteria such as Maximum Mutual Information (MMI) [8, 12, 15] and Minimum Classification Error (MCE) [2, 5, 15] have been used. In MCE training, an approximation for the error rate on the training data is optimized, whereas MMI optimizes the a posteriori probab...

20 |
Acoustic Modeling of Phoneme Units for Continuous Speech Recognition
- Ney
- 1990
Citation Context ...tribution to the optimization process. This method is called corrective training [12]. In addition, time alignment for calculation of the FB probabilities is performed using the Viterbi approximation [10]. 3. RESULTS Experiments were done for the recognition of continuous digitstrings using both the TI digitstring [9] corpus for American English digits and the SieTill [6] corpus for telephone line rec...

20 | MMIE training for large vocabulary continuous speech recognition - Normandin, Lacouture, et al. - 1994 |

13 |
Hidden Markov Models, Maximum Mutual Information, and the Speech Recognition Problem
- Normandin
- 1991
Citation Context ...eighted by its posterior probability. 2.1.2. Extended Baum-Welch Algorithm Discriminative training with the MMI criterion usually applies an extended version of Baum-Welch training, the EBW algorithm [11, 12, 13]. There the MMI criterion is maximized via the following auxiliary function: $S(\,\cdot\,;\,\cdot\,) = \sum_s \sum_{r=1}^{R} \sum_{t=1}^{T_r} \left[\gamma_{r,t}(s; W_r) - \gamma^{\mathrm{gen}}_{r,t}(s)\right] \log p(x_{r,t} \mid s) + \sum_s D_s \int dx\, p(x \mid s) \log p(x \mid s)$; w...

13 | Discriminative Training for Continuous Speech Recognition
- Reichl, Ruske
- 1995
Citation Context ...ng link between EBW and GPD as expected from the analytic comparison. 1. INTRODUCTION In an increasing number of applications discriminative training criteria such as Maximum Mutual Information (MMI) [8, 12, 15] and Minimum Classification Error (MCE) [2, 5, 15] have been used. In MCE training, an approximation for the error rate on the training data is optimized, whereas MMI optimizes the a posteriori probab...

11 | A comparative study of linear feature transformation techniques for automatic speech recognition
- Eisele, Haeb-Umbach, et al.
- 1996
Citation Context ...ng the Viterbi approximation [10]. 3. RESULTS Experiments were done for the recognition of continuous digitstrings using both the TI digitstring [9] corpus for American English digits and the SieTill [6] corpus for telephone line recorded German digits. In Table 1 some information on corpus statistics is summarized. Table 1. Corpus statistics for the TI digitstring and the SieTill corpus. corpus fema...

10 | Connected Digit Recognition Using Statistical Template Matching
- Welling, Ney, et al.
- 1995
Citation Context ...ML training using the Viterbi approximation [10] and their results serve as starting points for the additional discriminative training. A detailed description of the baseline system could be found in [16]. Since discriminative training methods could not guarantee convergence under realistic conditions, we first investigated the convergence behaviour of the MMI criterion. Using iteration factors h = 5 ...

6 | Minimum classification error training algorithm for feature extractor and pattern classifier in speech recognition - Paliwal, Bacchiani, et al. - 1995 |

2 |
Enhanced control and estimation of parameters for a telephone based isolated digit recognizer
- Bauer
- 1997
Citation Context ...alytic comparison. 1. INTRODUCTION In an increasing number of applications discriminative training criteria such as Maximum Mutual Information (MMI) [8, 12, 15] and Minimum Classification Error (MCE) [2, 5, 15] have been used. In MCE training, an approximation for the error rate on the training data is optimized, whereas MMI optimizes the a posteriori probability of the training utterances and hence the cla...

1 | A Discriminatively Derived Transform for Improved Speech Recognition - Ayer, Hunt, et al. - 1993 |

1 | Verallgemeinerte stochastische Modellierung für die automatische Spracherkennung - Wolfertstetter - 1996 |