## Discriminative Training of Gaussian Mixtures for Image Object Recognition (1999)

### Download Links

- [www-i6.informatik.rwth-aachen.de]
- DBLP

### Other Repositories/Bibliography

Venue: DAGM Symposium Mustererkennung

Citations: 11 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Dahmen99discriminativetraining,
  author    = {J. Dahmen and R. Schlüter and H. Ney},
  title     = {Discriminative Training of Gaussian Mixtures for Image Object Recognition},
  booktitle = {DAGM Symposium Mustererkennung},
  year      = {1999},
  pages     = {205--212},
  publisher = {Springer Verlag}
}
```

### Abstract

In this paper we present a discriminative training procedure for Gaussian mixture densities. Conventional maximum likelihood (ML) training of such mixtures has proved very effective for object recognition, even though each class is treated separately in training. Discriminative criteria offer the advantage that they also use out-of-class data; that is, they aim at optimizing class separability. We present results on the US Postal Service (USPS) handwritten digits database and compare the discriminative results to those obtained by ML training. We also compare our best results with those reported by other groups, showing them to be state-of-the-art.

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: …[Fig. 2: MMI convergence behaviour for different h (single densities)] Table 2, results reported on the USPS database:

| Method | Error Rate [%] |
| --- | --- |
| Human Performance [2] | 2.5 |
| Decision Tree C4.5 [13] | 16.2 |
| Two-Layer Neural Net [13] | 5.9 |
| 5-Layer Neural Net (LeNet1) [13] | 5.1 |
| Support Vectors [14] | 4.0 |
| Invariant Support Vectors [15] | 3.0 |
| This work: MMI-Mixtures | 4.5 |
| This work: ML-Mixtures | 4.5 |
| This work: MMI-Mixtures, Product Rule | 3.8 |
| This work: ML-Mixtures, Product Rule | 3.6 |

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: …density function, and on the other hand the covariance matrix of our previously whitened data is known to be diagonal. ML parameter estimation is now done using the Expectation Maximization (EM) algorithm [6] combined with a Linde-Buzo-Gray based clustering procedure [7]. Note that we used global variance pooling and a maximum approximation of the EM algorithm in our experiments. For more information on ML parameter estimation the reader is referred to [1]…
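
The ML setup this snippet describes (hard-assignment "maximum approximation" EM with a single globally pooled diagonal variance) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the function name and the simple spread-out initialization are my own, and the Linde-Buzo-Gray splitting step is omitted.

```python
import numpy as np

def em_max_approx(X, K, iters=10):
    """Max-approximation (hard-assignment) EM for a Gaussian mixture with
    a globally pooled diagonal variance -- a sketch of the setup the
    snippet describes, not the paper's exact procedure."""
    N, D = X.shape
    mu = X[np.linspace(0, N - 1, K).astype(int)].copy()  # spread-out init
    var = X.var(axis=0) + 1e-6                           # pooled diagonal variance
    w = np.full(K, 1.0 / K)                              # mixture weights
    for _ in range(iters):
        # E-step (max approximation): assign each sample to its best component
        d2 = (((X[:, None, :] - mu[None]) ** 2) / var).sum(-1)   # (N, K)
        z = (np.log(w) - 0.5 * d2).argmax(axis=1)
        # M-step: reestimate weights and means from the hard assignments
        for k in range(K):
            if (z == k).any():
                mu[k] = X[z == k].mean(axis=0)
                w[k] = (z == k).mean()
        var = ((X - mu[z]) ** 2).mean(axis=0) + 1e-6     # global variance pooling
    return w, mu, var
```

In the full LBG-based procedure each component would additionally be split and re-trained until the desired number of densities per class is reached.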

4178 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: …details on that topic, the reader is referred to [9]). Note that for ease of representation we skip the dimension index d in the following formulae:

$$\hat{\mu}_{ki} = \frac{\Gamma_{ki}(x) + D\,c_{ki}\,\mu_{ki}}{\Gamma_{ki}(1) + D\,c_{ki}} \quad (5)$$

$$\hat{\sigma}^2 = \sum_{k}\frac{D\left(\sigma^2 + \sum_i c_{ki}\,\mu_{ki}^2\right)}{KD} - \sum_{k,i}\frac{\Gamma_{ki}(1) + D\,c_{ki}}{KD}\,\hat{\mu}_{ki}^2 \quad (6)$$

$$\hat{c}_{ki} = \frac{\Gamma_{ki}(1) + D\,c_{ki}}{\Gamma_{k}(1) + D} \quad (7)$$

with iteration constant D. Γ_ki(g(x)) and Γ_k(g(x)) are…
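
The mean and weight reestimation formulae quoted in this snippet (Eqs. 5 and 7) are direct to implement once the discriminative averages Γ are available. The sketch below is my own; the array names are hypothetical, and it assumes the class-level average satisfies Γ_k(1) = Σ_i Γ_ki(1), which is what makes the new weights of each class sum to one.

```python
import numpy as np

def ebw_reestimate(gamma_1, gamma_x, c, mu, D):
    """Extended-Baum-Welch-style updates sketched from the snippet's
    Eqs. (5) and (7).  gamma_1[k, i] = Gamma_ki(1) and
    gamma_x[k, i, :] = Gamma_ki(x) are the discriminative averages,
    D is the iteration constant."""
    denom = gamma_1 + D * c                                         # (K, I)
    # Eq. (5): smoothed discriminative mean update
    mu_new = (gamma_x + (D * c)[..., None] * mu) / denom[..., None]
    # Eq. (7): weight update, assuming Gamma_k(1) = sum_i Gamma_ki(1)
    c_new = denom / (gamma_1.sum(axis=1, keepdims=True) + D)
    return mu_new, c_new
```

Note how D acts as a step-size control: for large D the updates stay close to the previous parameters, matching the snippet's discussion of convergence speed.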

2928 | Introduction to Statistical Pattern Recognition, Electrical Science Series
- Fukunaga
- 1972

Citation Context: …A Gaussian mixture is defined as a linear combination of Gaussian component densities N(x | μ_ki, Σ_ki) with θ = {c_ki, μ_ki, Σ_ki}:

$$p_\theta(x \mid k) = \sum_{i=1}^{I_k} c_{ki} \cdot N(x \mid \mu_{ki}, \Sigma_{ki}) \quad (3)$$

where I_k is the number of component densities used to model class k, c_ki are weight coefficients (with c_ki > 0 and Σ_i c_ki = 1), μ_ki is the mean vector and Σ_ki is the covariance matrix of comp…
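
Evaluating the class-conditional mixture density of Eq. (3) is a small computation. The sketch below assumes diagonal covariances (which is the situation the paper describes after whitening); the function name is illustrative.

```python
import numpy as np

def mixture_density(x, c, mu, var):
    """Eq. (3): p_theta(x | k) = sum_i c_ki * N(x | mu_ki, Sigma_ki),
    sketched for diagonal covariances.
    c: (I,) weights, mu: (I, D) means, var: (I, D) diagonal variances."""
    d = x - mu                                               # (I, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * var).sum(axis=1)  # Gaussian normalizers
    log_comp = log_norm - 0.5 * (d ** 2 / var).sum(axis=1)   # per-component log density
    return float(np.dot(c, np.exp(log_comp)))
```

Computing each component in the log domain first avoids underflow for the 256-dimensional feature vectors mentioned elsewhere on this page.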

1332 | An algorithm for vector quantizer design
- Linde, Buzo, et al.
- 1980

Citation Context: …previously whitened data is known to be diagonal. ML parameter estimation is now done using the Expectation Maximization (EM) algorithm [6] combined with a Linde-Buzo-Gray based clustering procedure [7]. Note that we used global variance pooling and a maximum approximation of the EM algorithm in our experiments. For more information on ML parameter estimation the reader is referred to [1]. 5 Discrim…

1102 | On combining classifiers
- Kittler, Hatef, et al.
- 1998
Citation Context: …ed by shifting it into eight directions. This yields nine instances of the same test sample, which are classified separately. We then use classifier combination schemes, in this case the product rule [11], to come to a final decision for the original test sample. The basic idea behind this method is that we are able to use classifier combination rules (and their benefits) without having to create mult…
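
The combination step this snippet describes, multiplying the class posteriors of the nine classified instances and taking the argmax, can be sketched as follows. The function name is my own; summing logs is a standard numerical-stability choice, not something the snippet specifies.

```python
import numpy as np

def product_rule_decision(posteriors):
    """Product-rule combination [11]: multiply the per-instance class
    posteriors (here: original test image plus its shifted copies),
    renormalize, and pick the class with the largest product."""
    logp = np.log(np.asarray(posteriors) + 1e-300)  # (n_instances, n_classes)
    combined = logp.sum(axis=0)                     # log of the product
    combined = np.exp(combined - combined.max())    # stable exponentiation
    combined /= combined.sum()                      # renormalize to a posterior
    return int(combined.argmax()), combined
```

With nine posteriors per test sample, a single confident disagreeing instance can veto the others, which is exactly the behaviour the product rule is known for.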

246 | Efficient Pattern Recognition Using a New Transformation Distance
- Simard, LeCun, et al.
- 1993

Citation Context: …o information, we use each pixel as a feature, yielding a 256-dimensional feature vector. The USPS recognition task is known to be very hard, with a human error rate of about 2.5% on the testing data [2]. For our experiments we created [Fig. 1: Example images taken from the USPS database] additional virtual training data by shifting each image by one pixel into eight directions. Doing so, we on the one…
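
The virtual-data construction in this snippet, shifting each image by one pixel into the eight surrounding directions, can be sketched directly. The zero-fill at the vacated border is my assumption; the snippet does not say how the border is handled.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with 0
    (border handling assumed, not specified in the source)."""
    out = np.roll(img, (dy, dx), axis=(0, 1))
    if dy > 0:
        out[:dy] = 0
    elif dy < 0:
        out[dy:] = 0
    if dx > 0:
        out[:, :dx] = 0
    elif dx < 0:
        out[:, dx:] = 0
    return out

def nine_variants(img):
    """The original image plus its eight one-pixel shifts (9 images total)."""
    return [shift(img, dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```

Applied to the training set this multiplies its size by nine; applied to a test image it produces the nine instances that the product-rule combination elsewhere on this page operates on.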

232 | An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology
- Baum, Eagon
- 1967

Citation Context: …given class k and δ_{k,k_n} = 1 only if k = k_n. For fast but reliable convergence of the MMI criterion, the choice of the iteration constant D is crucial. Although there exists a proof of convergence [10], the size of the iteration constant guaranteeing convergence yields impractically small step sizes, i.e. very slow convergence. In practice, fastest convergence is obtained if the iteration constants ar…

105 | Prior knowledge in support vector kernels
- Schölkopf, Simard, et al.
- 1998
Citation Context: …Table 2, results reported on the USPS database (error rate [%]): Human Performance [2] 2.5, Decision Tree C4.5 [13] 16.2, Two-Layer Neural Net [13] 5.9, 5-Layer Neural Net (LeNet1) [13] 5.1, Support Vectors [14] 4.0, Invariant Support Vectors [15] 3.0; this work: MMI-Mixtures 4.5, ML-Mixtures 4.5, MMI-Mixtures with Product Rule 3.8, ML-Mixtures with Product Rule 3.6. Since discriminative training methods cannot guarantee convergence under realistic conditi…

80 | Boosting performance in neural networks
- Drucker, Schapire, et al.
- 1993

Citation Context: …a comparison of the training and classification methods used is not possible. Other groups for instance improved the recognition performance by adding 2,500 machine-printed digits to the training set [2, 12]. [Fig. 2: MMI convergence behaviour for different h (single densities)] Table 2. Results reported on the USPS database: Meth…

51 | Comparison of discriminative training criteria
- Schlüter, Macherey
- 1998

Citation Context: …the following reestimation formulae for the means μ_ki, global diagonal variances σ², and mixture weights c_ki of Gaussian mixture densities (for more details on that topic, the reader is referred to [9]). Note that for ease of representation we skip the dimension index d in the following formulae:

$$\hat{\mu}_{ki} = \frac{\Gamma_{ki}(x) + D\,c_{ki}\,\mu_{ki}}{\Gamma_{ki}(1) + D\,c_{ki}} \quad (5)$$

$$\hat{\sigma}^2 = \sum_{k}\frac{D\left(\sigma^2 + \sum_i c_{ki}\,\mu_{ki}^2\right)}{KD} - \sum_{k,i}\frac{\Gamma_{ki}(1) + D\,c_{ki}}{KD}\,\hat{\mu}_{ki}^2 \quad (6)$$

…

30 | Maximum mutual information estimation of hidden Markov models
- Normandin
- 1996

Citation Context: …p(k) represent the according class-conditional and a priori probabilities. In the following, the a priori probabilities are supposed to be given (see Chapter 4). The maximum mutual information criterion [8] can then be defined by the expression

$$F_{\text{MMI}}(\theta) = \sum_{n=1}^{N} \log \frac{p(k_n)\,p_\theta(x_n \mid k_n)}{\sum_{k=1}^{K} p(k)\,p_\theta(x_n \mid k)} \quad (4)$$

That is, the MMI criterion aims to maximize the sum of logarithms of the a posteriori pro…
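
The MMI criterion quoted in this snippet, the sum over training samples of the log posterior of the correct class, can be evaluated as follows. The sketch works in the log domain and uses a log-sum-exp for the denominator; the argument layout is my own.

```python
import numpy as np

def f_mmi(log_px, log_prior, labels):
    """Eq. (4): F_MMI = sum_n log [ p(k_n) p_theta(x_n|k_n)
                                    / sum_k p(k) p_theta(x_n|k) ].
    log_px[n, k] = log p_theta(x_n | k), log_prior[k] = log p(k),
    labels[n] = k_n."""
    joint = log_px + log_prior                        # log p(k) p_theta(x_n|k)
    log_denom = np.logaddexp.reduce(joint, axis=1)    # log-sum-exp over classes
    n = np.arange(len(labels))
    return float((joint[n, labels] - log_denom).sum())
```

F_MMI is at most 0 and reaches 0 only when every training sample gets posterior 1 on its correct class, which is why maximizing it directly targets class separability.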

28 | Support Vector Learning. R. Oldenbourg
- Schölkopf
- 1997

Citation Context: …Results reported on the USPS database (error rate [%]): Human Performance [2] 2.5, Decision Tree C4.5 [13] 16.2, Two-Layer Neural Net [13] 5.9, 5-Layer Neural Net (LeNet1) [13] 5.1, Support Vectors [14] 4.0, Invariant Support Vectors [15] 3.0; this work: MMI-Mixtures 4.5, ML-Mixtures 4.5, MMI-Mixtures with Product Rule 3.8, ML-Mixtures with Product Rule 3.6. Since discriminative training methods cannot guarantee…

3 | Objektklassifikation mit Mischverteilungen
- Dahmen, Beulen, et al.

Citation Context: …those reported by other groups, proving them to be state-of-the-art. 1 Introduction: In the last few years, the use of Gaussian mixture densities for image object recognition has proved to be very efficient [1]. On well-known object recognition tasks such as the USPS handwritten digits database, we obtained results that are very well comparable or even superior to results reported using support vector machi…