## Maximum Entropy and Gaussian Models for Image Object Recognition (2002)

### Download Links

- [www-i6.informatik.rwth-aachen.de]
- [www.keysers.net]
- DBLP

### Other Repositories/Bibliography

Venue: Pattern Recognition, 24th DAGM Symposium

Citations: 18 (9 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Keysers02maximumentropy,
  author    = {Daniel Keysers and Franz Josef Och and Hermann Ney},
  title     = {Maximum Entropy and Gaussian Models for Image Object Recognition},
  booktitle = {Pattern Recognition, 24th DAGM Symposium},
  year      = {2002},
  pages     = {498--506},
  publisher = {Springer Verlag}
}
```

### Abstract

The principle of maximum entropy is a powerful framework that can be used to estimate class posterior probabilities for pattern recognition tasks. In this paper, we show how this principle is related to the discriminative training of Gaussian mixture densities using the maximum mutual information criterion. This leads to a relaxation of the constraints on the covariance matrices to be positive (semi-)definite.
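As a rough illustration of the connection the abstract describes (a generic sketch, not the paper's exact formulation; the function names are mine), a maximum entropy class posterior is a log-linear model over feature functions. With first- and second-order features the log-posterior is quadratic in x, which is exactly the form a Gaussian class posterior takes, except that here no positive (semi-)definiteness is imposed on the quadratic term:

```python
import numpy as np

def quadratic_features(x):
    """First- and second-order features of x. The log of a Gaussian class
    posterior is a quadratic form in x, so a maximum entropy model over
    these features subsumes Gaussian models with relaxed covariance
    constraints."""
    return np.concatenate([[1.0], x, np.outer(x, x).ravel()])

def maxent_posterior(x, lam):
    """Maximum entropy class posterior p(k|x) ∝ exp(λ_k · f(x)).
    lam: (K, D) weight matrix, one row of parameters per class k."""
    scores = lam @ quadratic_features(x)
    scores -= scores.max()          # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

With all weights zero the posterior is uniform over the K classes; training would adjust `lam` to match empirical feature expectations.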

### Citations

429 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972

Citation Context: ...that compute the global maximum of the log probability (2) given a training set. These algorithms fall into two categories: on the one hand, we have an algorithm known as generalized iterative scaling [4] and related algorithms that can be proven to converge to the global maximum. On the other hand, due to the convex nature of the criterion (2), we can also use general optimization strategies as e.g. ...
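The generalized iterative scaling algorithm [4] referenced in this context can be sketched as a toy conditional-maxent trainer (my own minimal illustration, not the cited paper's code; GIS requires each example's features to sum to a constant C, enforced below with a slack feature):

```python
import numpy as np

def gis(F, y, n_classes, n_iter=200):
    """Minimal Generalized Iterative Scaling for a conditional log-linear
    model p(k|x) ∝ exp(Σ_j λ_{k,j} f_j(x)).

    F: (N, D) non-negative feature matrix, y: (N,) integer labels.
    Each update moves λ toward matching empirical feature counts."""
    N, D = F.shape
    C = F.sum(axis=1).max()
    # append a slack feature so every row sums to the constant C
    F = np.hstack([F, C - F.sum(axis=1, keepdims=True)])
    lam = np.zeros((n_classes, D + 1))
    emp = np.zeros_like(lam)            # empirical feature counts per class
    for k in range(n_classes):
        emp[k] = F[y == k].sum(axis=0)
    for _ in range(n_iter):
        scores = F @ lam.T                           # (N, K)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)            # model posteriors
        exp_counts = p.T @ F                         # expected counts (K, D+1)
        # GIS update: λ_j += (1/C) log(empirical / expected)
        lam += np.log((emp + 1e-12) / (exp_counts + 1e-12)) / C
    return lam
```

Because the criterion is convex, this multiplicative-style update converges to the global maximum, which is the property the quoted context contrasts with general-purpose optimizers.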

259 | Using maximum entropy for text classification
- Nigam
- 1999

Citation Context: ...les applied in the natural sciences. It has been applied to the estimation of probability distributions [6] and to classification tasks such as natural language processing [1] and text classification [8]. The contributions of this paper are – to show the relation between maximum entropy and Gaussian models, – to present a framework that allows us to estimate a large number of parameters reliably, e.g. t...

238 | Efficient pattern recognition using a new transformation distance
- Simard, LeCun, et al.
- 1993

Citation Context: ...e models. The error Table 1. Summary of results for the USPS corpus (error rates, [%]; training set extended with 2,400 machine-printed digits): method ER[%]: human performance [Simard et al. 1993] [14] 2.5; relevance vector machine [Tipping et al. 2000] [15] 5.1; neural net (LeNet1) [LeCun et al. 1990] [13] 4.2; support vectors [Schölkopf 1997] [11] 4.0; invariant support vectors [Schölkopf et al. ...

215 | Variational relevance vector machines
- Bishop, Tipping
- 2000

Citation Context: ...USPS corpus (error rates, [%]; training set extended with 2,400 machine-printed digits): method ER[%]: human performance [Simard et al. 1993] [14] 2.5; relevance vector machine [Tipping et al. 2000] [15] 5.1; neural net (LeNet1) [LeCun et al. 1990] [13] 4.2; support vectors [Schölkopf 1997] [11] 4.0; invariant support vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 199...

127 | Transformation invariance in pattern recognition - tangent distance and tangent propagation
- Simard, LeCun, et al.
- 1996

Citation Context: ...extended with 2,400 machine-printed digits: method ER[%]: human performance [Simard et al. 1993] [14] 2.5; relevance vector machine [Tipping et al. 2000] [15] 5.1; neural net (LeNet1) [LeCun et al. 1990] [13] 4.2; support vectors [Schölkopf 1997] [11] 4.0; invariant support vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 1993] [13] 2.6; tangent distance [Simard et al. 199...

122 | Maximum entropy discrimination
- Jaakkola, Meila, et al.
- 1999

Citation Context: ...onstraints on the covariance matrices to be positive (semi-)definite. Therefore, the resulting model is not exactly equivalent to a Gaussian model. This result is in contrast to the approach taken in [5], where the authors derive discriminative models for Gaussian densities based on priors of the parameters and the minimum relative entropy principle. Their solution results in discriminatively trained...

97 | Prior knowledge in support vector kernels
- Schölkopf, Simard, et al.
- 1998

Citation Context: ...elevance vector machine [Tipping et al. 2000] [15] 5.1; neural net (LeNet1) [LeCun et al. 1990] [13] 4.2; support vectors [Schölkopf 1997] [11] 4.0; invariant support vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 1993] [13] 2.6; tangent distance [Simard et al. 1993] [14] 2.5; nearest neighbor classifier [7] 5.6; mixture densities [2] baseline 7.2, + LDA + virtual data ...

39 | Experiments with an extended tangent distance
- Keysers, Dahmen, et al.
- 2000

Citation Context: ...11] 4.0; invariant support vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 1993] [13] 2.6; tangent distance [Simard et al. 1993] [14] 2.5; nearest neighbor classifier [7] 5.6; mixture densities [2] baseline 7.2, + LDA + virtual data 3.4; kernel densities [7] baseline 5.5, + tangent vectors + virtual data 2.4. ...rates show that we can already gain recognition accuracy by usin...

29 | Support Vector Learning (R. Oldenbourg)
- Schölkopf
- 1997

Citation Context: ...ethod ER[%]: human performance [Simard et al. 1993] [14] 2.5; relevance vector machine [Tipping et al. 2000] [15] 5.1; neural net (LeNet1) [LeCun et al. 1990] [13] 4.2; support vectors [Schölkopf 1997] [11] 4.0; invariant support vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 1993] [13] 2.6; tangent distance [Simard et al. 1993] [14] 2.5; nearest neighbor classifier [7]...

28 | Maximum mutual information estimation of hidden Markov models
- Normandin
- 1996

Citation Context: ...raining, since the information of out-of-class data is used. This criterion is often referred to as the mutual information criterion in speech recognition, information theory, and image object recognition [3, 9]. We will regard Gaussian models for the class conditional distributions: p(x|k) = N(x|μ_k, Σ_k) = det(2πΣ_k)^(−1/2) exp(−(1/2)(x−μ_k)^T Σ_k^(−1) (x−μ_k)) (3). The free parameters of these models ar...
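Equation (3) from the quoted context, combined with Bayes' rule, can be sketched numerically as follows (a generic illustration under my own function names, not the authors' implementation):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Class-conditional Gaussian of equation (3):
    p(x|k) = det(2π Σ_k)^(-1/2) · exp(-1/2 (x-μ_k)^T Σ_k^(-1) (x-μ_k))."""
    d = x - mu
    return (np.linalg.det(2.0 * np.pi * sigma) ** -0.5
            * np.exp(-0.5 * d @ np.linalg.solve(sigma, d)))

def gaussian_posterior(x, mus, sigmas, priors):
    """Bayes' rule on top of the class-conditional Gaussians:
    p(k|x) ∝ p(k) · p(x|k), normalized over classes."""
    joint = np.array([p * gaussian_density(x, m, s)
                      for m, s, p in zip(mus, sigmas, priors)])
    return joint / joint.sum()
```

The free parameters are the means μ_k and covariance matrices Σ_k; the discriminative training discussed in the paper estimates them via the posterior rather than the class-conditional likelihood.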

16 | A Maximum Entropy Approach to Natural Language Processing
- Berger, Della Pietra, et al.
- 1996

Citation Context: ...framework is based on principles applied in the natural sciences. It has been applied to the estimation of probability distributions [6] and to classification tasks such as natural language processing [1] and text classification [8]. The contributions of this paper are – to show the relation between maximum entropy and Gaussian models, – to present a framework that allows us to estimate a large number of ...

16 | Statistical Image Object Recognition using Mixture Densities
- Dahmen, Keysers, Güld, et al.
- 2001

Citation Context: ...vectors [Schölkopf et al. 1998] [12] 3.0; neural net + boosting [Drucker et al. 1993] [13] 2.6; tangent distance [Simard et al. 1993] [14] 2.5; nearest neighbor classifier [7] 5.6; mixture densities [2] baseline 7.2, + LDA + virtual data 3.4; kernel densities [7] baseline 5.5, + tangent vectors + virtual data 2.4. ...rates show that we can already gain recognition accuracy by using the maximum entropy fram...

7 | Using maximum entropy for text classification
- Nigam, Lafferty
- 1999

Citation Context: ...iples applied in the natural sciences. It has been applied to the estimation of probability distributions [6] and to classification tasks such as natural language processing [1] and text classification [8]. The contributions of this paper are – to show the relation between maximum entropy and Gaussian models, – to present a framework that allows us to estimate a large number of parameters reliably, e.g. t...

2 | Discriminative Training of Gaussian Mixture Densities for Image Object Recognition
- Dahmen, Schlüter, Ney, et al.
- 1999

Citation Context: ...raining, since the information of out-of-class data is used. This criterion is often referred to as the mutual information criterion in speech recognition, information theory, and image object recognition [3, 9]. We will regard Gaussian models for the class conditional distributions: p(x|k) = N(x|μ_k, Σ_k) = det(2πΣ_k)^(−1/2) exp(−(1/2)(x−μ_k)^T Σ_k^(−1) (x−μ_k)) (3). The free parameters of these models ar...

1 | On the Rationale of Maximum Entropy Models
- Jaynes
- 1982

Citation Context: ...ritten digits recognition task. 1 Introduction The maximum entropy framework is based on principles applied in the natural sciences. It has been applied to the estimation of probability distributions [6] and to classification tasks such as natural language processing [1] and text classification [8]. The contributions of this paper are – to show the relation between maximum entropy and Gaussian models, ...