## The Generalized CEM Algorithm (1999)

Venue: Advances in Neural Information Processing Systems 12

Citations: 4 (0 self)

### BibTeX

```bibtex
@inproceedings{Jebara99thegeneralized,
  author    = {Tony Jebara and Alex Pentland},
  title     = {The Generalized CEM Algorithm},
  booktitle = {Advances in Neural Information Processing Systems 12},
  year      = {1999},
  publisher = {MIT Press}
}
```

### Abstract

We propose a general approach for estimating the parameters of latent variable probability models to maximize conditional likelihood and discriminant criteria. Unlike joint likelihood, these objectives are better suited for classification and regression. The approach utilizes and extends the previously introduced CEM framework (Conditional Expectation Maximization), which reformulates EM to handle the conditional likelihood case. We generalize the CEM algorithm to estimate any mixture of exponential family densities. This includes structured graphical models over exponential families, such as HMMs. The algorithm efficiently takes advantage of the factorization of the underlying graph. In addition, the new CEM bound is tighter and more rigorous than the original one. The final result is a CEM algorithm that mirrors the EM algorithm where both estimate a variational lower bound on their respective incomplete objective functions, and both generate the same standard M-steps ...
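The abstract's central parallel is that CEM mirrors EM: both iteratively maximize a variational lower bound on an incomplete-data objective, yielding simple M-steps. As background, here is a minimal sketch of the standard EM bound-maximization loop for a 1-D two-component Gaussian mixture (this is plain EM on joint likelihood, not the paper's CEM; the toy data, initialization, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two 1-D Gaussian clusters (illustrative, not from the paper).
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])

# Initial parameters of a 2-component Gaussian mixture.
pi = np.array([0.5, 0.5])    # mixing weights
mu = np.array([-1.0, 1.0])   # means
var = np.array([1.0, 1.0])   # variances

def log_lik(x, pi, mu, var):
    # Incomplete-data log-likelihood: sum_i log sum_k pi_k N(x_i | mu_k, var_k)
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

lls = []
for _ in range(20):
    # E-step: responsibilities; this is where the variational lower bound
    # touches the incomplete log-likelihood.
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form updates of the complete-data bound.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    lls.append(log_lik(x, pi, mu, var))

# Bound maximization guarantees the objective never decreases.
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))
```

In the generalized CEM of the abstract, the same bound-then-maximize structure is applied to conditional likelihood over mixtures of exponential-family densities, so the M-steps retain this closed-form flavor.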

### Citations

9002 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...the machine learning community and its application domains have seen the proliferation of conditional or discriminative criteria for classification and regression. For instance, support vector machines [7] have generated competitive classifier systems and are being combined with probabilistic models [3]. In the speech community, discriminatively trained HMMs minimize classification error for superior p...

8134 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...can be easily and convergently optimized for a large class of latent variable probability densities and graphical models. These otherwise intractable models are lower bounded and decoupled via the EM [2] algorithm which generates complete data and simple M-steps. Since the EM algorithm is limited to ML and MAP, proponents of other criteria must resort to gradient algorithms [6], or second order metho...

4848 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...probabilistic regressors after maximizing their conditional likelihood [5]. Even traditional neural networks employ a least-squares objective function on the output, emphasizing prediction performance [1]. All these criteria allocate modeling resources with the given task in mind, yielding improved performance. In contrast, under ML and MAP (the canonical criteria of probabilistic models), each densit...

724 | Hierarchical mixtures of experts and the EM algorithm
- Jordan, Jacobs
- 1994
Citation Context: ..., discriminatively trained HMMs minimize classification error for superior phoneme labeling [6]. Mixtures of experts are used as probabilistic regressors after maximizing their conditional likelihood [5]. Even traditional neural networks employ a least-squares objective function on the output, emphasizing prediction performance [1]. All these criteria allocate modeling resources with the given task i...

396 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1999
Citation Context: ...or discriminative criteria for classification and regression. For instance, support vector machines [7] have generated competitive classifier systems and are being combined with probabilistic models [3]. In the speech community, discriminatively trained HMMs minimize classification error for superior phoneme labeling [6]. Mixtures of experts are used as probabilistic regressors after maximizing thei...

54 | Maximum conditional likelihood via bound maximization and the CEM algorithm
- Jebara, Pentland
- 1998
Citation Context: ..., and so on. It is thus desirable to find a lower bound like EM which facilitates optimization of conditional or discriminant criteria and generates simple M-steps. Such a lower bound was proposed in [4] as the CEM (Conditional Expectation Maximization) algorithm and used to perform regression with Gaussian mixtures. However CEM's full generality extends beyond that case to a large class of probabi...

6 | The trended HMM with discriminative training for phonetic classification
- Rathinavelu, Deng
- 1996
Citation Context: ...competitive classifier systems and are being combined with probabilistic models [3]. In the speech community, discriminatively trained HMMs minimize classification error for superior phoneme labeling [6]. Mixtures of experts are used as probabilistic regressors after maximizing their conditional likelihood [5]. Even traditional neural networks employ a least-squares objective function on the output, ...