## Convexity, Maximum Likelihood and All That (1996)

Citations: 4 (0 self-citations)

### BibTeX

@MISC{Berger96convexity,

author = {Adam Berger},

title = {Convexity, Maximum Likelihood and All That},

year = {1996}

}



### Abstract

This note is meant as a gentle but comprehensive introduction to the expectation-maximization (EM) and improved iterative scaling (IIS) algorithms, two popular techniques in maximum likelihood estimation. The focus in this tutorial is on the foundation common to the two algorithms: convex functions and their convenient properties. Where examples are called for, we draw from applications in human language technology.

1 Introduction

The task is to characterize the behavior of a real or imaginary stochastic process. By "stochastic process," we mean something that generates a sequence of observable output values. These values can be viewed as a discrete time series. We denote a single observation by $y$, a random variable that takes on values in some alphabet $\mathcal{Y}$. The modelling problem is to construct an accurate (in a sense made precise later) model $p(y)$ of the process. If the identity of $y$ is influenced by some conditioning information $x \in \mathcal{X}$, then we might seek instead a conditional m...
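To make the modelling problem concrete, here is a minimal sketch of the simplest case: a model $p(y)$ over a finite alphabet, fit to an observed sequence. It assumes (as the title suggests, though the excerpt defers the exact definition) that "accurate" means assigning high likelihood to the observations. For this unconstrained categorical model, the likelihood is maximized by the relative frequencies $p(y) = \mathrm{count}(y)/N$. The function names `mle_categorical` and `log_likelihood` and the toy sequence are hypothetical illustrations, not taken from the note.

```python
import math
from collections import Counter


def mle_categorical(samples):
    """Relative-frequency estimate p(y) = count(y) / N.

    For a categorical model with no further constraints, this choice
    maximizes the likelihood of `samples` over all distributions on the
    alphabet; symbols never observed implicitly receive probability 0.
    """
    counts = Counter(samples)
    n = len(samples)
    return {y: c / n for y, c in counts.items()}


def log_likelihood(samples, p):
    """Sum of natural-log probabilities of the observed sequence under p."""
    return sum(math.log(p[y]) for y in samples)


# Hypothetical toy sequence of observations from the alphabet {a, b, c}.
observations = ["a", "b", "a", "c", "a", "b"]
p_hat = mle_categorical(observations)
uniform = {y: 1 / 3 for y in "abc"}

print(p_hat)
# The relative-frequency model scores at least as well as any rival, e.g. uniform.
print(log_likelihood(observations, p_hat) > log_likelihood(observations, uniform))  # → True
```

This closed form exists only because the model is unconstrained. EM and IIS are typically used when it does not: for models with hidden variables, or with feature-based constraints, where the likelihood has to be improved iteratively.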