@MISC{Elkan13maximumlikelihood,
  author = {Charles Elkan},
  title  = {Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training},
  year   = {2013}
}
Abstract
Consider a family of probability distributions defined by a set of parameters θ. The distributions may be either probability mass functions (pmfs) or probability density functions (pdfs). Suppose that we have a random sample drawn from a fixed but unknown member of this family. The random sample is a training set of n examples x1 to xn. An example may also be called an observation, an outcome, an instance, or a data point. In general each xj is a vector of values, and θ is a vector of real-valued parameters. For example, for a Gaussian distribution θ = ⟨µ, σ²⟩. We assume that the examples are independent, so the probability of the set is the product of the probabilities of the individual examples: f(x1, …, xn; θ) = ∏_{j=1}^{n} f(xj; θ). The notation above makes us think of the parameters θ as fixed and the examples xj as unknown, or varying. However, we can think of the training data as fixed and consider alternative parameter values. This is the point of view behind the definition of the likelihood function.
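To make this viewpoint concrete, here is a minimal sketch in Python, assuming a Gaussian family with θ = ⟨µ, σ²⟩. The sample values and the helper name log_likelihood are illustrative, not from the paper: the training data are held fixed while the log of the product ∏ f(xj; θ) is evaluated at alternative parameter values.

    import numpy as np

    # Illustrative sketch: log-likelihood of an i.i.d. Gaussian sample,
    # i.e. the log of the product prod_j f(xj; theta) with theta = (mu, sigma2).
    def log_likelihood(x, mu, sigma2):
        n = len(x)
        return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

    # Fixed training data; vary the parameter mu and compare likelihood values.
    x = np.array([1.2, 0.7, 2.1, 1.5, 0.9])
    for mu in (0.0, 1.0, float(np.mean(x))):
        print(f"mu = {mu:.3f}  log L = {log_likelihood(x, mu, np.var(x)):.4f}")

Under these assumptions the largest value occurs at µ equal to the sample mean, which is exactly the maximum likelihood estimate of µ for a Gaussian.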