## A tutorial on MM algorithms (2004)

Venue: | The American Statistician |

Citations: | 66 - 3 self |

### BibTeX

@ARTICLE{Hunter04atutorial,

author = {David R. Hunter and Kenneth Lange},

title = {A tutorial on MM algorithms},

journal = {The American Statistician},

year = {2004},

pages = {30--37}

}


### Abstract

Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the loglikelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation.

Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson
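The surrogate-function idea in the abstract can be made concrete with a classic MM example (my own illustration, not code or an excerpt from the paper): minimizing the sum of absolute deviations Σ|xᵢ − θ|, whose minimizer is a sample median, by majorizing each |r| with the quadratic r²/(2|r⁽ᵐ⁾|) + |r⁽ᵐ⁾|/2, which touches |r| at the current residual. Minimizing the surrogate yields a weighted-mean update; the function names below are hypothetical.

```python
# Minimal MM sketch (illustrative, not from the paper): minimize
# f(theta) = sum_i |x_i - theta| via the majorization
#   |r| <= r^2 / (2|r_m|) + |r_m| / 2   (equality at r = r_m),
# whose surrogate minimizer is a weighted mean of the data.

def mm_median(x, theta0, n_iter=100, eps=1e-12):
    theta = theta0
    for _ in range(n_iter):
        # weights 1/|x_i - theta|; eps guards against a zero residual
        w = [1.0 / max(abs(xi - theta), eps) for xi in x]
        theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)  # surrogate minimizer
    return theta

x = [1.0, 2.0, 3.0, 10.0, 11.0]
theta_hat = mm_median(x, theta0=0.0)  # converges toward the sample median, 3.0
```

Each update can only decrease the objective, which is the descent property the abstract attributes to all MM algorithms.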

### Citations

8090 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...problems must be solved numerically. In the current article, we discuss an optimization method that typically relies on convexity arguments and is a generalization of the well-known EM algorithm method (Dempster et al., 1977; McLachlan and Krishnan, 1997). We call any algorithm based on this iterative method an MM algorithm. To our knowledge, the general principle behind MM algorithms was first enunciated by the numerica... |

1827 |
Robust statistics
- Huber
- 1981
Citation Context: ...st from the statistical community set off by the Dempster et al. (1977) paper, steady development of MM algorithms has continued. The MM principle reappears, among other places, in robust regression (Huber, 1981), in correspondence analysis (Heiser, 1987), in the quadratic lower bound principle of Böhning and Lindsay (1988), in the psychometrics literature on least squares (Bijleveld and de Leeuw, 1991; Kier... |

942 |
The EM Algorithm and Extensions
- Mclachlan, Krishnan
- 1996
Citation Context: ...merically. In the current article, we discuss an optimization method that typically relies on convexity arguments and is a generalization of the well-known EM algorithm method (Dempster et al., 1977; McLachlan and Krishnan, 1997). We call any algorithm based on this iterative method an MM algorithm. To our knowledge, the general principle behind MM algorithms was first enunciated by the numerical analysts Ortega and Rheinbol... |

749 | Applied logistic regression - Hosmer, Lemeshow - 1989 |

510 | Iterative Solution of Nonlinear Equations - Ortega, Rheinboldt - 1970 |

361 | Regression quantiles - Koenker, Bassett - 1978 |

177 | Maximum likelihood estimation via the ECM algorithm: A general framework - Meng, Rubin - 1993 |

120 |
Optimization transfer using surrogate objective functions (with discussion
- Lange, Hunter, et al.
- 2000
Citation Context: ...MM algorithms. Furthermore, there are several methods for accelerating EM algorithms that are also applicable to accelerating MM algorithms (Heiser, 1995; Lange, 1995b; Jamshidian and Jennrich, 1997; Lange et al., 2000). Although this survey article necessarily reports much that is already known, there are some new results here. Our MM treatment of constrained optimization in Section 7 is more general than previous... |

78 | A Modified Expectation Maximization Algorithm for Penalized Likelihood Estimation in Emission Tomography - Pierro - 1993 |

63 | Using EM to obtain asymptotic variancecovariance matrices: the SEM algorithm - Meng, Rubin - 1991 |

61 |
A gradient algorithm locally equivalent to the EM algorithm
- Lange
- 1995
Citation Context: ...great deal known about the convergence properties of MM algorithms that is too mathematically demanding to present here. Fortunately, almost all results from the EM algorithm literature (Wu, 1983; Lange, 1995a; McLachlan and Krishnan, 1997; Lange, 1999) carry over without change to MM algorithms. Furthermore, there are several methods for accelerating EM algorithms that are also applicable to accelerating... |

48 |
Numerical analysis for statisticians
- Lange
- 1999
Citation Context: ...properties of MM algorithms that is too mathematically demanding to present here. Fortunately, almost all results from the EM algorithm literature (Wu, 1983; Lange, 1995a; McLachlan and Krishnan, 1997; Lange, 1999) carry over without change to MM algorithms. Furthermore, there are several methods for accelerating EM algorithms that are also applicable to accelerating MM algorithms (Heiser, 1995; Lange, 1995b; ... |

39 | Globally convergent algorithms for maximum a posteriori transmission tomography
- Lange, Fessler
- 1995
Citation Context: ...lower bound principle of Böhning and Lindsay (1988), in the psychometrics literature on least squares (Bijleveld and de Leeuw, 1991; Kiers and Ten Berge, 1992), and in medical imaging (De Pierro, 1995; Lange and Fessler, 1995). The recent survey articles of de Leeuw (1994), Heiser (1995), Becker et al. (1997), and Lange et al. (2000) deal with the general principle, but it is not until the rejoinder of Hunter and Lange (2... |

37 |
Convergent computation by iterative majorization; theory and applications in multidimensional data analysis
- Heiser
- 1997
Citation Context: ...ln b(X)], which is sometimes known as the information inequality. It is this inequality that guarantees that a minorizing function is constructed in the E-step of any EM algorithm (de Leeuw, 1994; Heiser, 1995), making every EM algorithm an MM algorithm. 3.2 Minorization via Supporting Hyperplanes Jensen's inequality is easily derived from the supporting hyperplane property of a convex function: Any linear... |
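The supporting-hyperplane minorization this excerpt alludes to is the standard convex-analysis fact (stated here in my notation, not quoted from the paper): for a differentiable convex function κ,

```latex
\kappa(\theta) \;\ge\; \kappa\bigl(\theta^{(m)}\bigr) + \nabla\kappa\bigl(\theta^{(m)}\bigr)^{t}\,\bigl(\theta - \theta^{(m)}\bigr),
```

with equality at θ = θ^(m), so the linear right-hand side minorizes κ at the current iterate.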

30 | The majorization approach to multidimensional scaling: some problems and extensions - Groenen - 1993 |

29 | MM algorithms for generalized Bradley-Terry models - Hunter - 2004 |

28 | Block-relaxation algorithms in statistics - Leeuw - 1994 |

21 | Normal/Independent Distributions and Their Applications in Robust Regression - Lange, Sinsheimer - 1993 |

20 |
A quasi-Newtonian acceleration of the EM algorithm
- Lange
- 1995
Citation Context: ...great deal known about the convergence properties of MM algorithms that is too mathematically demanding to present here. Fortunately, almost all results from the EM algorithm literature (Wu, 1983; Lange, 1995a; McLachlan and Krishnan, 1997; Lange, 1999) carry over without change to MM algorithms. Furthermore, there are several methods for accelerating EM algorithms that are also applicable to accelerating... |

17 | An Alternative Technique for Absolute Deviations Curve Fitting - Schlossmacher - 1973 |

16 | Direct calculation of the information matrix via the EM algorithm - Oakes - 1999 |

11 | Quantile regression via an MM algorithm - Hunter, Lange - 2000 |

11 | Modelling association football scores - Maher - 1982 |

10 | Convergence of correction matrix algorithms for multidimensional scaling - Leeuw, Heiser - 1977 |

9 |
Monotonicity of Quadratic Approximation Algorithms
- Böhning, Lindsay
- 1988
Citation Context: ...per Bound If a convex function κ(θ) is twice differentiable and has bounded curvature, then we can majorize κ(θ) by a quadratic function with sufficiently high curvature and tangent to κ(θ) at θ^(m) (Böhning and Lindsay, 1988). In algebraic terms, if we can find a positive definite matrix M such that M − ∇²κ(θ) is nonnegative definite for all θ, then κ(θ) ≤ κ(θ^(m)) + ∇κ(θ^(m))^t (θ − θ^(m)) + (1/2)(θ − θ^(m))^t M(θ −... |
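As an illustration of this quadratic-upper-bound device (my own sketch, not code from the paper), consider logistic regression: the negative log-likelihood has Hessian XᵀWX with W = diag(pᵢ(1 − pᵢ)), and since pᵢ(1 − pᵢ) ≤ 1/4, the fixed matrix M = XᵀX/4 dominates it for all θ. The resulting MM update decreases the objective at every step; the function names are hypothetical.

```python
import numpy as np

# Quadratic-upper-bound MM for logistic regression (illustrative sketch).
# Negative log-likelihood: f(beta) = sum(log(1 + exp(X beta)) - y * X beta),
# with Hessian X^t W X, W = diag(p(1-p)) <= I/4, so M = X^t X / 4 majorizes it.
# Minimizing the quadratic surrogate gives beta <- beta + M^{-1} X^t (y - p).

def neg_loglik(beta, X, y):
    eta = X @ beta
    return float(np.sum(np.log1p(np.exp(eta)) - y * eta))

def mm_logistic(X, y, n_iter=200):
    beta = np.zeros(X.shape[1])
    M_inv = np.linalg.inv(X.T @ X / 4.0)  # fixed curvature bound, inverted once
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        beta = beta + M_inv @ (X.T @ (y - p))  # surrogate minimizer
    return beta

# toy data: the negative log-likelihood decreases monotonically en route
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = (rng.random(50) < 1.0 / (1.0 + np.exp(-X[:, 1]))).astype(float)
beta_hat = mm_logistic(X, y)
```

Because M is fixed, its inverse is computed once, a practical advantage of the quadratic-bound approach over Newton's method, which re-inverts the Hessian at every iteration.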

8 | Rejoinder to discussion of “Optimization transfer using surrogate objective functions - Hunter, Lange - 2000 |

6 | Michailidis, de Leeuw - 1998 |

6 |
Correspondence analysis with least absolute residuals
- Heiser
- 1987
Citation Context: ...by the Dempster et al. (1977) paper, steady development of MM algorithms has continued. The MM principle reappears, among other places, in robust regression (Huber, 1981), in correspondence analysis (Heiser, 1987), in the quadratic lower bound principle of Böhning and Lindsay (1988), in the psychometrics literature on least squares (Bijleveld and de Leeuw, 1991; Kiers and Ten Berge, 1992), and in medical imag... |

4 |
Quasi-Newton acceleration of the EM algorithm
- Jamshidian, Jennrich
- 1997
Citation Context: ...) carry over without change to MM algorithms. Furthermore, there are several methods for accelerating EM algorithms that are also applicable to accelerating MM algorithms (Heiser, 1995; Lange, 1995b; Jamshidian and Jennrich, 1997; Lange et al., 2000). Although this survey article necessarily reports much that is already known, there are some new results here. Our MM treatment of constrained optimization in Section 7 is more g... |

4 |
An adaptive barrier method for convex programming
- Lange
- 1994
Citation Context: ...parameters are often required to be nonnegative. Here we discuss a majorization technique that in a sense eliminates inequality constraints. For this adaptive barrier method (Censor and Zenios, 1992; Lange, 1994) to work, an initial point θ^(0) must be selected with all inequality constraints strictly satisfied. The barrier method confines subsequent iterates to the interior of the parameter space but allows... |

3 |
EM algorithms without missing data, Stat. Methods Med
- Becker, Yang, et al.
- 1997
Citation Context: ...Inequalities (9) and (10) have been used to construct MM algorithms in the contexts of medical imaging (De Pierro, 1995; Lange and Fessler, 1995) and least-squares estimation without matrix inversion (Becker et al., 1997). 3.4 Majorization via a Quadratic Upper Bound If a convex function κ(θ) is twice differentiable and has bounded curvature, then we can majorize κ(θ) by a quadratic function with sufficiently high cu... |

2 |
Proximal minimization with D-functions
- Censor, Zenios
- 1992
Citation Context: ...parameters. For example, parameters are often required to be nonnegative. Here we discuss a majorization technique that in a sense eliminates inequality constraints. For this adaptive barrier method (Censor and Zenios, 1992; Lange, 1994) to work, an initial point θ^(0) must be selected with all inequality constraints strictly satisfied. The barrier method confines subsequent iterates to the interior of the parameter spa... |

2 | Genomewide motif identification using a dictionary model - Sabatti, Lange - 2002 |

1 | Computing estimates in the proportional odds model - Hunter, Lange - 2002 |

1 | A connection between variable selection and EM-type algorithms (Pennsylvania State University Statistics Department technical report 0201) - Hunter, Li - 2002 |

1 | Kiers, Ten Berge - 1992 |