## On the Convergence of Monte Carlo Maximum Likelihood Calculations (1992)

Venue: Journal of the Royal Statistical Society B

Citations: 69 (5 self)

### BibTeX

@ARTICLE{Geyer92onthe,
    author = {Charles J. Geyer},
    title = {On the Convergence of Monte Carlo Maximum Likelihood Calculations},
    journal = {Journal of the Royal Statistical Society B},
    year = {1992},
    volume = {56},
    pages = {261--274}
}

### Abstract

Monte Carlo maximum likelihood for normalized families of distributions (Geyer and Thompson, 1992) can be used for an extremely broad class of models. Given any family $\{h_\theta : \theta \in \Theta\}$ of nonnegative integrable functions, maximum likelihood estimates in the family obtained by normalizing the functions to integrate to one can be approximated by Monte Carlo, the only regularity conditions being a compactification of the parameter space such that the evaluation maps $\theta \mapsto h_\theta(x)$ remain continuous. Then with probability one the Monte Carlo approximant to the log likelihood hypoconverges to the exact log likelihood, its maximizer converges to the exact maximum likelihood estimate, approximations to profile likelihoods hypoconverge to the exact profile, and level sets of the approximate likelihood (support regions) converge to the exact sets (in Painlevé-Kuratowski set convergence). The same results hold when there are missing data (Thompson and Guo, 1991, Gelfand and Carlin, 19...
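To make the idea in the abstract concrete, here is a minimal sketch of a Monte Carlo log likelihood for a normalized family. It assumes the usual importance-sampling form from Geyer and Thompson (1992), $l_n(\theta) = \log\{h_\theta(x)/h_\psi(x)\} - \log\{\frac{1}{n}\sum_i h_\theta(X_i)/h_\psi(X_i)\}$ with $X_1, \ldots, X_n$ simulated from the normalized $h_\psi$. The toy family $h_\theta(x) = \exp(\theta x - x^2/2)$, the function names, the sample size, and the use of independent draws in place of Markov chain output are all illustrative assumptions, not taken from the paper.

```python
import math
import random


def log_h(theta, x):
    """Log of the unnormalized density h_theta(x) = exp(theta*x - x**2/2) (toy choice)."""
    return theta * x - 0.5 * x * x


def mc_loglik(theta, data, sample, psi):
    """Monte Carlo approximation to the average log likelihood ratio against psi."""
    # First term: average of log h_theta(x)/h_psi(x) over the observed data.
    first = sum(log_h(theta, x) - log_h(psi, x) for x in data) / len(data)
    # Second term: log of the sample mean of h_theta(X_i)/h_psi(X_i), which
    # estimates log c(theta)/c(psi). Shift by the maximum for numerical stability.
    logw = [log_h(theta, x) - log_h(psi, x) for x in sample]
    top = max(logw)
    log_ratio = top + math.log(sum(math.exp(v - top) for v in logw) / len(logw))
    return first - log_ratio


def maximize_concave(f, lo, hi, iters=50):
    """Ternary search; valid here only because this toy objective is concave in theta."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def mc_mle(data, psi=0.0, n=10000, seed=1):
    """Maximize the Monte Carlo log likelihood built from n draws at psi."""
    rng = random.Random(seed)
    # Normalizing h_psi gives N(psi, 1), so direct sampling works in this toy case;
    # a general h_psi would need something like Metropolis-Hastings instead.
    sample = [rng.gauss(psi, 1.0) for _ in range(n)]
    return maximize_concave(lambda t: mc_loglik(t, data, sample, psi), -3.0, 3.0)


if __name__ == "__main__":
    observed = [0.2, 0.5, 0.8]
    print("Monte Carlo MLE:", mc_mle(observed))
    print("Exact MLE (sample mean):", sum(observed) / len(observed))
```

In this toy family the normalized model is N(θ, 1), so the exact maximum likelihood estimate is the sample mean, which gives a simple check on the Monte Carlo maximizer. The normalizer $c(\theta)$ is never evaluated; only the ratio $c(\theta)/c(\psi)$ is estimated from the simulated draws.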

### Citations

3964 | Convex Analysis - Rockafellar - 1996 |
Citation Context: ... and Wets, forthcoming, Propositions 3C.21 and 3C.22). For the same reason the compactness assumption in Theorem 5 can be dropped. Finally, (14) is automatically true for any level below the maximum (Rockafellar, 1970, Theorem 7.6). So (15) and (16) hold for $\alpha < \sup l$. 3 Asymptotic Normality. Asymptotic normality of $\sqrt{n}(\theta_n - \theta)$ is very similar to the asymptotics of maximum likelihood. Theorem 7 Suppose the ...

1416 | Monte Carlo sampling methods using Markov chains and their applications - Hastings - 1970 |
Citation Context: ...operties. For arbitrary functions $h_\theta$, realizations $X_1, X_2, \ldots$ from $P_\theta$ can be simulated without knowledge of the normalizer $c(\theta)$ by the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). Moreover, maximum likelihood estimation can be carried out, again without knowledge of the normalizer or its derivatives, using these Monte Carlo simulations (Geyer and Thompson, 1992). Somewhat su...

936 | Sampling-based approaches to calculating marginal densities - Gelfand, Smith - 1990 |

874 | Markov Chains for Exploring Posterior Distributions (with discussion) - Tierney - 1994 |

373 | Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Pages 169-193 Bayesian Statistics 4 - Geweke - 1992 |
Citation Context: ... $\sum_{t=-\infty}^{+\infty} \gamma(t)$ (20). Both factors in (20) can be estimated, $c(\theta)/c(\psi)$ by the denominator in (19), and the sum by standard time series methods (for a review see Geyer, 1993; see also Hastings, 1970; Geweke, 1992; Han, 1991; and Green and Han, 1992). 4 Discussion. 'Normalized families of densities' are an important class of statistical models. We now have two interesting properties that hold for the whole clas...

279 | General irreducible Markov chains and non-negative operators - Nummelin - 1984 |
Citation Context: ... Condition (d) is hard, if Markov chain Monte Carlo is being used for the simulations, because it involves a Markov chain central limit theorem. General Markov chain central limit theorems do exist (Nummelin, 1984; Kipnis and Varadhan, 1986), but can be difficult to apply in practice, except when the state space is finite and the CLT is automatic (Chung, 1967, p. 99 ff.). The Kipnis-Varadhan theorem is the simp...

249 | Variational Convergence for Functions and Operators - Attouch - 1984 |
Citation Context: ... the log likelihood (7) hypoconverges, and the proof of Theorem 2 shows that the first term simultaneously epiconverges and hypoconverges. The sum thus hypoconverges (see the proof of Theorem 2.15 in Attouch, 1984). □ 2.2 Convergence of the MLE Calculation. Theorem 4 If $l_n \xrightarrow{h} l$ with probability one, if a sequence $\{\theta_n\}$ satisfies $l_n(\theta_n) \ge \sup_{\theta \in \Theta} l_n(\theta) - \epsilon_n$ with $\epsilon_n \to 0$, and if $\{\theta_n\}$ is...

226 | Constrained Monte Carlo Maximum Likelihood for Dependent Data - Geyer, Thompson - 1992 |
Citation Context: ...992 Revised September 24, 1992. Research supported in part by grant DMS-9007833 from the National Science Foundation. Abstract: Monte Carlo maximum likelihood for normalized families of distributions (Geyer and Thompson, 1992) can be used for an extremely broad class of models. Given any family $\{h_\theta : \theta \in \Theta\}$ of nonnegative integrable functions, maximum likelihood estimates in the family obtained by normalizing the ...

204 | Markov Chains with Stationary Transition Probabilities - Chung - 1967 |
Citation Context: ...al Markov chain central limit theorems do exist (Nummelin, 1984; Kipnis and Varadhan, 1986), but can be difficult to apply in practice, except when the state space is finite and the CLT is automatic (Chung, 1967, p. 99 ff.). The Kipnis-Varadhan theorem is the simplest for general state spaces, requiring only reversibility and summability of the autocovariances. A Metropolis-Hastings algorithm can always be ar...

204 | Finding the observed information matrix when using the EM algorithm - Louis - 1982 |
Citation Context: ...LE, since the Monte Carlo approximates the whole likelihood surface. The use of (17) to approximate the observed Fisher information may be useful in problems where analytical methods (Sundberg, 1974; Louis, 1982) are intractable. It is especially useful in conjunction with Monte Carlo EM (Tanner and Wei, 1990; Guo and Thompson, 1992), but may also be a competitor for the SEM algorithm (Meng and Rubin, 1991)....

202 | A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms - Wei, Tanner - 1990 |
Citation Context: ...proximate the observed Fisher information may be useful in problems where analytical methods (Sundberg, 1974; Louis, 1982) are intractable. It is especially useful in conjunction with Monte Carlo EM (Tanner and Wei, 1990; Guo and Thompson, 1992), but may also be a competitor for the SEM algorithm (Meng and Rubin, 1991). Acknowledgements. Conversations with Elizabeth Thompson, Julian Besag, and Michael Newton helped ch...

199 | Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions - Kipnis, Varadhan - 1986 |
Citation Context: ...is hard, if Markov chain Monte Carlo is being used for the simulations, because it involves a Markov chain central limit theorem. General Markov chain central limit theorems do exist (Nummelin, 1984; Kipnis and Varadhan, 1986), but can be difficult to apply in practice, except when the state space is finite and the CLT is automatic (Chung, 1967, p. 99 ff.). The Kipnis-Varadhan theorem is the simplest for general state spac...

186 | On the statistical analysis of dirty pictures (with discussion) - Besag - 1986 |

105 | Practical Markov chain Monte Carlo (with discussion) - Geyer - 1992 |
Citation Context: ... Varadhan, 1986) $A = \frac{c(\theta)^2}{c(\psi)^2} \sum_{t=-\infty}^{+\infty} \gamma(t)$ (20). Both factors in (20) can be estimated, $c(\theta)/c(\psi)$ by the denominator in (19), and the sum by standard time series methods (for a review see Geyer, 1993; see also Hastings, 1970; Geweke, 1992; Han, 1991; and Green and Han, 1992). 4 Discussion. 'Normalized families of densities' are an important class of statistical models. We now have two interesting ...

98 | Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters - Kiefer, Wolfowitz - 1956 |

87 | Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm - Meng, Rubin - 1991 |
Citation Context: ...erg, 1974; Louis, 1982) are intractable. It is especially useful in conjunction with Monte Carlo EM (Tanner and Wei, 1990; Guo and Thompson, 1992), but may also be a competitor for the SEM algorithm (Meng and Rubin, 1991). Acknowledgements. Conversations with Elizabeth Thompson, Julian Besag, and Michael Newton helped change my focus from exponential families to the general 'normalized families' of Section 1. The whol...

77 | Covariance structure and convergence rate of the Gibbs sampler with various scans - Liu, Wong, et al. - 1995 |

51 | Metropolis methods, Gaussian proposals and antithetic variables - Green, Han - 1992 |
Citation Context: ...actors in (20) can be estimated, $c(\theta)/c(\psi)$ by the denominator in (19), and the sum by standard time series methods (for a review see Geyer, 1993; see also Hastings, 1970; Geweke, 1992; Han, 1991; and Green and Han, 1992). 4 Discussion. 'Normalized families of densities' are an important class of statistical models. We now have two interesting properties that hold for the whole class. The Metropolis-Hastings algorithm ...

50 | Spectral Analysis and Time Series - Priestley - 1981 |

24 | Asymptotic behavior of the Gibbs sampler - Chan - 1993 |

23 | On the irreducibility of a Markov chain defined on a space of genotype configurations by a sampling scheme. Biometrics - Sheehan, Thomas - 1993 |

23 | Maximum likelihood theory for incomplete data from an exponential family - Sundberg - 1974 |
Citation Context: ... calculate the MLE, since the Monte Carlo approximates the whole likelihood surface. The use of (17) to approximate the observed Fisher information may be useful in problems where analytical methods (Sundberg, 1974; Louis, 1982) are intractable. It is especially useful in conjunction with Monte Carlo EM (Tanner and Wei, 1990; Guo and Thompson, 1992), but may also be a competitor for the SEM algorithm (Meng and ...

20 | Monte Carlo estimation of mixed models for large complex pedigrees - Guo, Thompson - 1994 |
Citation Context: ... Fisher information may be useful in problems where analytical methods (Sundberg, 1974; Louis, 1982) are intractable. It is especially useful in conjunction with Monte Carlo EM (Tanner and Wei, 1990; Guo and Thompson, 1992), but may also be a competitor for the SEM algorithm (Meng and Rubin, 1991). Acknowledgements. Conversations with Elizabeth Thompson, Julian Besag, and Michael Newton helped change my focus from expon...

19 | Maximum-likelihood estimation for constrained or missing-data models. The Canadian - Gelfand, Carlin - 1993 |
Citation Context: ...l sets of the approximate likelihood (support regions) converge to the exact sets (in Painlevé-Kuratowski set convergence). The same results hold when there are missing data (Thompson and Guo, 1991, Gelfand and Carlin, 1991) if a Wald-type integrability condition is satisfied. Asymptotic normality of the Monte Carlo error and convergence of the Monte Carlo approximation to the observed Fisher information are also shown....

14 | Evaluation of likelihood ratios for complex genetic models - Thompson, Guo - 1991 |
Citation Context: ... exact profile, and level sets of the approximate likelihood (support regions) converge to the exact sets (in Painlevé-Kuratowski set convergence). The same results hold when there are missing data (Thompson and Guo, 1991, Gelfand and Carlin, 1991) if a Wald-type integrability condition is satisfied. Asymptotic normality of the Monte Carlo error and convergence of the Monte Carlo approximation to the observed Fisher i...

9 | Reweighting Monte Carlo Mixtures - Geyer - 1991 |

8 | Examples of inconsistency of maximum likelihood estimates - Bahadur - 1958 |

4 | A characterization of epi-convergence in terms of convergence of level sets - Beer, Rockafellar, Wets - 1992 |

4 | Analysis of relatedness in the California condors, from DNA fingerprints - Geyer, Ryder, et al. - 1993 |
Citation Context: ...egrability condition seems to be required to assure convergence. This double sampling is necessary only when the first term in (6) cannot be calculated exactly. When it can be, it is better to do so (Geyer et al., 1993). Then the situation is the same as in Section 1.1. No Wald-type condition is needed for convergence. 2 Likelihood Convergence. 2.1 Hypoconvergence of the Monte Carlo Likelihood. Our treatment of the c...

4 | On the Convergence Rate of Successive Substitution Sampling - Schervish, Carlin - 1992 |

1 | Spectral window estimation of integrated autocorrelation time - Han - 1991 |
Citation Context: ...(t) (20). Both factors in (20) can be estimated, $c(\theta)/c(\psi)$ by the denominator in (19), and the sum by standard time series methods (for a review see Geyer, 1993; see also Hastings, 1970; Geweke, 1992; Han, 1991; and Green and Han, 1992). 4 Discussion. 'Normalized families of densities' are an important class of statistical models. We now have two interesting properties that hold for the whole class. The Metr...