## Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM (2005)

Venue: Computational Statistics and Data Analysis

Citations: 7 (3 self)

### BibTeX

```bibtex
@ARTICLE{Jank05quasi-montecarlo,
  author  = {Wolfgang Jank},
  title   = {Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM},
  journal = {Computational Statistics and Data Analysis},
  year    = {2005},
  volume  = {48},
  pages   = {685--701}
}
```

### Abstract

In this paper we investigate an efficient implementation of the Monte Carlo EM algorithm based on Quasi-Monte Carlo sampling. The Monte Carlo EM algorithm is a stochastic version of the deterministic EM (Expectation-Maximization) algorithm in which an intractable E-step is replaced by a Monte Carlo approximation. Quasi-Monte Carlo methods produce deterministic sequences of points that can significantly improve the accuracy of Monte Carlo approximations over purely random sampling. One drawback to deterministic Quasi-Monte Carlo methods is that it is generally difficult to determine the magnitude of the approximation error. However, in order to implement the Monte Carlo EM algorithm in an automated way, the ability to measure this error is fundamental. Recent developments of randomized Quasi-Monte Carlo methods can overcome this drawback. We investigate the implementation of an automated, data-driven Monte Carlo EM algorithm based on randomized Quasi-Monte Carlo methods. We apply this algorithm to a geostatistical model of online purchases and find that it can significantly decrease the total simulation effort, thus showing great potential for improving upon the efficiency of the classical Monte Carlo EM algorithm.

Key words and phrases: Monte Carlo error; low-discrepancy sequence; Halton sequence; EM algorithm; geostatistical model.
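The MCEM loop the abstract describes can be sketched as follows. This is a minimal sketch, not the paper's algorithm: `draw_u` and `m_step` are hypothetical stand-ins for the model-specific E-step simulation and M-step maximization, and the fixed 20% sample-size growth rule is purely illustrative (the paper's point is precisely to replace such deterministic rules with an automated, error-driven one).

```python
import numpy as np

def mcem(y, theta0, draw_u, m_step, m0=100, max_iter=50, tol=1e-4):
    """Monte Carlo EM sketch: the intractable E-step expectation is
    replaced by an average over m simulated latent draws, and m is
    increased across iterations to control the Monte Carlo error."""
    theta, m = np.asarray(theta0, dtype=float), m0
    for _ in range(max_iter):
        u = draw_u(theta, m)                # simulate m latent draws given theta
        theta_new = m_step(y, u, theta)     # maximize the Monte Carlo Q-function
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta, m = theta_new, int(1.2 * m)  # illustrative sample-size growth
    return theta
```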

### Citations

8080 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...lo EM algorithm. Key words and phrases: Monte Carlo error; low-discrepancy sequence; Halton sequence; EM algorithm; geostatistical model. 1 Introduction The Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is a popular tool in statistics and many other fields. One limitation to the use of EM is, however, that quite often the E-step of the algorithm involves an analytically intractable, sometimes high ...

902 | Monte Carlo Statistical Methods
- Robert, Casella
- 1999
Citation Context: ...ecall that by RQMC property 1), every element of a RQMC sequence has a uniform distribution over C^d. Let x_k be the kth element of a randomized Halton sequence. Using a suitable transformation (e.g. Robert & Casella, 1999), we can generate a d-vector of i.i.d. normal or t variates. Shifting and scaling this vector by µ(θ) and Σ(θ) results in a draw u_k from f_Lap(u|y; θ). Thus, using r independent randomized Halton sequ...
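The transformation this context mentions can be illustrated with the generic inverse-CDF method. This is a sketch under assumptions, not the paper's exact construction: `mu` and `L` are hypothetical names standing for the proposal mean µ(θ) and a Cholesky factor of Σ(θ), and normal (rather than t) variates are generated.

```python
import numpy as np
from statistics import NormalDist

def uniform_to_mvn(x, mu, L):
    """Map one point x in (0,1)^d (e.g. a randomized Halton element) to a
    draw from N(mu, L L^T): inverse normal CDF, then shift and scale."""
    z = np.array([NormalDist().inv_cdf(xi) for xi in x])  # i.i.d. N(0,1) vector
    return mu + L @ z                                     # shift and scale
```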

749 | Random Number Generation and Quasi-Monte Carlo Methods
- Niederreiter
- 1992
Citation Context: ...methods produce a deterministic sequence of points that provides the best-possible spread in C^d. These deterministic sequences are often referred to as low-discrepancy sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), a...

282 | Approximate inference in generalized linear mixed models
- Breslow, Clayton
- 1993
Citation Context: ...d quality printers or his/her technology readiness, all of which often exhibit strong local variability. Data exhibiting spatial correlation can be modelled using generalized linear mixed models (e.g. Breslow & Clayton, 1993). Diggle et al. (1998) refer to these spatial applications of generalized linear mixed models as “model based geostatistics.” These spatial mixed models are challenging from a computational point of ...

177 | Maximum likelihood estimation via the ECM algorithm: A general framework - Meng, Rubin - 1993

159 | On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals
- Halton
- 1960
Citation Context: ...often referred to as low-discrepancy sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), and the Niederreiter sequence (Niederreiter, 1992), but this list is not exhaustive. In this work we focus our attention on the H...
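The Halton sequence is simple enough to sketch directly: coordinate j of point k is the base-b_j radical inverse of k, i.e. the base-b_j digits of k reflected about the radix point, with a distinct prime base per dimension.

```python
import numpy as np

def radical_inverse(n, b):
    """phi_b(n): reflect the base-b digits of n about the radix point."""
    x, f = 0.0, 1.0 / b
    while n > 0:
        n, d = divmod(n, b)  # peel off the next base-b digit
        x += d * f
        f /= b
    return x

def halton(m, bases=(2, 3)):
    """First m points of the Halton sequence in dimension len(bases)."""
    return np.array([[radical_inverse(k, b) for b in bases]
                     for k in range(1, m + 1)])
```

With bases (2, 3), the first point is (1/2, 1/3), the second (1/4, 2/3), and so on; unlike random draws, the points fill the unit square without clustering.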

152 | A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms - Wei, Tanner - 1990

140 | On the distribution of points in a cube and the approximate evaluation of integrals
- Sobol
- 1967
Citation Context: ...y sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), and the Niederreiter sequence (Niederreiter, 1992), but this list is not exhaustive. In this work we focus our attention on the Halton sequence since it is concept...

105 | Asymptotic Methods in Analysis - de Bruijn - 1981

97 | Model-based geostatistics
- Diggle, Tawn, et al.
- 1998
Citation Context: ...he associated two observations. For example, assume that Cov(u_i, u_j) = σ² exp{−α‖z_i − z_j‖}, (17) where ‖·‖ denotes the Euclidean norm. While different modelling alternatives exist (see, for example, Diggle et al., 1998), we will use the above model to investigate the efficiency of Quasi-Monte Carlo MCEM implementations for estimating the parameter vector θ = (β, σ², α). We analyze a set of online retail data for ...
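The exponential covariance in (17) is a direct transcription into code; here `z` is an assumed n×2 array of spatial locations, and the function name is ours, not the paper's.

```python
import numpy as np

def exp_cov(z, sigma2, alpha):
    """Covariance (17): Cov(u_i, u_j) = sigma^2 * exp(-alpha * ||z_i - z_j||),
    with ||.|| the Euclidean norm between locations z_i and z_j."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)  # pairwise distances
    return sigma2 * np.exp(-alpha * d)
```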

94 | Discrépance de suites associées à un système de numération (en dimension s)
- Faure
- 1982
Citation Context: ...derreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), and the Niederreiter sequence (Niederreiter, 1992), but this list is not exhaustive. In this work we focus our attention on the Halton sequence since it is conceptually very appealing. 2.1 Halton S...

81 | Quasi-random Maximum Simulated Likelihood Estimation of the Mixed Multinomial Logit Model (Transportation Research Part B) - Bhat

66 | Maximum likelihood algorithms for generalized linear mixed models
- McCulloch
- 1997
Citation Context: ...ethod require a manual, user-determined increase of the sample size, for instance, by allocating the amount of data to be simulated in each iteration already before the start of the algorithm (e.g. McCulloch, 1997). Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & F...

62 | Number-Theoretic Methods in Statistics
- Fang, Wang
- 1994
Citation Context: ...eterministic sequence of points that provides the best-possible spread in C^d. These deterministic sequences are often referred to as low-discrepancy sequences (see, for example, Niederreiter, 1992; Fang & Wang, 1994). A variety of different low-discrepancy sequences exist. Examples include the Halton sequence (Halton, 1960), the Sobol sequence (Sobol, 1967), the Faure sequence (Faure, 1982), and the Niederreiter...

61 | A gradient algorithm locally equivalent to the EM algorithm
- Lange
- 1995
Citation Context: ...es Q(θ^(t) | θ^(t−1)) ≥ Q(θ | θ^(t−1)) (6) for all θ in the parameter space. This is also known as the M-step. The M-step is often implemented using standard numerical methods like Newton-Raphson (see Lange, 1995). Solutions to overcome a difficult M-step have been proposed in, for example, Meng & Rubin (1993). Given an initial value θ^(0), the EM algorithm produces a sequence {θ^(0), θ^(1), θ^(2), ...} ...

59 | Recent advances in randomized quasi-Monte Carlo methods (in Modeling Uncertainty: An Examination of Stochastic Theory) - L’Ecuyer, Lemieux - 2002

55 | Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm
- Booth, Hobert
- 1999
Citation Context: ...efore the start of the algorithm (e.g. McCulloch, 1997). Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & Fan, 2003). Automated implementations of MCEM base the decision to increase the sample size on the magnitude of the error in the integral approximation. In their semi...

50 | On the convergence of the EM algorithm
- Boyles
- 1983
Citation Context: ...ep have been proposed in, for example, Meng & Rubin (1993). Given an initial value θ^(0), the EM algorithm produces a sequence {θ^(0), θ^(1), θ^(2), ...} that, under regularity conditions (see Boyles, 1983; Wu, 1983), converges to θ̂. In this work we focus on the situation when the E-step does not have a closed form solution. Wei & Tanner (1990) proposed to approximate an analytically intractable expe...

43 | Valuation of Mortgage-Backed Securities Using Brownian Bridges to Reduce Effective Dimension
- Caflisch, Morokoff, et al.
- 1997
Citation Context: ...ot necessarily result in the most efficient use of the simulated data. In particular, one criticism of random draws is that they often do not explore the sample space well (Morokoff & Caflisch, 1995; Caflisch et al., 1997). For instance, points drawn at random tend to form clusters which leads to gaps where the sample space is not explored at all (see Figure 1 for illustration). This criticism has led to the developm...

42 | Quasi-Monte Carlo Integration
- Morokoff, Caflisch
- 1993
Citation Context: ...entirely random draws do not necessarily result in the most efficient use of the simulated data. In particular, one criticism of random draws is that they often do not explore the sample space well (Morokoff & Caflisch, 1995; Caflisch et al., 1997). For instance, points drawn at random tend to form clusters which leads to gaps where the sample space is not explored at all (see Figure 1 for illustration). This criticism h...

38 | Monte Carlo EM Estimation for Time Series Models Involving Counts
- Chan, Ledolter
- 1995
Citation Context: ...successively as the algorithm moves along. In fact, Booth et al. (2001) argue that MCEM will never converge if m_t is held fixed across iterations because of a persevering Monte Carlo error (see also Chan & Ledolter, 1995). While earlier versions of the method choose the Monte Carlo sample sizes in a deterministic fashion before the start of the algorithm (e.g. McCulloch, 1997), the same deterministic allocation of Mo...

35 | Ox: Object Oriented Matrix Programming - Doornik - 2001

31 | Numerical methods for stochastic processes
- Bouleau, Lépingle
- 1994
Citation Context: ...n = (n_1, ..., n_d)^T, say, the Halton sequence with the first elements skipped, x_k = [φ_{b_1}(n_1 + k − 1), ..., φ_{b_d}(n_d + k − 1)]^T, k = 1, ..., m, (4) remains a low-discrepancy sequence (see Pagès, 1992; Bouleau & Lépingle, 1994). We will refer to the sequence defined by (4) as a Halton sequence with starting point n. Figure 1 shows the first 2500 elements of a two-dimensional Halton sequence with n = (0, 0)^T. 2.2 Randomi...
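One standard way to randomize a deterministic point set, hinted at by the "Randomization" section this context trails into, is a Cranley-Patterson-style uniform shift modulo 1. This is a generic RQMC device sketched here for illustration; the paper's randomized Halton construction may differ in detail.

```python
import numpy as np

def random_shift(points, rng):
    """Randomize a low-discrepancy point set by a uniform shift modulo 1:
    each shifted point is marginally uniform on the unit cube, while the
    set's even spread (pairwise differences mod 1) is preserved."""
    shift = rng.random(points.shape[1])  # one U(0,1) shift per dimension
    return (points + shift) % 1.0
```

Averaging an estimator over several independent shifts gives unbiased replicates whose spread estimates the Quasi-Monte Carlo error, which is exactly what an automated MCEM implementation needs.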

29 | Implementations of the Monte Carlo EM algorithm
- Levine, Casella
- 2001
Citation Context: ...he algorithm (e.g. McCulloch, 1997). Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & Fan, 2003). Automated implementations of MCEM base the decision to increase the sample size on the magnitude of the error in the integral approximation. In their seminal work, Booth & Hobert...

26 | Randomized Halton sequences - Wang, Hickernell

22 | Efficiency improvement by lattice rules for pricing Asian options - Lemieux, L’Ecuyer - 1998

22 | Scrambling Sobol’ and Niederreiter-Xing points - Owen - 1998

16 | Monte Carlo Extension of Quasi-Monte Carlo - Owen - 1998

10 | A survey of Monte Carlo algorithms for maximizing the likelihood of a two-stage hierarchical model - Booth, Hobert, et al. - 2001

9 | Hierarchical models: a current computational perspective - Hobert - 2000

6 | An Automated (Markov Chain) Monte Carlo EM Algorithm
- Levine, Fan
- 2003
Citation Context: ...loch, 1997). Implementations of MCEM that determine the necessary sample size in an automated, data-driven fashion have been developed only recently (see Booth & Hobert, 1999; Levine & Casella, 2001; Levine & Fan, 2003). Automated implementations of MCEM base the decision to increase the sample size on the magnitude of the error in the integral approximation. In their seminal work, Booth & Hobert (1999) use statisti...

6 | Van der Corput sequences, Kakutani transforms and one-dimensional numerical integration
- Pagès
- 1992
Citation Context: ...tegers, n = (n_1, ..., n_d)^T, say, the Halton sequence with the first elements skipped, x_k = [φ_{b_1}(n_1 + k − 1), ..., φ_{b_d}(n_d + k − 1)]^T, k = 1, ..., m, (4) remains a low-discrepancy sequence (see Pagès, 1992; Bouleau & Lépingle, 1994). We will refer to the sequence defined by (4) as a Halton sequence with starting point n. Figure 1 shows the first 2500 elements of a two-dimensional Halton sequence with ...

4 | Laplace importance sampling for generalized linear mixed models
- Kuk
- 1999
Citation Context: ...ndomized Halton sequences using Laplace importance sampling. Laplace importance sampling has been proven useful to draw approximate samples from f(u|y; θ) in many instances (see Booth & Hobert, 1999; Kuk, 1999). Laplace importance sampling attempts to find an importance sampling distribution whose mean and variance match the mode and curvature of f(u|y; θ). More specifically, suppressing the dependence on ...
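In one latent dimension, the mode-and-curvature matching this context describes can be sketched as follows. The finite-difference Newton search and all names here are illustrative assumptions, not Kuk's or Booth & Hobert's implementation; `log_f` is the (unnormalized) log target density.

```python
import numpy as np

def laplace_importance(log_f, u0, m, rng, h_step=1e-4):
    """Laplace importance sampling sketch (one latent dimension): match a
    normal proposal's mean/variance to the mode/curvature of exp(log_f),
    then draw from it and return draws with log importance weights."""
    u = float(u0)
    for _ in range(50):                    # crude Newton search for the mode
        g = (log_f(u + h_step) - log_f(u - h_step)) / (2 * h_step)
        h = (log_f(u + h_step) - 2 * log_f(u) + log_f(u - h_step)) / h_step**2
        u = u - g / h
    s2 = -1.0 / h                          # proposal variance from curvature
    draws = u + np.sqrt(s2) * rng.standard_normal(m)
    log_q = -0.5 * (draws - u) ** 2 / s2   # proposal log density, up to a constant
    log_w = np.array([log_f(d) for d in draws]) - log_q
    return draws, log_w
```

To combine this with RQMC as the context suggests, the `standard_normal` draws would be replaced by inverse-CDF transforms of a randomized Halton sequence.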