## ON THE COMPUTATIONAL COMPLEXITY OF MCMC-BASED ESTIMATORS IN LARGE SAMPLES

Citations: | 2 - 1 self |

### BibTeX

@MISC{Belloni_onthe,
    author = {Alexandre Belloni and Victor Chernozhukov},
    title = {ON THE COMPUTATIONAL COMPLEXITY OF MCMC-BASED ESTIMATORS IN LARGE SAMPLES},
    year = {}
}

### Abstract

In this paper we examine the implications of statistical large sample theory for the computational complexity of Bayesian and quasi-Bayesian estimation carried out using Metropolis random walks. Our analysis is motivated by the Laplace-Bernstein-Von Mises central limit theorem, which states that in large samples the posterior or quasi-posterior approaches a normal density. Using this observation, we establish polynomial bounds on the computational complexity of general Metropolis random walk methods in large samples. Our analysis covers cases where the underlying log-likelihood or extremum criterion function is possibly non-concave, discontinuous, and of increasing dimension. However, the central limit theorem restricts the deviations from continuity and log-concavity of the log-likelihood or extremum criterion function in a very specific manner. Under minimal assumptions for the central limit theorem framework to hold, we show that the Metropolis algorithm is theoretically
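As a rough illustration of the kind of sampler the abstract refers to, the sketch below runs a Gaussian random-walk Metropolis chain on a quasi-posterior proportional to exp(Q(θ)). It is not the authors' procedure or tuning. The names `log_criterion` and `metropolis_random_walk`, the toy criterion (a concave quadratic plus a small bounded step perturbation that makes it non-concave and discontinuous), the step size `sigma`, the chain length, and the burn-in are all assumptions chosen for illustration.

```python
import numpy as np


def log_criterion(theta):
    """Hypothetical criterion Q(theta), assumed for illustration only.

    The quadratic part mimics the near-normal shape suggested by the central
    limit theorem; the bounded step term makes Q non-concave and discontinuous.
    """
    quadratic = -0.5 * np.sum(theta ** 2)
    step = 0.05 * (np.floor(4.0 * theta[0]) % 2)  # takes values in {0, 0.05}
    return quadratic + step


def metropolis_random_walk(log_q, theta0, sigma, n_steps, rng):
    """Gaussian random-walk Metropolis targeting a density proportional to exp(log_q)."""
    theta = np.asarray(theta0, dtype=float)
    current = log_q(theta)
    draws = np.empty((n_steps, theta.size))
    accepted = 0
    for t in range(n_steps):
        # Propose a Gaussian step around the current point.
        candidate = theta + sigma * rng.standard_normal(theta.size)
        cand_val = log_q(candidate)
        # Metropolis filter: accept with probability min(1, exp(cand_val - current)).
        if rng.random() < np.exp(min(0.0, cand_val - current)):
            theta, current = candidate, cand_val
            accepted += 1
        # On rejection the chain stays at the current point.
        draws[t] = theta
    return draws, accepted / n_steps


rng = np.random.default_rng(0)
draws, acc_rate = metropolis_random_walk(
    log_criterion, np.zeros(2), sigma=1.0, n_steps=20000, rng=rng
)
# Quasi-posterior mean estimate after discarding an (assumed) burn-in of 5000 steps.
posterior_mean = draws[5000:].mean(axis=0)
```

Only evaluations of the criterion are needed here, not gradients, which is why this type of chain remains usable when the criterion is discontinuous.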

### Citations

903 | Monte Carlo Statistical Methods - Robert, Casella - 2004 |

378 |
Generalized instrumental variables estimation of nonlinear rational expectations models, Econometrica
- Hansen, Singleton
- 1982
Citation Context ...conometric applications, often moment restrictions represent Euler equations that result from the data x being an outcome of an optimization by rational decision-makers; see e.g. Hansen and Singleton [21], Chamberlain [8], Imbens [23], and Donald, Imbens and Newey [11]. Thus, the curved exponential framework is a fundamental complement of the exponential framework, at least in certain fields of data a... |

297 |
Approximating the permanent
- Jerrum, Sinclair
- 1989
Citation Context ...u(K\A)dQ(u) . Q(A) Lovász and Simonovits [34] proved the connection between conductance and convergence for the continuous space setting. This result extended an earlier result of Jerrum and Sinclair [24, 25], who connected convergence and conductance for discrete state spaces. Lovász and Simonovits’ result can be re-stated as follows. Theorem 2 Let Q0 be an M-warm start with respect to the stationary dist... |

230 | Statistical Estimation: Asymptotic Theory - Ibragimov, Has’minskii - 1981 |

175 |
Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions
- Kipnis, Varadhan
- 1986
Citation Context ...e Markov chain sample will be crucial in determining the appropriate sample size. The starting point of our analysis is a central limit theorem for reversible Markov chains due to Kipnis and Varadhan [29] which is restated here for convenience. Consider a reversible Markov chain on K with stationary distribution f. The lag k autocovariance of the stationary time series { g(λi ) } ∞ i=1 , obtained by s... |

175 | Robust regression: Asymptotics, Conjectures and Monte Carlo. The Annals of Statistics 1 - Huber - 1973 |

170 |
Monte Carlo Strategies
- Liu
- 2001
Citation Context ... subject classifications: Primary, 65C05; secondary 65C60. Keywords and phrases: Monte Carlo, Computational Complexity, Curved Exponential ... [7], Chib [10], Geweke [18], Liu [33] for detailed treatments of the MCMC methods and their applications in various areas of statistics, econometrics, and biometrics.) Bayesian methods rely on a likelihood formulation, while quasi-Bayesi... |

169 |
Information and exponential families in statistical theory
- Barndorff-Nielsen
- 1978
Citation Context ...on of the density of interest. There are many classical examples of curved exponential families; see for example Efron [13], Lehmann and Casella [31], and Barndorff-Nielsen [3]. An example of the condition that puts a curved structure onto an exponential family is a moment restriction of the type: $\int m(x,\alpha) f(x,\theta)\,dx = 0$. This condition restricts θ to lie on a curve that can b... |

153 |
Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms. Annals of Applied Probability 7
- Roberts, Gelman, et al.
- 1997
Citation Context ...random walk is not the most sophisticated algorithm available. Thus, in principle further improvements could be obtained by considering different kinds of algorithms, for example, the Langevin method [40, 45, 42, 1] which adds the Newton one-steps to the Metropolis chain. (Of course, the algorithm requires a smooth gradient of the log-likelihood function, which rules out nonsmooth and discontinuous cases emphasi... |

141 | Polynomial splines and their tensor products in extended linear modeling
- Stone, Hansen, et al.
- 1997
Citation Context ...Cases. Exponential families play a very important role in statistical estimation, cf. Lehmann and Casella [31], especially in high-dimensional contexts, cf. Portnoy [39], Ghosal [20], and Stone et al. [44]. For example, the high-dimensional situations arise in modern data sets in technometric and econometric applications. Moreover, exponential families have excellent approximation properties and are use... |

140 | Sharper bounds for Gaussian and empirical processes - Talagrand - 1994 |

104 | CAViaR: Conditional autoregressive value at risk by regression quantiles
- Engle, Manganelli
- 2004
Citation Context ...ion, see Koenker and Bassett [30], $m(u) = (\alpha - 1(u < 0))u$. To reflect dependence on all past data and accurately capture GARCH-like dependencies, leading research in this area (see Engle and Manganelli [14]) considers recursive models of the form $q_i = f(X_i, q_{i-1}, q_{i-2}, \ldots; \theta)$, for instance, $f(X_i, q_{i-1}, q_{i-2}, \ldots; \theta) = X_i'\gamma + \rho_1 q_{i-1} + \rho_2 q_{i-2}$. This implies a highly non-linear, recursive specification for the regr... |

91 |
Asymptotic Efficiency in Estimation with Conditional Moment Restrictions
- Chamberlain
- 1987
Citation Context ...tions, often moment restrictions represent Euler equations that result from the data x being an outcome of an optimization by rational decision-makers; see e.g. Hansen and Singleton [21], Chamberlain [8], Imbens [23], and Donald, Imbens and Newey [11]. Thus, the curved exponential framework is a fundamental complement of the exponential framework, at least in certain fields of data analysis. We requi... |

91 |
Conductance and the rapid mixing property for Markov chains: the approximation of the permanent resolved
- Jerrum, Sinclair
- 1988
Citation Context ...u(K\A)dQ(u) . Q(A) Lovász and Simonovits [34] proved the connection between conductance and convergence for the continuous space setting. This result extended an earlier result of Jerrum and Sinclair [24, 25], who connected convergence and conductance for discrete state spaces. Lovász and Simonovits’ result can be re-stated as follows. Theorem 2 Let Q0 be an M-warm start with respect to the stationary dist... |

89 | Optimal scaling for various Metropolis-Hastings algorithms
- Roberts, Rosenthal
- 2001
Citation Context ...120λmax d‖K‖ In order to apply Theorem 3 we rely on σ being defined in (3.19) as a function of the relevant theoretical quantities. More practical choices of the parameter, as in Roberts and Rosenthal [41] and Gelman, Roberts and Gilks [17], suggest “tuning” the parameter to ensure a particular average acceptance rate for the steps of the Markov chain. These cases are exactly the cases covered by our ... |

79 |
Markov chain Monte Carlo methods: computation and inference
- Chib
- 2001
Citation Context ... acknowledged. AMS 2000 subject classifications: Primary, 65C05; secondary 65C60. Keywords and phrases: Monte Carlo, Computational Complexity, Curved Exponential ... [7], Chib [10], Geweke [18], Liu [33] for detailed treatments of the MCMC methods and their applications in various areas of statistics, econometrics, and biometrics.) Bayesian methods rely on a likelihood formulat... |

75 |
Random walks and an $O^*(n^5)$ volume algorithm for convex bodies. Random Structures and Algorithms 11
- Kannan, Lovász, et al.
- 1997
Citation Context ...mary concerned with the question of approximating the volume of high-dimensional convex sets where uniform densities play a fundamental role (Lovász and Simonovits [34], Kannan, Lovász and Simonovits [27, 28]). Later the approach was generalized for the cases where the log-likelihood is concave (Frieze, Kannan and Polson [16], Polson [38], and Lovász and Vempala [35, 36, 37]). However, under log-concavity... |

70 | Isoperimetric Problems for Convex Bodies and a Localization
- Kannan, Lovász, et al.
- 1995
Citation Context ...eral fundamental papers studying the computational complexity of Metropolis procedures, especially Applegate and Kannan [2], Frieze, Kannan and Polson [16], Polson [38], Kannan, Lovász and Simonovits [27], Kannan and Li [26], Lovász and Simonovits [34], and Lovász and Vempala [35, 36, 37]. Many of our results and proofs rely upon and extend the mathematical tools previously developed in these works. W... |

55 | An MCMC Approach to Classical Estimation
- Chernozhukov, Hong
- 2003
Citation Context ...al discontinuities and non-concavities and is often much easier to compute in practice than the argmax estimator; see, for example, the discussion in Liu, Tian, and Wei [32] and Chernozhukov and Hong [9]. This paper will show that if the sample size n grows to infinity and the dimension of the problem d does not grow too quickly relative to the sample size, the quasi-posterior (1.4) ∫ Θ exp{Qn(θ)} ex... |

42 |
Practical Markov chain Monte
- Geyer
- 1992
Citation Context ...and subsampling). These (new) characterizations complement the previous well-known characterizations of the error of calculating (4.24) in terms of covariance functions of the underlying chain (Geyer [19] and Casella and Robert [7]). The integral is computed by simulating a dependent (Markovian) sequence of random points λ1, λ2, ..., which has f as the stationary distr... |

38 | Hit-and-Run from a Corner
- Lovász, Vempala
- 2004
Citation Context ...rocedures, especially Applegate and Kannan [2], Frieze, Kannan and Polson [16], Polson [38], Kannan, Lovász and Simonovits [27], Kannan and Li [26], Lovász and Simonovits [34], and Lovász and Vempala [35, 36, 37]. Many of our results and proofs rely upon and extend the mathematical tools previously developed in these works. We extend the complexity analysis of the previous literature, which has focused on the... |

38 | Isoperimetry and Gaussian analysis. Lectures on probability theory and statistics (Saint-Flour - Ledoux - 1994 |

35 | The geometry of logconcave functions and sampling algorithms
- Lovász, Vempala
Citation Context ...e from below. In what follows we provide an outline of the proof, auxiliary results, and, finally, the formal proof. 3.2.1. Outline of the Proof. The proof follows the arguments in Lovász and Vempala [35]. In order to bound the ergodic flow of A ∈ A, consider the particular disjoint partition $K = \tilde{S}_1 \cup \tilde{S}_2 \cup \tilde{S}_3$ where $\tilde{S}_1 \subset A$, $\tilde{S}_2 \subset K \setminus A$, and $\tilde{S}_3$ consists of points in A or K \ A for which the o... |

35 | Hit-and-run mixes fast - Lovász - 1999 |

34 |
One-step estimators for over-identified generalized method of moments models. Rev. Econ. Studies 64:359–383
- Imbens
- 1997
Citation Context ...ial and curved exponential families. The curved families arise for example when the data must satisfy additional moment restrictions, as e.g. in Hansen and Singleton [21], Chamberlain [8], and Imbens [23]. The curved families fall outside the log-concave framework. The rest of the paper is organized as follows. In Section 2, we establish a generalized version of the Central Limit Theorem for Bayesian ... |

32 |
Efficient Metropolis Jumping Rules, in Bayesian Statistics 5
- Gelman, Roberts, et al.
- 1996
Citation Context ...rem 3 we rely on σ being defined in (3.19) as a function of the relevant theoretical quantities. More practical choices of the parameter, as in Roberts and Rosenthal [41] and Gelman, Roberts and Gilks [17], suggest “tuning” the parameter to ensure a particular average acceptance rate for the steps of the Markov chain. These cases are exactly the cases covered by our (theoretical) choice of σ (of cours... |

28 |
The geometry of exponential families
- Efron
- 1978
Citation Context ..., and η describes the lower-dimensional parametrization of the density of interest. There are many classical examples of curved exponential families; see for example Efron [13], Lehmann and Casella [31], and Barndorff-Nielsen [3]. An example of the condition that puts a curved structure onto an exponential family is a moment restriction of the type: $\int m(x,\alpha) f(x,\theta)\,dx = 0$. Thi... |

28 |
Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity
- Portnoy
- 1988
Citation Context ...veloped in this paper. 5.1. Concave Cases. Exponential families play a very important role in statistical estimation, cf. Lehmann and Casella [31], especially in high-dimensional contexts, cf. Portnoy [39], Ghosal [20], and Stone et al. [44]. For example, the high-dimensional situations arise in modern data sets in technometric and econometric applications. Moreover, exponential families have excellent ... |

27 | Geometric Random Walks: A Survey - Vempala - 2005 |

24 | Convergence of Markov Chain Monte Carlo Algorithms - Polson - 1996 |

21 | Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity
- Ghosal
- 2000
Citation Context ...Chernozhukov and Hong [9] provided CLTs for quasi-posteriors formed using various non-likelihood criterion functions. In contrast to these previous results, we allow for increasing dimensions. Ghosal [20] also previously derived a CLT for posteriors with increasing dimension for concave exponential families. We go beyond such a canonical setup and establish the CLT for non-concave and discontinuous case... |

18 |
Some contributions to the asymptotic theory of Bayes solutions
- Bickel, Yahav
- 1969
Citation Context ...or quasi-posteriors and posteriors which extends the CLT previously derived in the literature for posteriors and quasi-posteriors for fixed dimension. In particular, Laplace c. 1809, Bickel and Yahav [5], Ibragimov and Hasminskii [22], and Bunke and Milhaud [6] provided CLTs for posteriors. Liu, Tian, and Wei [32] and Chernozhukov and Hong [9] provided CLTs for quasi-posteriors formed using ... |

17 |
Empirical Likelihood Estimation and Consistent Tests with Conditional Moment Restrictions, mimeo
- Donald, Imbens, et al.
- 2008
Citation Context ...r equations that result from the data x being an outcome of an optimization by rational decision-makers; see e.g. Hansen and Singleton [21], Chamberlain [8], Imbens [23], and Donald, Imbens and Newey [11]. Thus, the curved exponential framework is a fundamental complement of the exponential framework, at least in certain fields of data analysis. We require the following additional regularity condition... |

17 |
Sampling according to the multivariate normal density
- Kannan, Li
- 1996
Citation Context ...ers studying the computational complexity of Metropolis procedures, especially Applegate and Kannan [2], Frieze, Kannan and Polson [16], Polson [38], Kannan, Lovász and Simonovits [27], Kannan and Li [26], Lovász and Simonovits [34], and Lovász and Vempala [35, 36, 37]. Many of our results and proofs rely upon and extend the mathematical tools previously developed in these works. We extend the complex... |

16 |
Asymptotic Normality of Semiparametric and Nonparametric Posterior Distributions
- Shen
- 2002
Citation Context ...heorem 1 extends the CLT previously derived in the literature for posteriors in the likelihood framework (Bickel and Yahav [5], Ibragimov and Hasminskii [22], Bunke and Milhaud [6], Ghosal [20], Shen [43]) and for quasi-posteriors in the general extremum framework, when the likelihood is replaced by general criterion functions (Liu, Tian, and Wei [32] and Chernozhukov and Hong [9]). The theorem also e... |

10 | Survival analysis with median regression models - Ying, Jung, et al. - 1995 |

9 |
Langevin-type models I: Diffusions with given stationary distributions, and their discretizations
- Stramer, Tweedie
- 1999
Citation Context ...random walk is not the most sophisticated algorithm available. Thus, in principle further improvements could be obtained by considering different kinds of algorithms, for example, the Langevin method [40, 45, 42, 1] which adds the Newton one-steps to the Metropolis chain. (Of course, the algorithm requires a smooth gradient of the log-likelihood function, which rules out nonsmooth and discontinuous cases emphasi... |

8 |
Asymptotic behavior of Bayes estimates under possibly incorrect models
- Bunke, Milhaud
- 1998
Citation Context ...reviously derived in the literature for posteriors and quasi-posteriors for fixed dimension. In particular, Laplace c. 1809, Bickel and Yahav [5], Ibragimov and Hasminskii [22], and Bunke and Milhaud [6] provided CLTs for posteriors. Liu, Tian, and Wei [32] and Chernozhukov and Hong [9] provided CLTs for quasi-posteriors formed using various non-likelihood criterion functions. In contrast to... |

8 | Choosing sample path length and number of sample paths when starting at steady state - Fishman - 1994 |

8 | Asymptotic theory and econometric practice - Koenker - 1988 |

7 | Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics - Dudley - 2000 |

7 |
Geometric random walks: a survey. Combinatorial and computational geometry
- Vempala
- 2005
Citation Context ...e Metropolis filter (which depends on the likelihood function ℓ on the current and on the candidate point), otherwise the random walk stays at the current point (see Casella and Robert [7] and Vempala [47] for details; Section 3.2.4 describes the canonical Gaussian random walk). In the complexity analysis of this algorithm we are interested in bounding the number of steps of the random walk required to ... |

6 | Approximate normality of large products - Blackwell - 1985 |

5 |
Theory of point estimation. Second edition. Springer Texts in Statistics
- Lehmann, Casella
- 1998
Citation Context ...ns to illustrate the full applicability of the approach developed in this paper. 5.1. Concave Cases. Exponential families play a very important role in statistical estimation, cf. Lehmann and Casella [31], especially in high-dimensional contexts, cf. Portnoy [39], Ghosal [20], and Stone et al. [44]. For example, the high-dimensional situations arise in modern data sets in technometric and econometric a... |

5 | Conditional Quantile Processes under Increasing Dimension - Belloni, Chernozhukov - 2010 |

5 | On Parameters of Increasing Dimensions - He, Shao - 2000 |

4 |
Computationally Intensive Methods for Integration
- Geweke, Keane
- 2001
Citation Context .... AMS 2000 subject classifications: Primary, 65C05; secondary 65C60. Keywords and phrases: Monte Carlo, Computational Complexity, Curved Exponential ... [7], Chib [10], Geweke [18], Liu [33] for detailed treatments of the MCMC methods and their applications in various areas of statistics, econometrics, and biometrics.) Bayesian methods rely on a likelihood formulation, while qu... |

4 | The rate of strong uniform consistency for the product-limit estimator - Csörgő, Horváth - 1983 |

3 |
An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift, Methodol
- Atchade
- 2006
Citation Context ...ng CLT conditions and Theorem 2. 3.2.2. An Iso-perimetric Inequality. We start by defining a notion of approximate log-concavity. A function $f : \mathbb{R}^d \to \mathbb{R}$ is said to be log-β-concave if for every $\alpha \in [0,1]$, $x, y \in \mathbb{R}^d$, we have $f(\alpha x + (1-\alpha) y) \ge \beta f(x)^{\alpha} f(y)^{1-\alpha}$ for some $\beta \in (0,1]$. f is said to be log-concave if β can be taken equal to one. The class of log-β-concave functions is... |

3 |
Sampling and Integration of Near Logconcave
- Applegate, Kannan
- 1993
Citation Context ... Our analysis of computational complexity builds on several fundamental papers studying the computational complexity of Metropolis procedures, especially Applegate and Kannan [2], Frieze, Kannan and Polson [16], Polson [38], Kannan, Lovász and Simonovits [27], Kannan and Li [26], Lovász and Simonovits [34], and Lovász and Vempala [35, 36, 37]. Many of our results and proofs r... |