## Convergence rates of posterior distributions (2000)

Venue: | Ann. Statist |

Citations: | 44 - 11 self |

### BibTeX

@ARTICLE{Ghosal00convergencerates,

author = {Subhashis Ghosal and Jayanta K. Ghosh and Aad W. Van Der Vaart},

title = {Convergence rates of posterior distributions},

journal = {Ann. Statist},

year = {2000},

pages = {500--531}

}

### Abstract

We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring.

### Citations

1094 |
A Practical Guide to Splines
- de Boor
Citation Context ...ictly less than q. These splines form a $J = (q + K - 1)$-dimensional linear space, with a convenient basis $B_1, B_2, \ldots, B_J$ being the B-splines, as defined in, for example, [11]. The B-splines satisfy (i) $B_j \ge 0$, $j = 1, 2, \ldots, J$, (ii) $\sum_{j=1}^{J} B_j = 1$, (iii) $B_j$ is supported inside an interval of length $q/K$ and (iv) at most $q$ of $B_1(x), \ldots, B_J(x)$ are nonzero at any given $x$. L... |

720 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973 |

650 |
Time Series: Theory and Methods
- Brockwell, Davis
- 1987
Citation Context ...nditions, $(I_n(\lambda_{n,1}), \ldots, I_n(\lambda_{n,m}))$ converges weakly to a vector of independent exponential variables with mean vector $(f(\lambda_1), \ldots, f(\lambda_m))$; see, for instance, Theorem 10.3.2 of Brockwell and Davis [6]. Dahlhaus [10] applied the technique of Whittle likelihood to estimating the spectral density by the minimum contrast method. A consistent Bayesian nonparametric method has been proposed by Choudhuri... |

546 |
Markov Chains and Stochastic Stability
- Meyn, Tweedie
- 1993
Citation Context ...n (x, ·) − Q‖ → 0 as n → ∞, uniformly in x, where ‖·‖ is the total variation norm. It can be shown that the convergence is then automatically exponentially fast (cf. [23], Theorem 16.0.2). Thus, the α-mixing coefficients are exponentially decreasing and hence satisfy $\sum_{h=0}^{\infty} \alpha_h^{1-2/s} < \infty$ for every $s > 2$. Hence, it suffices to verify (4.7) with some arbitrary fixed $s > 2$.... |

344 | Weak Convergence and Empirical Processes - van der Vaart, Wellner - 2000 |

277 |
Asymptotic methods in statistical decision theory. Springer series in statistics
- Le Cam
- 1986
Citation Context ...$d_n^2(\theta, \theta') = \frac{1}{n} \sum_{i=1}^{n} \int (\ldots)^2 \, d\mu_i$ (3.1). Thus, $d_n^2$ is the average of the squares of the Hellinger distances for the distributions of the individual observations. The following lemma, due to Birgé (cf. [22], page 491, or [4], Corollary 2 on page 149), guarantees the existence of tests satisfying the conditions of (2.2). LEMMA 2. If $P^{(n)}_{\theta}$ ... exist tests $\phi_n$ such that $P^{(n)}_{\theta_0}$ ... all $\theta \in \Theta$ such that $d_n(\theta, \theta_1)$ ... |

246 |
Weak Convergence and Empirical Processes: With Applications to Statistics
- van der Vaart, Wellner
- 1996
Citation Context ...ral and real numbers, respectively. The ε-covering number of a set $\Theta$ for a semimetric $d$, denoted by $N(\varepsilon, \Theta, d)$, is the minimal number of $d$-balls of radius $\varepsilon$ needed to cover the set $\Theta$; see, for example, [31]. 2. General theorem. For each $n \in \mathbb{N}$ and $\theta \in \Theta$, let $P^{(n)}_{\theta}$ admit densities $p^{(n)}_{\theta}$ relative to a σ-finite measure $\mu^{(n)}$. Assume that $(x, \theta) \mapsto p^{(n)}_{\theta}(x)$ is joint... |

230 |
Statistical Estimation: Asymptotic Theory
- Ibragimov, Khasminskii
- 1981
Citation Context ...5. Finite-dimensional i.n.i.d. models. Theorem 4 is also applicable to finite-dimensional models and yields the usual convergence rate as shown below. The result may be compared with Theorem I.10.2 of [19] and Proposition 1 of [13]. THEOREM 10. Let $X_1, \ldots, X_n$ be i.n.i.d. observations following densities $p_{\theta,i}$, where $\Theta \subset \mathbb{R}^d$. Let $\theta_0$ be an interior point of $\Theta$. Assume that there exist ... |

161 | Prior distributions on spaces of probability measures - Ferguson - 1974 |

161 |
Probability Theory
- Chow, Teicher
- 1988
Citation Context ...s equal to the sum of the Kullback–Leibler divergences between the individual components. Furthermore, as a consequence of the Marcinkiewicz–Zygmund inequality (e.g., [9], page 356), the mean $\bar{Y}_n$ of $n$ independent random variables satisfies $E|\bar{Y}_n - E\bar{Y}_n|^k \le C_k n^{-k/2} \frac{1}{n} \sum_{i=1}^{n} E|Y_i|^k$ for $k \ge 2$, where $C_k$ is a constant depending only on $k$. Therefore, the set $B_n(\theta_0, \varepsilon; k$... |

142 | The use of polynomial splines and their tensor product in multivariate function estimation
- Stone
- 1994
Citation Context ...$^{T}_{\infty} B - f_0\|_\infty \lesssim J^{-\alpha} \|f_0\|_\alpha$. Thus, by increasing $J$ appropriately with the sample size, we may view the space of splines as a sieve for the construction of the maximum likelihood estimator, as in Stone [28, 29], and for Bayes estimates as in [14, 15] for the problem of density estimation. To put a prior on $f$, we represent it as $f_\beta(z) = \beta^T B(z)$ and induce a prior on $f$ from a prior on $\beta$. Ghosal, Ghosh and va... |

118 |
Empirical Processes: Theory and Applications
- Pollard
- 1990
Citation Context ...ant c and hence, with $F_n$ denoting the support of $\Pi_n$, $N(\varepsilon, \{f \in F_n : \|f - f_0\|_2 \le 16\varepsilon\}, \|\cdot\|_2) \le N(\varepsilon, \{\alpha \in \mathbb{R}^K : \|\alpha - \alpha_0\|_2 \le 16c\varepsilon\}, \|\cdot\|_2) \le (80c)^K$, as in Lemma 4.1 of [25]. Thus, (4.5) holds if $n\varepsilon_n^2 \gtrsim K$. To verify (4.7), note that for $\lambda = (\lambda(I_1), \ldots, \lambda(I_K))$, $\|f_{\alpha_0} - f_0\|_s^s = \int_{I_0} |f_0|^s \, d\lambda + \sum \int |\alpha_{0,k} - f_0|^s \, r \, d\lambda \le M^s r(I_0) + L^s \|\lambda\|_s^s$. Hence, as $f_0 \in F$, for eve... |

82 | A course on empirical processes - Dudley - 1984 |

81 | Consistency of posterior distributions in nonparametric problems - Barron, Schervish, et al. - 1999 |

79 | From model selection to adaptive estimation - Birgé, Massart - 1997 |

73 | Minimum contrast estimators on sieves: exponential bounds and rates of convergence - Birgé, Massart - 1998 |

73 | On the asymptotic behavior of Bayes estimates in the discrete - Freedman - 1963 |

72 | Asymptotics in Statistics. Some Basic Concepts. Second Edition - Le Cam, Yang - 2000 |

72 | On Bayes procedures - Schwartz - 1965 |

68 | Rates of convergence of minimum contrast estimators - Birgé, Massart - 1993 |

67 | Posterior consistency of Dirichlet mixtures in density estimation - Ghosal, Ghosh, et al. - 1999 |

66 |
Approximations dans les espaces métriques et théorie de l’estimation
- Birgé
- 1983
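Citation Context ...e at $\varepsilon > 0$ is defined to be $\log N(\varepsilon\xi, \{\theta : d_n(\theta, \theta_0) \le \varepsilon\}, e_n)$, that is, the logarithm of the minimum number of $d_n$-balls of radius $\varepsilon\xi$ needed to cover an $e_n$-ball of radius $\varepsilon$ around the true parameter $\theta_0$. Birgé [3, 4] and Le Cam [20–22] showed that there exist estimators $\hat{\theta}_n = \hat{\theta}_n(X^{(n)})$ such that $d_n(\hat{\theta}_n, \theta_0) = O_P(\varepsilon_n)$ under $P^{(n)}_{\theta_0}$, where $\sup_{\varepsilon > \varepsilon_n} \log N(\varepsilon\xi, \{\theta : d_n(\theta, \theta_0) \le \varepsilon\}, e_n) \le n\varepsilon_n^2$ (2.3). Further, under ... |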

61 |
Some limit theorems for stationary processes, Theory of Probability and its Applications 7
- Ibragimov
- 1962
Citation Context .... are α-mixing with mixing coefficients $\alpha_{h-1}$. Therefore, the variance of the left-hand side of (4.4) is bounded above by $n (E|Y_i|^s)^{2/s} \times 4s(s-2)^{-1} \sum_{h=1}^{\infty} \alpha_{h-1}^{1-2/s}$, by the bound of Ibragimov [18]. □ Let $\Theta_1 \subset \Theta$ be the set of parameter values such that $K(q_{\theta_0}, q_\theta)$ and $V(q_{\theta_0}, q_\theta)$ are bounded by 1. Then from (4.3) and Lemma 4, it follows that for large $n$ and $\varepsilon^2 \ge 2/n$, the set $B_n(\theta_0, \varepsilon; 2)$ contains the... |

60 | The Dimensionality Reduction Principle for Generalized Additive Models - Stone - 1986 |

49 |
Probability inequalities for likelihood ratios and convergence rates of sieve MLEs
- Wong, Shen
- 1995
Citation Context ...ior point in the support of the prior. Let $H(\varepsilon)$ be a bound for the Hellinger ε-entropy of the support of $\Pi$ and suppose that $f_0(x)/f(x) \le M(x)$ for all $x$, where $\int M^\delta f_0 < \infty$, $\delta > 0$. Then by Theorem 5 of [33], it follows that $\max\{K(f_0, f), V(f_0, f)\} \lesssim h^2(f_0, f) \times \log^2(1/h(f_0, f))$. Let $a(\varepsilon) = -\log \Pi(h(f_0, f) \le \varepsilon)$. The posterior convergence rate for density estimation is then $\varepsilon_n$, given by $\max\{H(\varepsilon_n), a(\varepsilon_n/(\log \varepsilon$ ... |

49 | Rates of convergence of posterior distributions
- Shen, Wasserman
- 2001
Citation Context ...py or existence of certain tests) and the concentration rate of the prior around $\theta_0$ and computed the rate of convergence for a variety of examples. A similar result was obtained by Shen and Wasserman [27] under stronger conditions. Little is known about the asymptotic behavior of the posterior distribution in infinite-dimensional models when the observations are not i.i.d. For independent, nonidentica... |

43 |
Convergence of estimates under dimensionality restrictions
- Le Cam
- 1973
Citation Context ... set, then the result holds even if $\Theta$ is not bounded. Often, such tests exist by virtue of bounds on log affinity, as in the case of normal distributions, or by large deviation type inequalities; see [20] and [14], Section 7. Further, if the prior density is not bounded above, but has a polynomial or subexponential majorant, then the rate calculation also remains valid. 7.6. White noise with conjugate ... |

35 | Model selection via testing: An alternative to (penalized) maximum likelihood estimators
- Birgé
- 2006
Citation Context ...in Theorem 4, is bounded above by $\|\cdot\|_n$, but is equivalent to this norm only if the class of regression functions is uniformly bounded, which makes it less attractive. However, it can be verified (cf. [5]) that the likelihood ratio test for $f_0$ versus $f_1$ satisfies the conclusion of Lemma 2 relative to $\|\cdot\|_n$ (instead of $d_n$ and $\theta_i = f_i$). Therefore, we may use the norm $\|\cdot\|_n$ instead of the average Hellinger... |

35 | Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
- Ghosal, van der Vaart
- 2001
Citation Context ...e scale parameter lying between two positive numbers and the base measure having compact support, and if the true error density is also a normal mixture of this type, then by Ghosal and van der Vaart [16], it follows that the convergence rate is $(\log n)/\sqrt{n}$. The assumption of compact support of the base measure can be relaxed by using sieves. Compactness of the support of the prior for $\alpha$ and $\beta$ may be... |

29 | From model selection to adaptive estimation. In: Festschrift for Lucien Le Cam - Birgé, Massart - 1997 |

29 | On the Consistency of Bayes Estimates (with Discussion) - Diaconis, Freedman - 1986 |

28 | Application of the theory of martingales - Doob - 1949 |

23 | Convergence rates for density estimation with Bernstein polynomials
- Ghosal
Citation Context ...fy $e^{-\beta_1 k \log k} \lesssim \rho(k) \lesssim e^{-\beta_2 k}$. Let $\Pi$ denote the resulting prior. Clearly, as $f_0 \in K$, restricting the prior to $K$ can only increase the prior probability of $\{f : \|f - f_0\|_\infty < \varepsilon\}$. Therefore, following Ghosal [12], $\Pi(\|f - f_0\|_\infty < \varepsilon) \gtrsim e^{-c\varepsilon^{-1} \log \varepsilon^{-1}}$. Hence, $\varepsilon_n$ of the order $n^{-1/3} (\log n)^{1/3}$ satisfies (3.4). Consider a sieve $F_n$ for the parameter space $K$, which consists solely of Bernstein polynomials of order k... |

23 | Posterior Convergence Rates for Dirichlet Mixtures of Beta Densities
- Kruijer, van der Vaart
Citation Context ...ion such that $u \ge p$ and $v = u/\int u$, then because $2ab \le (a^2 + b^2)$, it easily follows that $h^2(p, v) \le (\int u \, d\mu)^{-1/2} h^2(p, u)$. For any two probability densities $p$ and $q$, we have (see, e.g., Lemma 8 of [17]) $K(p, q) \lesssim h^2(p, q) (1 + \log \|p/q\|_\infty)$, $V(p, q) \lesssim h^2(p, q) (1 + \log \|p/q\|_\infty)^2$. Together with the elementary inequalities $1 + \log x \le 2\sqrt{x}$ and $(1 + \log x)^2 \le (4x^{1/4})^2 = 16x^{1/2}$ for all $x \ge$... |

20 |
Large-sample inference for log-spline models
- Stone
- 1990
Citation Context ...$^{T}_{\infty} B - f_0\|_\infty \lesssim J^{-\alpha} \|f_0\|_\alpha$. Thus, by increasing $J$ appropriately with the sample size, we may view the space of splines as a sieve for the construction of the maximum likelihood estimator, as in Stone [28, 29], and for Bayes estimates as in [14, 15] for the problem of density estimation. To put a prior on $f$, we represent it as $f_\beta(z) = \beta^T B(z)$ and induce a prior on $f$ from a prior on $\beta$. Ghosal, Ghosh and va... |

20 |
Posterior consistency for semiparametric regression problems
- Amewou-Atisso, Ghosal, et al.
- 2003
Citation Context ...bservations, infinite dimensional model, Markov chains, posterior distribution, rate of convergence, tests. ...dressed by Amewou-Atisso, Ghosal, Ghosh and Ramamoorthi [1] and Choudhuri, Ghosal and Roy [7]. The main purpose of the present paper is to obtain a theorem on rates of convergence of posterior distributions in a general framework not restricted to the setup o... |

20 |
Bayesian density estimation using Bernstein polynomials
- Petrone
- 1999
Citation Context ...e prior used by Choudhuri, Ghosal and Roy [7], namely $f = \tau q$, where $\tau = \mathrm{var}(X_t)$ has a nonsingular prior density and $q$, a probability density on $[0, 1]$, is given the Dirichlet–Bernstein prior of Petrone [24]. We then restrict the prior to the set $K = \{f : m < f < M\}$. The order of the Bernstein polynomial, $k$, has prior mass function $\rho$, which is assumed to satisfy $e^{-\beta_1 k \log k} \lesssim \rho(k) \lesssim e^{-\beta_2 k}$. Let $\Pi$ denote the res... |

17 | Asymptotic properties of nonparametric Bayesian procedures - Wasserman - 1998 |

16 | Bayesian estimation of the spectral density of a time series
- Choudhuri, Ghosal, et al.
- 2004
Citation Context ...model, Markov chains, posterior distribution, rate of convergence, tests. ...dressed by Amewou-Atisso, Ghosal, Ghosh and Ramamoorthi [1] and Choudhuri, Ghosal and Roy [7]. The main purpose of the present paper is to obtain a theorem on rates of convergence of posterior distributions in a general framework not restricted to the setup of i.i.d. observations. We speciali... |

16 |
Asymptotic Normality of Semiparametric and Nonparametric Posterior Distributions
- Shen
- 2002
Citation Context ...ate of convergence is generally obtained in the classical context. The posterior of the Euclidean part is also expected to converge at an $n^{-1/2}$ rate and the Bernstein–von Mises theorem may hold; see [26] for some results. However, as we consider $(f, \alpha, \beta)$ together and obtain global convergence rates, it seems unlikely that our methods will yield these improved convergence rates for the Euclidean portio... |

16 |
Bayesian aspects of some nonparametric problems
- Zhao
- 1998
Citation Context ...lid. 7.6. White noise with conjugate priors. In this section, we consider the white noise model of Section 5 with a conjugate Gaussian prior. This allows us to complement and rederive results of Zhao [34] and Shen and Wasserman [27] in our framework. Thus, we observe an infinite sequence $X_1, X_2, \ldots$ of independent random variables, where $X_i$ is normally distributed with mean $\theta_i$ and variance $n^{-1}$. We cons... |

15 | Consistency issues in Bayesian nonparametrics - Ghosal, Ghosh, et al. - 1997 |

14 |
On convergence of posterior distributions
- Ghosal, Ghosh, et al.
- 1995
Citation Context ...i.d. models. Theorem 4 is also applicable to finite-dimensional models and yields the usual convergence rate as shown below. The result may be compared with Theorem I.10.2 of [19] and Proposition 1 of [13]. THEOREM 10. Let $X_1, \ldots, X_n$ be i.n.i.d. observations following densities $p_{\theta,i}$, where $\Theta \subset \mathbb{R}^d$. Let $\theta_0$ be an interior point of $\Theta$. Assume that there exist constants $\alpha > 0$ an... |

13 |
Empirical spectral processes and their applications to time series analysis
- Dahlhaus
- 1988
Citation Context ...$\lambda_{n,1}), \ldots, I_n(\lambda_{n,m}))$ converges weakly to a vector of independent exponential variables with mean vector $(f(\lambda_1), \ldots, f(\lambda_m))$; see, for instance, Theorem 10.3.2 of Brockwell and Davis [6]. Dahlhaus [10] applied the technique of Whittle likelihood to estimating the spectral density by the minimum contrast method. A consistent Bayesian nonparametric method has been proposed by Choudhuri, Ghosal and Ro... |

13 | On local and global properties in the theory of asymptotic normality of experiments - Le Cam - 1975 |

12 | Sur un théorème de minimax et son application aux tests. Probab - Birgé - 1984 |

12 |
On Bayesian adaptation
- Ghosal, Lember, et al.
- 2003
Citation Context ...reasing $J$ appropriately with the sample size, we may view the space of splines as a sieve for the construction of the maximum likelihood estimator, as in Stone [28, 29], and for Bayes estimates as in [14, 15] for the problem of density estimation. To put a prior on $f$, we represent it as $f_\beta(z) = \beta^T B(z)$ and induce a prior on $f$ from a prior on $\beta$. Ghosal, Ghosh and van der Vaart [14], in the context of dens... |

10 | Non-informative priors via sieves and packing numbers - Ghosal, Ghosh, et al. - 1997 |

9 | Epsilon-entropy and epsilon-capacity of sets in functional spaces - Kolmogorov, Tihomirov - 1959 |

9 | On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete - Schwartz - 1965 |