Results 1 - 10
of
21
Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
- Ann. Statist
, 2001
"... We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is ..."
Abstract
-
Cited by 23 (9 self)
- Add to MetaCart
We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having sub-Gaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be �log n � κ / √ n, where κ ≥ 1 is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain Kullback-Leibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate �log n � κ / √ n in, where κ now depends on the tail behavior of the base measure of the Dirichlet process. 1. Introduction. A
Posterior convergence rates of Dirichlet mixtures at smooth densities
- Ann. Statist
, 2007
"... We study the rates of convergence of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior. The true density is assumed to be twice continuously differentiable. The bandwidth is given a sequence of priors which is obtained by scaling ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
We study the rates of convergence of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior. The true density is assumed to be twice continuously differentiable. The bandwidth is given a sequence of priors which is obtained by scaling a single prior by an appropriate order. In order to handle this problem, we derive a new general rate theorem by considering a countable covering of the parameter space whose prior probabilities satisfy a summability condition together with certain individual bounds on the Hellinger metric entropy. We apply this new general theorem on posterior convergence rates by computing bounds for Hellinger (bracketing) entropy numbers for the involved class of densities, the error in the approximation of a smooth density by normal mixtures and the concentration rate of the prior. The best obtainable rate of convergence of the posterior turns out to be equivalent to the well-known frequentist rate for integrated mean squared error n −2/5 up to a logarithmic factor. 1. Introduction. Kernel
Posterior consistency of Gaussian process prior for nonparametric binary regression
"... Consider binary observations whose response probability is an unknown smooth function of a set of covariates. Suppose that a prior on the response probability function is induced by a Gaussian process mapped to the unit interval through a link function. In this paper we study consistency of the resu ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Consider binary observations whose response probability is an unknown smooth function of a set of covariates. Suppose that a prior on the response probability function is induced by a Gaussian process mapped to the unit interval through a link function. In this paper we study consistency of the resulting posterior distribution. If the covariance kernel has derivatives up to a desired order and the bandwidth parameter of the kernel is allowed to take arbitrarily small values, we show that the posterior distribution is consistent in the L1-distance. As an auxiliary result to our proofs, we show that, under certain conditions, a Gaussian process assigns positive probabilities to the uniform neighborhoods of a continuous function. This result may be of independent interest in the literature for small ball probabilities of Gaussian processes. 1. Introduction. Consider
Convergence rates of posterior distributions for noniid observations
- Ann. Statist
, 2007
"... We consider the asymptotic behavior of posterior distributions and Bayes estimators based on observations which are required to be neither independent nor identically distributed. We give general results on the rate of convergence of the posterior measure relative to distances derived from a testing ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We consider the asymptotic behavior of posterior distributions and Bayes estimators based on observations which are required to be neither independent nor identically distributed. We give general results on the rate of convergence of the posterior measure relative to distances derived from a testing criterion. We then specialize our results to independent, nonidentically distributed observations, Markov processes, stationary Gaussian time series and the white noise model. We apply our general results to several examples of infinite-dimensional statistical models including nonparametric regression with normal errors, binary regression, Poisson regression, an interval censoring model, Whittle estimation of the spectral density of a time series and a nonlinear autoregressive model.: θ ∈ Θ) be a sequence of statistical experiments with observations X (n), where the parameter set Θ is arbitrary and n is an indexing parameter, usually the sample size. We put a prior distribution Πn on θ ∈ Θ and study the rate of convergence of the posterior
Misspecification in infinite-dimensional Bayesian statistics
- Annals of Statistics
, 2006
"... We consider the asymptotic behavior of posterior distributions if the model is misspecified. Given a prior distribution and a random sample from a distribution P0, which may not be in the support of the prior, we show that the posterior concentrates its mass near the points in the support of the pri ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We consider the asymptotic behavior of posterior distributions if the model is misspecified. Given a prior distribution and a random sample from a distribution P0, which may not be in the support of the prior, we show that the posterior concentrates its mass near the points in the support of the prior that minimize the Kullback–Leibler divergence with respect to P0. An entropy condition and a prior-mass condition determine the rate of convergence. The method is applied to several examples, with special interest for infinite-dimensional models. These include Gaussian mixtures, nonparametric regression and parametric models.
Zanten, Bayesian inference with rescaled Gaussian process priors, Electron
- Mathematics Institute University of Warwick Coventry
"... Abstract: We use rescaled Gaussian processes as prior models for functional parameters in nonparametric statistical models. We show how the rate of contraction of the posterior distributions depends on the scaling factor. In particular, we exhibit rescaled Gaussian process priors yielding posteriors ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Abstract: We use rescaled Gaussian processes as prior models for functional parameters in nonparametric statistical models. We show how the rate of contraction of the posterior distributions depends on the scaling factor. In particular, we exhibit rescaled Gaussian process priors yielding posteriors that contract around the true parameter at optimal convergence rates. To derive our results we establish bounds on small deviation probabilities for smooth stationary Gaussian processes.
Large sample asymptotics for the two-parameter Poisson-Dirichlet process
- Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh
"... Abstract: This paper explores large sample properties of the two parameter (α, θ) Poisson-Dirichlet Process in two contexts. In a Bayesian context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process, we explore the consistency and weak c ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract: This paper explores large sample properties of the two parameter (α, θ) Poisson-Dirichlet Process in two contexts. In a Bayesian context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process, we explore the consistency and weak convergence of the the two parameter Poisson Dirichlet posterior process. We also establish the weak convergence of properly centered two parameter Poisson Dirichlet processes for large θ+nα. This latter result complements large θ results for the Dirichlet process and Poisson Dirichlet sequences, and complements a recent result on large deviation principles for the two parameter Poisson Dirichlet process. A crucial component of our results is the use of distributional identities that may be useful in other contexts.
Convergence rates for Bayesian density estimation of infinite-dimensional exponential families
- Ann. Statist
, 2006
"... We study the rate of convergence of posterior distributions in density estimation problems for log-densities in periodic Sobolev classes characterized by a smoothness parameter p. The posterior expected density provides a nonparametric estimation procedure attaining the optimal minimax rate of conve ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We study the rate of convergence of posterior distributions in density estimation problems for log-densities in periodic Sobolev classes characterized by a smoothness parameter p. The posterior expected density provides a nonparametric estimation procedure attaining the optimal minimax rate of convergence under Hellinger loss if the posterior distribution achieves the optimal rate over certain uniformity classes. A prior on the density class of interest is induced by a prior on the coefficients of the trigonometric series expansion of the logdensity. We show that when p is known, the posterior distribution of a Gaussian prior achieves the optimal rate provided the prior variances die off sufficiently rapidly. For a mixture of normal distributions, the mixing weights on the dimension of the exponential family are assumed to be bounded below by an exponentially decreasing sequence. To avoid the use of infinite bases, we develop priors that cut off the series at a sample-size-dependent truncation point. When the degree of smoothness is unknown, a finite mixture of normal priors indexed by the smoothness parameter, which is also assigned a prior, produces the best rate. A rate-adaptive estimator is derived.
Dynamics of Bayesian updating with dependent data and misspecified models
, 2009
"... Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various s ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov’s Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the loglikelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
- Neural Comput
, 2006
"... Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj’s to model y, so that the posterior will propose probability densities p that are “often close ” to the true density p ∗ in some sense. The closeness can be described by a Hellinger distance between p and p ∗ that scales at a power very close to n −1/2, which is the “finite-dimensional rate ” corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.

