Results 1–10 of 196
Information-theoretic asymptotics of Bayes methods
 IEEE Transactions on Information Theory
, 1990
Abstract

Cited by 107 (10 self)
In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
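The (d/2) log n growth of the relative entropy can be checked in closed form for a toy conjugate model: i.i.d. N(θ, 1) data with a N(0, τ²) prior, for which d = 1 and D_n has an explicit expression. The model, parameter values, and limiting constant below are illustrative choices for this sketch, not quantities from the paper.

```python
import math

def bayes_relative_entropy(n, theta=0.5, tau2=1.0):
    """D_n = KL between the true joint density N(theta * 1, I_n) and the
    Bayes marginal N(0, I_n + tau2 * 11^T), computed in closed form:
    D_n = (1/2) [log(1 + n*tau2) + n*(theta^2 - tau2) / (1 + n*tau2)]."""
    return 0.5 * (math.log(1.0 + n * tau2)
                  + n * (theta ** 2 - tau2) / (1.0 + n * tau2))

theta, tau2 = 0.5, 1.0
# Limiting constant for this toy model: D_n - (1/2) log n -> c as n grows
c = 0.5 * (math.log(tau2) + (theta ** 2 - tau2) / tau2)

for n in (10, 1_000, 100_000):
    print(n, bayes_relative_entropy(n, theta, tau2), 0.5 * math.log(n) + c)
```

Running the loop shows D_n tracking (1/2) log n + c ever more closely, the d = 1 instance of the paper's asymptotic.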
High-dimensional data analysis: The curses and blessings of dimensionality. Aide-memoire of a lecture at
 AMS Conference on Math Challenges of the 21st Century
, 2000
Abstract

Cited by 89 (0 self)
The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral imagery, Internet portals, financial tick-by-tick data, and DNA microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena (e.g. instance ↔ human being), these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but, even more so, to radically larger numbers of variables: voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or ...
Convergence of a stochastic approximation version of the EM algorithm
, 1997
Abstract

Cited by 86 (8 self)
The Expectation Maximization (EM) algorithm is a powerful computational technique for locating maxima of functions...
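The paper's stochastic approximation variant (SAEM) replaces the E-step with simulated draws; as background, a minimal sketch of plain EM on a toy two-component 1-D Gaussian mixture (data, initialisation, and unit variances are illustrative assumptions, not from the paper), showing the monotone-likelihood property that such algorithms build on:

```python
import math

def em_gaussian_mixture(data, iters=50):
    """Plain EM for a two-component 1-D Gaussian mixture with unit
    variances.  Returns (weights, means, log-likelihood trace)."""
    w, mu = [0.5, 0.5], [min(data), max(data)]  # crude initialisation
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    trace = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            dens = [w[k] * norm * math.exp(-0.5 * (x - mu[k]) ** 2)
                    for k in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # record the log-likelihood under the current parameters
        trace.append(sum(math.log(sum(
            w[k] * norm * math.exp(-0.5 * (x - mu[k]) ** 2)
            for k in range(2))) for x in data))
        # M-step: re-estimate mixture weights and means
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
    return w, mu, trace

data = [-0.1, 0.2, 0.05, 4.9, 5.1, 5.3, 0.1, 5.0]
w, mu, trace = em_gaussian_mixture(data)
```

Each EM iteration never decreases the observed-data log-likelihood, which is the ascent property the convergence analysis extends to the stochastic approximation setting.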
On choosing and bounding probability metrics
 International Statistical Review, 2002
Abstract

Cited by 84 (2 self)
When studying convergence of measures, an important issue is the choice of probability metric. We provide a summary and some new results concerning bounds among some important probability metrics/distances that are used by statisticians and probabilists. Knowledge of these bounds provides a means of deriving a bound for one metric from another in an applied problem, and considering several metrics can also provide alternate insights. We also give examples showing that rates of convergence can depend strongly on the metric chosen, so careful consideration is necessary when choosing a metric.
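Two of the metrics the paper relates are total variation and Hellinger distance, which for discrete distributions satisfy the standard bounds H² ≤ TV ≤ √2·H under the convention H²(P, Q) = (1/2) Σ (√p_i − √q_i)². A minimal sketch checking these bounds (the example distributions are illustrative):

```python
import math

def total_variation(p, q):
    """TV(P, Q) = (1/2) * sum |p_i - q_i| for discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    """H(P, Q) with H^2 = (1/2) * sum (sqrt(p_i) - sqrt(q_i))^2."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
tv, h = total_variation(p, q), hellinger(p, q)
# The two-sided bound relating the metrics: H^2 <= TV <= sqrt(2) * H
assert h ** 2 <= tv <= math.sqrt(2) * h
```

Such bounds let a convergence rate established in one metric be transferred, possibly with loss, to another, which is the practical use the abstract describes.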
An MCMC approach to classical estimation
, 2003
Abstract

Cited by 56 (8 self)
This paper studies computationally and theoretically attractive estimators called here Laplace-type estimators (LTEs), which include means and quantiles of quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semiparametric problems as censored and instrumental quantile regression, nonlinear GMM, and value-at-risk models. The LTEs are computed using Markov chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large-sample theory is obtained and illustrated for regular cases.
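The idea can be sketched on a toy problem: build a quasi-posterior proportional to exp(−n·Q_n(θ)) from a least-squares criterion for a location parameter, sample it with random-walk Metropolis, and report the quasi-posterior mean as the estimator. The criterion, prior, tuning constants, and data below are illustrative assumptions for this sketch, not the paper's applications.

```python
import math
import random

def laplace_type_estimate(data, n_draws=20_000, step=0.5, seed=0):
    """Random-walk Metropolis targeting the quasi-posterior proportional
    to exp(-n * Q_n(theta)) under a flat prior, where
    Q_n(theta) = (1/(2n)) * sum (x_i - theta)^2 is a least-squares
    criterion.  The LTE reported is the mean of the retained draws."""
    rng = random.Random(seed)

    def log_quasi_posterior(theta):
        # -n * Q_n(theta) = -(1/2) * sum (x_i - theta)^2
        return -0.5 * sum((x - theta) ** 2 for x in data)

    theta = data[0]
    draws = []
    for _ in range(n_draws):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_ratio = log_quasi_posterior(proposal) - log_quasi_posterior(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        draws.append(theta)
    kept = draws[n_draws // 4:]  # discard burn-in
    return sum(kept) / len(kept)

data = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 1.7]
estimate = laplace_type_estimate(data)  # close to the sample mean 2.05
```

For this quadratic criterion the quasi-posterior mean coincides asymptotically with the extremum (least-squares) estimator, while the MCMC computation avoids any explicit optimisation, which is the point of the approach.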
Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting
, 2007
Abstract

Cited by 51 (2 self)
The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an unknown vector β* ∈ ℝ^p based on a set of n noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, compressive sensing, and constructive approximation. The sample complexity of a given method for subset recovery refers to the scaling of the required sample size n as a function of the signal dimension p, the sparsity index k (the number of nonzeros in β*), the minimum value β_min of β* over its support, and other parameters of the measurement matrix. This paper studies the information-theoretic limits of sparsity recovery: in particular, for a noisy linear observation model based on measurement matrices drawn from general Gaussian ensembles, we derive both a set of sufficient conditions for exact support recovery using an exhaustive search decoder, as well as a set of necessary conditions that any decoder, regardless of its computational complexity, must satisfy for exact support recovery. This analysis of fundamental limits complements our previous work on sharp thresholds for support set recovery over the same set of random measurement ensembles using the polynomial-time Lasso method (ℓ1-constrained quadratic programming). Index Terms: compressed sensing, ℓ1-relaxation, Fano's method, high-dimensional statistical inference, information-theoretic ...
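The exhaustive search decoder is easy to sketch for the simplest case k = 1: try every candidate support, fit by least squares on that support, and keep the support with the smallest residual. The design matrix, the deterministic "noise" values, and the dimensions below are illustrative choices, not the Gaussian ensembles analysed in the paper.

```python
def exhaustive_support_recovery(X, y):
    """Exhaustive-search decoder for sparsity k = 1: for each column j,
    fit the scalar coefficient by least squares on that single column
    and return the column index with the smallest residual sum of
    squares."""
    n, p = len(X), len(X[0])
    best_j, best_rss = None, float("inf")
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        beta = (sum(c * yi for c, yi in zip(col, y))
                / sum(c * c for c in col))
        rss = sum((yi - beta * c) ** 2 for yi, c in zip(y, col))
        if rss < best_rss:
            best_j, best_rss = j, rss
    return best_j

# y = 2 * (column 1 of X) + small deterministic perturbations
X = [[1.0, 0.3, -0.5],
     [0.2, 1.0, 0.4],
     [-0.7, 0.5, 1.0],
     [0.9, -0.2, 0.1]]
noise = [0.05, -0.03, 0.02, -0.04]
y = [2.0 * row[1] + e for row, e in zip(X, noise)]
```

For general k the decoder enumerates all p-choose-k supports, which is exactly why its sample complexity marks an information-theoretic limit rather than a practical algorithm.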
Alpha-Divergence for Classification, Indexing and Retrieval
 UNIVERSITY OF MICHIGAN
, 2001
Abstract

Cited by 45 (4 self)
Motivated by Chernoff's bound on asymptotic probability of error, we propose the alpha-divergence measure and a surrogate, the alpha-Jensen difference, for feature classification, indexing and retrieval in image and other databases. The alpha ...
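One common (Rényi-type) form of the alpha-divergence for discrete distributions is D_α(p‖q) = (α − 1)⁻¹ log Σ p_i^α q_i^{1−α}; a minimal sketch (the example distributions are illustrative) checking that α → 1 recovers the Kullback-Leibler divergence:

```python
import math

def alpha_divergence(p, q, alpha):
    """Renyi-type alpha-divergence for discrete distributions:
    D_alpha(p || q) = (1 / (alpha - 1)) * log sum_i p_i^alpha * q_i^(1 - alpha)."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), the alpha -> 1 limit."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.6, 0.3, 0.1]
q = [0.4, 0.4, 0.2]
# Near alpha = 1 the alpha-divergence approaches KL(p || q)
print(alpha_divergence(p, q, 0.999), kl_divergence(p, q))
```

The free parameter α is what lets the measure be tuned toward the Chernoff exponent governing the asymptotic error probability that motivates the paper.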
Convergence rates of posterior distributions
 Ann. Statist.
, 2000
Abstract

Cited by 43 (11 self)
We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring.
Minimax rates of estimation for high-dimensional linear regression over ℓq-balls
, 2009
Abstract

Cited by 43 (15 self)
Consider the high-dimensional linear regression model y = Xβ + w, where y ∈ ℝⁿ is an observation vector, X ∈ ℝ^{n×p} is a design matrix with p > n, β is an unknown regression vector, and w is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating β in either ℓ2-loss or ℓ2-prediction loss, assuming that β belongs to an ℓq-ball B_q(R_q) for some q ∈ [0, 1]. It is shown that under suitable regularity conditions on the design matrix, the minimax optimal rate in ℓ2-loss and ℓ2-prediction loss scales as R_q ((log p)/n)^{1 − q/2}. The analysis in this paper reveals that conditions on the design matrix enter into the rates for ℓ2-error and ℓ2-prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the ℓq-balls, whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over ℓq-balls. For the special case q = 0, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient ℓ1-based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix than optimal algorithms involving least squares over the ℓ0-ball. Index Terms: compressed sensing, minimax techniques, regression analysis.
Sharp Adaptation for Inverse Problems With Random Noise
, 2000
Abstract

Cited by 41 (6 self)
We consider a heteroscedastic sequence space setup with polynomially increasing variances of observations that allows us to treat a number of inverse problems, in particular multivariate ones. We propose an adaptive estimator that attains simultaneously exact asymptotic minimax constants on every ellipsoid of functions within a wide scale (that includes ellipsoids with polynomially and exponentially decreasing axes) and, at the same time, satisfies asymptotically exact oracle inequalities within any class of linear estimates having monotone nondecreasing weights. As an application, we construct sharp adaptive estimators for the problems of deconvolution and tomography.