Results 1-10 of 110
Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting
, 2007
Cited by 77 (2 self)
Abstract: The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an unknown vector $\beta^* \in \mathbb{R}^p$ based on a set of $n$ noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, compressive sensing, and constructive approximation. The sample complexity of a given method for subset recovery refers to the scaling of the required sample size $n$ as a function of the signal dimension $p$, the sparsity index $k$ (number of nonzeros in $\beta^*$), the minimum value $\beta_{\min}$ of $\beta^*$ over its support, and other parameters of the measurement matrix. This paper studies the information-theoretic limits of sparsity recovery: in particular, for a noisy linear observation model based on random measurement matrices drawn from general Gaussian measurement ensembles, we derive both a set of sufficient conditions for exact support recovery using an exhaustive search decoder and a set of necessary conditions that any decoder, regardless of its computational complexity, must satisfy for exact support recovery. This analysis of fundamental limits complements our previous work on sharp thresholds for support set recovery over the same set of random measurement ensembles using the polynomial-time Lasso method ($\ell_1$-constrained quadratic programming). Index Terms: Compressed sensing, $\ell_1$-relaxation, Fano’s method, high-dimensional statistical inference, information-theoretic
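To make the idea of an exhaustive search decoder concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact procedure: it assumes the decoder picks the size-k subset of columns whose span leaves the smallest least-squares residual, and the names `make_instance`, `residual_sq`, and `exhaustive_support_decoder` are hypothetical.

```python
import itertools
import random


def make_instance(n, p, support, beta_min, sigma, seed=0):
    """Hypothetical helper: draw y = X beta + noise with a standard Gaussian X."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    beta = [beta_min if j in support else 0.0 for j in range(p)]
    y = [sum(x * b for x, b in zip(row, beta)) + sigma * rng.gauss(0.0, 1.0)
         for row in X]
    return X, y


def residual_sq(columns, y):
    """Squared norm of y after projecting out span(columns) via Gram-Schmidt."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = sum(a * b for a, b in zip(q, v))
            v = [a - c * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # skip numerically dependent columns
            basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        c = sum(a * b for a, b in zip(q, r))
        r = [a - c * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def exhaustive_support_decoder(X, y, k):
    """Assumed decoder: search all size-k subsets, keep the best-fitting one."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    best = min(itertools.combinations(range(p), k),
               key=lambda S: residual_sq([cols[j] for j in S], y))
    return set(best)
```

The search visits all p-choose-k subsets, so it is only practical for tiny p and k. That cost is why the abstract contrasts this decoder with the polynomial-time Lasso.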
Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
 In Advances in Neural Information Processing Systems (NIPS
, 2007
"... by convex risk minimization ..."
Mixing Strategies for Density Estimation
 Ann. Statist
Cited by 51 (9 self)
General results on adaptive density estimation are obtained with respect to any countable collection of estimation strategies under Kullback-Leibler and square $L_2$ losses. It is shown that without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones to be adaptive in terms of statistical risks. A consequence is that under some mild conditions, an asymptotically minimax-rate adaptive estimator exists for a given countable collection of density classes, i.e., a single estimator can be constructed to be simultaneously minimax-rate optimal for all the function classes being considered. A demonstration is given for high-dimensional density estimation on $[0, 1]^d$ where the constructed estimator adapts to smoothness and interaction-order over some piecewise Besov classes, and is consistent for all the densities with finite entropy. 1. Introduction. In recent years, there has been an increasing interest in adaptive fu...
Adaptive Regression by Mixing
 Journal of American Statistical Association
Cited by 51 (9 self)
Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named Adaptive Regression by Mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with a projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical property of ARM. The ARM ...
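A convex combination of candidate procedures can be sketched as follows. The abstract does not state how the weights are chosen, so this sketch assumes one common choice: exponential weights computed from each candidate's summed squared error on held-out data, with a known noise variance `sigma2`. The function names are hypothetical, and this should not be read as the exact ARM algorithm.

```python
import math


def mixing_weights(holdout_sq_errors, sigma2):
    """Assumed weighting: w_j proportional to exp(-SSE_j / (2 * sigma2))."""
    # Subtract the smallest error first so the exponentials cannot underflow to all zeros.
    smallest = min(holdout_sq_errors)
    raw = [math.exp(-(e - smallest) / (2.0 * sigma2)) for e in holdout_sq_errors]
    total = sum(raw)
    return [r / total for r in raw]


def mix_predictions(predictions, weights):
    """Convex combination of the candidates' predictions at a single point."""
    return sum(w * f for w, f in zip(weights, predictions))
```

The weights are nonnegative and sum to one, so the mixed prediction always lies between the smallest and largest candidate prediction.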
Information-theoretic lower bounds on the oracle complexity of convex optimization
Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls
, 2009
Cited by 49 (13 self)
Abstract: Consider the high-dimensional linear regression model $y = X\beta^* + w$, where $y \in \mathbb{R}^n$ is an observation vector, $X \in \mathbb{R}^{n \times d}$ is a design matrix with $d > n$, $\beta^* \in \mathbb{R}^d$ is an unknown regression vector, and $w \sim N(0, \sigma^2 I)$ is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating $\beta^*$ in either $\ell_2$-loss and $\ell_2$-prediction loss, assuming that $\beta^*$ belongs to an $\ell_q$-ball $B_q(R_q)$ for some $q \in [0, 1]$. It is shown that under suitable regularity conditions on the design matrix $X$, the minimax optimal rate in $\ell_2$-loss and $\ell_2$-prediction loss scales as $\Theta\left(R_q \left(\frac{\log d}{n}\right)^{1 - q/2}\right)$. The analysis in this paper reveals that conditions on the design matrix $X$ enter into the rates for $\ell_2$-error and $\ell_2$-prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information theoretic in nature, based on Fano’s inequality and results on the metric entropy of the balls $B_q(R_q)$, whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over $\ell_q$-balls. For the special case $q = 0$, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient $\ell_1$-based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix $X$ than optimal algorithms involving least-squares over the $\ell_0$-ball. Index Terms: Compressed sensing, minimax techniques, regression analysis. I.
Model selection via testing: an alternative to (penalized) maximum likelihood estimators
, 2003
Cited by 45 (7 self)
This paper is devoted to the description and study of a family of estimators, which we shall call T-estimators (T for tests), for minimax estimation and model selection. Their construction is based on earlier ideas about deriving estimators from families of tests due to Le Cam (1973 and 1975) and Birgé (1983, 1984a and b), and on complexity-based model selection from Barron and Cover (1991). It is
Risk bounds for Statistical Learning
Cited by 44 (2 self)
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the "size" of a class of classifiers than entropy with bracketing as in Tsybakov’s work. In particular we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions and discuss the optimality of those bounds in a minimax sense.
High dimensional analysis of semidefinite relaxations for sparse principal component analysis
, 2008
Cited by 43 (2 self)
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most k nonzero components. We analyze two computationally tractable methods for recovering the support of this maximal eigenvector: (a) a simple diagonal cutoff method, which transitions from success to failure as a function of the order parameter $\theta_{\mathrm{dia}}(n, p, k) = n/[k^2 \log(p - k)]$; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter $\theta_{\mathrm{sdp}}(n, p, k) = n/[k \log(p - k)]$ is larger than a critical threshold. Our results thus highlight an interesting tradeoff between computational and statistical efficiency in high-dimensional inference.
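The two order parameters and the diagonal cutoff idea can be illustrated with a short sketch. The formulas for the order parameters come from the abstract. The cutoff rule shown here is an assumption (keep the k coordinates with the largest diagonal entries of the sample covariance, for zero-mean data), and the function names are hypothetical.

```python
import math


def order_parameters(n, p, k):
    """Order parameters quoted in the abstract: theta_dia and theta_sdp."""
    theta_dia = n / (k ** 2 * math.log(p - k))
    theta_sdp = n / (k * math.log(p - k))
    return theta_dia, theta_sdp


def diagonal_cutoff_support(samples, k):
    """Assumed rule: indices of the k largest sample variances (zero-mean data)."""
    n, p = len(samples), len(samples[0])
    # Diagonal of the sample covariance: the average squared value per coordinate.
    diag = [sum(row[j] ** 2 for row in samples) / n for j in range(p)]
    ranked = sorted(range(p), key=lambda j: diag[j], reverse=True)
    return set(ranked[:k])
```

For the same n, p, and k, the two parameters differ by exactly a factor of k. This means the diagonal cutoff needs a sample size scaling with k^2 log(p - k), while the SDP relaxation needs one scaling with k log(p - k), which is the tradeoff the abstract points to.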