Results 1–10 of 15
Simultaneous analysis of Lasso and Dantzig selector
 Submitted to The Annals of Statistics, 2007
Cited by 189 (6 self)
We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
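The Lasso estimator discussed in this abstract is commonly computed by cyclic coordinate descent with a soft-thresholding update. The following pure-Python sketch illustrates that standard algorithm; the toy design matrix, penalty level `lam`, and iteration count are illustrative assumptions, not quantities from the paper.

```python
# Minimal Lasso via cyclic coordinate descent (illustrative sketch only).

def soft_threshold(z, t):
    """Soft-thresholding operator, the building block of the Lasso update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X beta||^2 + lam ||beta||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Residual with coordinate j excluded.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / norm
    return beta
```

With a response generated as y = 2·x0 and a small penalty, the recovered coefficient vector is close to (2, 0): the ℓ1 penalty zeroes out the inactive coordinate, which is the sparsity behaviour the oracle inequalities quantify.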
Sparsity oracle inequalities for the lasso
 Electronic Journal of Statistics
Cited by 83 (12 self)
This paper studies oracle properties of ℓ1-penalized least squares in a nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of nonzero components of the oracle vector. The results are valid even when the dimension of the model is (much) larger than the sample size and the regression matrix is not positive definite. They can be applied to high-dimensional linear regression, to nonparametric adaptive regression estimation and to the problem of aggregation of arbitrary estimators.
Learning by mirror averaging
 The Annals of Statistics
Cited by 33 (3 self)
Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called the aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.
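The recursive procedure behind mirror averaging can be sketched in the exponential-weights form: weights are updated from cumulative losses and the final aggregate is a Cesàro average of the weighted combinations over time. The squared loss, the temperature `beta`, and the toy data below are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch of exponential-weights aggregation in the spirit of
# mirror averaging (illustrative, not the paper's precise algorithm).
import math

def mirror_average(predictions, targets, beta=1.0):
    """predictions[m][t]: prediction of estimator m at step t.
    Returns the time-averaged weight of each estimator."""
    M = len(predictions)
    T = len(targets)
    cum_loss = [0.0] * M
    avg_weights = [0.0] * M
    for t in range(T):
        # Exponential weights from cumulative squared losses so far.
        w = [math.exp(-beta * L) for L in cum_loss]
        s = sum(w)
        w = [wi / s for wi in w]
        for m in range(M):
            avg_weights[m] += w[m] / T  # Cesaro average of the weight sequence
            cum_loss[m] += (predictions[m][t] - targets[t]) ** 2
    return avg_weights
```

On a toy sequence where one estimator is always correct and another always wrong, the averaged weights concentrate on the correct one, which is the mechanism the sharp oracle inequalities control.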
Linear and convex aggregation of density estimators
2004
Cited by 17 (1 self)
We study the problem of learning the best linear and convex combination of M estimators of a density with respect to the mean squared risk. We suggest aggregation procedures and we prove sharp oracle inequalities for their risks, i.e., oracle inequalities with leading constant 1. We also obtain lower bounds showing that these procedures attain optimal rates of aggregation. As an example, we consider aggregation of multivariate kernel density estimators with different bandwidths. We show that linear and convex aggregates mimic the kernel oracles in an asymptotically exact sense. We prove that, for Pinsker’s kernel, the proposed aggregates are sharp asymptotically minimax simultaneously over a large scale of Sobolev classes of densities. Finally, we provide simulations demonstrating the performance of the convex aggregation procedure.
Convergence rates for pointwise curve estimation with a degenerate design
 Mathematical Methods of Statistics
Cited by 8 (3 self)
We consider nonparametric regression with a random design. We want to recover the regression function at a point x0 where the design density vanishes or explodes. Depending on assumptions on the local regularity of the regression function and on the local behaviour of the design, we find several minimax rates. These rates lie in a wide range, from slow ℓ(n) rates, where ℓ is slowly varying (for instance (log n)^(−1)), to fast n^(−1/2) ℓ(n) rates. If the continuity modulus of the regression function at x0 can be bounded from above by an s-regularly varying function, and if the design density is β-regularly varying, we prove that the minimax convergence rate at x0 is n^(−s/(1+2s+β)) ℓ(n).
Optimal rates of aggregation in classification under low noise assumption
2007
Cited by 4 (0 self)
In the same spirit as Tsybakov, we define the optimality of an aggregation procedure in the problem of classification. Using an aggregate with exponential weights, we obtain an optimal rate of convex aggregation for the hinge risk under the margin assumption. Moreover, we obtain an optimal rate of model selection aggregation under the margin assumption for the excess Bayes risk.
Aggregation of SVM Classifiers Using Sobolev Spaces
Cited by 1 (1 self)
This paper investigates the statistical performance of Support Vector Machines (SVM) and considers the problem of adaptation to the margin parameter and to complexity. In particular, we provide a classifier with no tuning parameter; it is a combination of SVM classifiers. Our contribution is twofold: (1) we establish learning rates for SVM using Sobolev spaces and build a numerically realizable aggregate that converges at the same rate; (2) we present practical experiments with this aggregation method for SVM using both Sobolev spaces and Gaussian kernels.
characterization of functional images
2006
In this paper we use a spatial multiscale approach for an improved characterization of functional pixel intensities of images. Examples are numerous, such as temporal dependence of brain response intensities measured by fMRI or frequency dependence of NMR spectra measured at each pixel. The overall goal is to improve the misclassification rate in clustering (unsupervised learning) of the functional image content into a finite but unknown number of classes. Hereby we adopt a nonparametric point of view to reduce the functional dimensionality of the observed pixel intensities, modelled to be of a very general functional form, by a combination of “aggregation” and truncation techniques. Clustering is applied via an EM algorithm for estimating a Gaussian mixture model in the domain of the discrete wavelet transform of the pixel intensity curves. We show improvements of our multiscale method, based on complexity-penalised likelihood estimation for Recursive Dyadic Partitioning of the image, over existing monoscale approaches, by simulated and real data examples, and we give some theoretical treatment of the resulting
COBRA: A Nonlinear Aggregation Strategy
2013
A new method for combining several initial estimators of the regression function is introduced. Instead of building a linear or convex optimized combination over a collection of basic estimators r1, ..., rM, we use them as a collective indicator of the distance between the training data and a test observation. This local distance approach is model-free and extremely fast. Most importantly, the resulting collective estimator is shown to perform asymptotically at least as well in the L2 sense as the best basic estimator in the collective. Moreover, it does so without having to declare which might be the best basic estimator for the given data set. A companion R package called COBRA (standing for COmBined Regression Alternative) is presented (downloadable on
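The consensus idea described in this abstract can be sketched as follows: a training point contributes its response only if every basic estimator predicts roughly the same value there as at the query point. The tolerance `eps` and the toy estimators below are illustrative choices, not the COBRA package's defaults.

```python
# Hedged pure-Python sketch of the COBRA consensus mechanism
# (illustrative, not the R package's implementation).

def cobra_predict(x, train_X, train_y, estimators, eps=0.1):
    """Combine basic estimators by consensus proximity rather than by weights."""
    preds_x = [r(x) for r in estimators]
    selected = []
    for Xi, yi in zip(train_X, train_y):
        preds_i = [r(Xi) for r in estimators]
        # Keep the training point only if every estimator agrees within eps.
        if all(abs(pi - px) <= eps for pi, px in zip(preds_i, preds_x)):
            selected.append(yi)
    if not selected:
        return None  # no consensus: the estimate is undefined in this sketch
    return sum(selected) / len(selected)
```

Note that the basic estimators never enter the final value directly: they only decide which training responses are averaged, which is why the approach is model-free with respect to the combination step.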