Results 1–10 of 42
Simultaneous Analysis of Lasso and Dantzig Selector
Submitted to The Annals of Statistics, 2007
Cited by 465 (8 self)
We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
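As a concrete illustration (a sketch, not code from the paper): the Lasso estimator minimizes the penalized least squares criterion (1/2n)‖y − Xβ‖² + λ‖β‖₁, which can be computed by proximal gradient descent (ISTA). The design, penalty level, and iteration count below are illustrative choices for a sparse linear model with more variables than observations:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

# Toy p > n linear model with a 3-sparse truth (all values illustrative).
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = lasso_ista(X, y, lam=0.1)
```

With this penalty level, the estimate is sparse and its large entries sit on the true support, which is the regime the oracle inequalities above address.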
Sparsity oracle inequalities for the lasso
 Electronic Journal of Statistics
Cited by 176 (12 self)
This paper studies oracle properties of ℓ1-penalized least squares in the nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of nonzero components of the oracle vector. The results are valid even when the dimension of the model is (much) larger than the sample size and the regression matrix is not positive definite. They can be applied to high-dimensional linear regression, to nonparametric adaptive regression estimation and to the problem of aggregation of arbitrary estimators.
Adaptive Regression by Mixing
 Journal of the American Statistical Association
Cited by 67 (11 self)
Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named Adaptive Regression by Mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is shown theoretically to perform optimally in rates of convergence without knowing which of the original procedures works best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical properties of ARM.
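A minimal sketch of the ARM idea, under simplifying assumptions that are ours rather than the paper's (a single data split, a known noise level, and polynomial fits of several degrees standing in for the "general candidate regression procedures"): fit each candidate on one half of the data, then weight the candidates by exponentiating their squared error on the other half and mix.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(200)

# Split: fit candidates on the first half, weight them on the second.
x_fit, y_fit, x_val, y_val = x[:100], y[:100], x[100:], y[100:]

# Candidate procedures: polynomial fits of different degrees.
degrees = [1, 3, 5, 9]
fits = [np.polynomial.Polynomial.fit(x_fit, y_fit, d) for d in degrees]

# ARM-style exponential weights from validation squared error.
# The noise variance sigma2 is assumed known here for simplicity.
sigma2 = 0.09
sse = np.array([np.sum((y_val - f(x_val)) ** 2) for f in fits])
w = np.exp(-(sse - sse.min()) / (2 * sigma2))
w /= w.sum()  # convex combination weights

def combined(x_new):
    """Convex combination of the candidate fits with the ARM-style weights."""
    return sum(wi * f(x_new) for wi, f in zip(w, fits))
```

Calling `combined(0.7)` gives the mixture prediction at a new point; a badly misspecified candidate (here the linear fit of a sine curve) receives essentially zero weight.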
Mixing Strategies for Density Estimation
 Annals of Statistics
Cited by 59 (9 self)
General results on adaptive density estimation are obtained with respect to any countable collection of estimation strategies under Kullback-Leibler and squared L2 losses. It is shown that, without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones to be adaptive in terms of statistical risks. A consequence is that, under some mild conditions, an asymptotically minimax-rate adaptive estimator exists for a given countable collection of density classes, i.e., a single estimator can be constructed to be simultaneously minimax-rate optimal for all the function classes being considered. A demonstration is given for high-dimensional density estimation on [0, 1]^d, where the constructed estimator adapts to smoothness and interaction order over some piecewise Besov classes, and is consistent for all densities with finite entropy.
Combining Different Procedures for Adaptive Regression
 Journal of Multivariate Analysis, 1998
Cited by 58 (10 self)
Given any countable collection of regression procedures (e.g., kernel, spline, wavelet, local polynomial, neural nets, etc.), we show that a single adaptive procedure can be constructed to share their advantages to a great extent in terms of global squared L2 risk. The combined procedure basically pays a price of order only 1/n for adaptation over the collection. An interesting consequence is that for a countable collection of classes of regression functions (possibly of completely different characteristics), a minimax-rate adaptive estimator can be constructed such that it automatically converges at the right rate for each of the classes being considered.
Regression with Multiple Candidate Models: Selecting or Mixing?
 Statistica Sinica, 1999
Cited by 37 (9 self)
Model averaging provides an alternative to model selection. An algorithm, ARM, rooted in information theory is proposed to combine different regression models/methods. A simulation is conducted in the context of linear regression to compare its performance with the familiar model selection criteria AIC and BIC, and also with some Bayesian model averaging (BMA) methods. The simulation suggests ...
Aggregated Estimators And Empirical Complexity For Least Square Regression
Cited by 35 (2 self)
Numerous empirical results have shown that combining regression procedures can be a very efficient method. This work provides PAC bounds for the L2 generalization error of such methods. The interest of these bounds is twofold. First, they give, for any aggregating procedure, a bound on the expected risk depending on the empirical risk and on the empirical complexity, measured by the Kullback-Leibler divergence between the aggregating distribution and a prior distribution, and by the empirical mean of the variance of the regression functions under the aggregating distribution.
Combining forecasting procedures: some theoretical results
 Econometric Theory, 2004
Cited by 34 (4 self)
We study some methods of combining procedures for forecasting a continuous random variable. Statistical risk bounds under the squared error loss are obtained under mild distributional assumptions on the future given the current outside information and the past observations. The risk bounds show that the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term. In terms of the rate of convergence, the combined forecast performs as well as if one knew in advance which candidate forecasting procedure is the best. Empirical studies suggest that combining procedures can sometimes improve forecasting accuracy compared to the original procedures. Risk bounds are derived to theoretically quantify the potential gain from, and price of, linearly combining forecasts for improvement. The result supports the empirical finding that it is not automatically a good idea to combine forecasts: blind combining can degrade performance dramatically due to the undesirably large variability in estimating the best combining weights. An automated combining method is shown in theory to achieve a balance between the potential gain and the complexity penalty (the price of combining), to take advantage (if any) of sparse combining, and to maintain the best performance (in rate) among the candidate forecasting procedures if linear or sparse combining does not help.
Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging
 Problems of Information Transmission
Cited by 25 (3 self)
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of order √((log M)/t), with an explicit and small constant factor, where M is the dimension of the problem and t stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.
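On the simplex, the mirror descent step with the entropy mirror map reduces to a multiplicative (exponentiated-gradient) update followed by renormalization. The following numpy sketch is our illustration of that update, with averaging, for the squared-loss regression setting mentioned at the end of the abstract; the learning rate and the toy base rules are illustrative choices, not the paper's:

```python
import numpy as np

def eg_aggregate(F, y, eta=0.1):
    """Stochastic mirror descent on the simplex (exponentiated gradient)
    with averaging. F[t] holds the M base predictions at time t, y[t] the target."""
    T, M = F.shape
    w = np.full(M, 1.0 / M)      # start at the uniform point of the simplex
    w_avg = np.zeros(M)
    for t in range(T):
        grad = 2.0 * (F[t] @ w - y[t]) * F[t]  # gradient of squared loss in w
        w = w * np.exp(-eta * grad)            # multiplicative (mirror) step
        w /= w.sum()                           # renormalize onto the simplex
        w_avg += w
    return w_avg / T                           # averaged weights

# Three base rules: one tracks the target, one is its negation, one is noise.
rng = np.random.default_rng(2)
T = 500
y = rng.standard_normal(T)
F = np.column_stack([y + 0.1 * rng.standard_normal(T),
                     -y,
                     rng.standard_normal(T)])
w_bar = eg_aggregate(F, y)
```

The averaged weight vector stays on the simplex and concentrates on the base rule that tracks the target.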
Aggregating Regression Procedures for a Better Performance
 Bernoulli, 1999
Cited by 19 (3 self)
Methods have been proposed to linearly combine candidate regression procedures to improve estimation accuracy. Applications of these methods in many examples are very successful, pointing to the great potential of combining procedures. A fundamental question regarding combining procedures is: what is the potential gain, and how much does one need to pay for it? A partial answer to this question was obtained by Juditsky and Nemirovski (1996) for the case when a large number of procedures are to be combined. We attempt to give a more general solution. Under an ℓ1 constraint on the linear coefficients, we show that for pursuing the best linear combination over n procedures, in terms of the rate of convergence under the squared L2 loss, one can pay a price of order O(log n / n^{1−α}) when 0 < α < 1/2 and a price of order O((log n/n)^{1/2}) when 1/2 ≤ α ≤ 1. These rates cannot be improved or essentially improved in a uniform sense. This result suggests that one should be cautious ...