Results 1–10 of 46
Adaptive forward-backward greedy algorithm for learning sparse representations
IEEE Trans. Inform. Theory, 2011
Cited by 55 (8 self)
Abstract:
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
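The forward-backward idea can be sketched in plain Python. This is a simplified illustration, not the paper's exact algorithm: it assumes the design columns are orthonormal (so each least-squares coefficient is just an inner product with the response), and the function name, the `eps` stopping threshold, and the "drop a variable if the loss increase is below half the last forward gain" rule are illustrative choices.

```python
def foba_orthonormal(X, y, eps=1e-6, max_steps=20):
    """Adaptive forward-backward greedy selection (sketch).

    Assumes the columns of X are orthonormal, so the least-squares
    coefficient of each selected column is simply its inner product
    with y, and columns can be added or removed independently.
    """
    n, d = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(d)]
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    active, coef = [], {}

    def residual():
        r = list(y)
        for j in active:
            for i in range(n):
                r[i] -= coef[j] * cols[j][i]
        return r

    for _ in range(max_steps):
        r = residual()
        # forward step: add the column most correlated with the residual
        j_best = max((j for j in range(d) if j not in active),
                     key=lambda j: dot(cols[j], r) ** 2, default=None)
        if j_best is None:
            break
        gain = dot(cols[j_best], r) ** 2  # decrease in squared loss
        if gain < eps:
            break
        active.append(j_best)
        coef[j_best] = dot(cols[j_best], y)
        # backward step(s): drop a column whenever the resulting loss
        # increase is less than half of the last forward gain
        while len(active) > 1:
            j_worst = min(active, key=lambda j: coef[j] ** 2)
            if coef[j_worst] ** 2 < 0.5 * gain:
                active.remove(j_worst)
                del coef[j_worst]
            else:
                break
    return sorted(active), coef
```

On an orthonormal toy design the procedure keeps the strong coordinates and never retains features whose gain falls below the threshold.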
Boosting algorithms: Regularization, prediction and model fitting
Statistical Science, 2007
Cited by 48 (9 self)
Abstract:
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
Bolasso: model consistent lasso estimation through the bootstrap
In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML), 2008
Cited by 39 (14 self)
Abstract:
We consider the least-squares linear regression problem with regularization by the ℓ1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate of decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso on several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, compares favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
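The bootstrap-and-intersect scheme is easy to sketch. This is a minimal illustration, not the paper's implementation: the Lasso solver is a hand-rolled cyclic coordinate descent in plain Python, and `lam`, `n_boot`, `sweeps`, and the support-detection tolerance are illustrative choices.

```python
import random

def soft(x, t):
    """Soft-thresholding operator."""
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)

def lasso_cd(X, y, lam, sweeps=200):
    """Lasso via cyclic coordinate descent, minimising
    0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    r = list(y)  # residuals for b = 0
    for _ in range(sweeps):
        for j in range(d):
            zj = sum(X[i][j] ** 2 for i in range(n))
            if zj == 0:
                continue
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n))
            bj_new = soft(rho, lam) / zj
            delta = bj_new - b[j]
            if delta:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = bj_new
    return b

def bolasso(X, y, lam, n_boot=32, seed=0):
    """Bolasso sketch: intersect Lasso supports over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    support = None
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        s = {j for j, bj in enumerate(b) if abs(bj) > 1e-8}
        support = s if support is None else support & s
    return sorted(support)
```

Because a noise variable only occasionally enters an individual bootstrap fit, the intersection quickly prunes it, while a truly relevant variable survives every resample.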
Partial Correlation Estimation by Joint Sparse Regression Models
JASA, 2008
Cited by 38 (4 self)
Abstract:
In this article, we propose a computationally efficient approach—space (Sparse PArtial Correlation Estimation)—for selecting nonzero partial correlations under the high-dimension-low-sample-size setting. This method assumes overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of space by extensive simulation studies. It is shown that space performs well in both nonzero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply space to a microarray breast cancer dataset and identify a set of hub genes that may provide important insights into genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation.
Statistical analysis of Bayes optimal subset ranking
IEEE Transactions on Information Theory, 2008
Cited by 27 (0 self)
Abstract:
The ranking problem has become increasingly important in modern applications of statistical methods in automated decision-making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain) criterion, which measures the quality of items near the top of the rank-list. Similar to error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to non-convex optimization problems that can be NP-hard. Therefore a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the rank-list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
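The DCG criterion the abstract centers on is straightforward to compute. A minimal sketch: the `2^rel - 1` gain and `log2` position discount used here are the common web-search convention, and exact gain/discount choices vary across papers.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulated gain of a ranked list of relevance grades.

    Uses the common 2^rel - 1 gain with a log2 position discount;
    the precise convention differs between formulations.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances, k=None):
    """DCG normalised by the ideal (relevance-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

Note that permuting items below position `k` leaves DCG@k unchanged, which is exactly why the criterion emphasises the top of the rank-list.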
P-values for high-dimensional regression
2009
Cited by 11 (0 self)
Abstract:
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts: the number of variables is reduced to a manageable size using the first split, and classical variable selection techniques are then applied to the remaining variables using the data from the second split. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data, and results are sensitive to this arbitrary choice: it amounts to a "p-value lottery" and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables. Keywords: high-dimensional variable selection, data splitting, multiple comparisons.
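The aggregation step can be sketched for a single variable. This shows one fixed-γ instance of a quantile-based aggregation rule (γ = 0.5 gives "twice the median" of the per-split p-values); an adaptive variant that also searches over γ is not shown, and the function name is illustrative.

```python
import math

def aggregate_pvalues(split_pvalues, gamma=0.5):
    """Aggregate the p-values for one variable across B random splits:
    Q(gamma) = min(1, empirical gamma-quantile of {p_b / gamma}).
    With gamma = 0.5 this reduces to twice the median, so no single
    split's 'p-value lottery' outcome dominates the result."""
    scaled = sorted(p / gamma for p in split_pvalues)
    k = max(1, math.ceil(gamma * len(scaled)))  # conservative quantile index
    return min(1.0, scaled[k - 1])
```

A single lucky split with a tiny p-value no longer suffices: the aggregated value stays small only if a γ-fraction of the splits agree.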
Sparse Boosting
Journal of Machine Learning Research, 2006
Cited by 10 (3 self)
Abstract:
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant of boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing a penalized L2 loss function, the FPE model selection criterion, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and improve variable selection: such goals can be achieved with SparseL2Boost.
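For context, the base loop that SparseL2Boost modifies is componentwise linear L2Boosting, which can be sketched in a few lines. This illustration omits the FPE-penalized selection and stopping that make the method "sparse"; the function name, `steps`, and the step-size `nu` are illustrative choices.

```python
def l2boost(X, y, steps=100, nu=0.1):
    """Componentwise linear L2Boosting sketch: at each step, fit the
    single predictor that best explains the current residuals by least
    squares, and move a small step nu towards that fit. SparseL2Boost
    replaces the plain RSS-greedy choice with a penalised criterion."""
    n, d = len(X), len(X[0])
    coef = [0.0] * d
    r = list(y)
    for _ in range(steps):
        best_j, best_b, best_drop = None, 0.0, 0.0
        for j in range(d):
            zj = sum(X[i][j] ** 2 for i in range(n))
            if zj == 0:
                continue
            bj = sum(X[i][j] * r[i] for i in range(n)) / zj
            drop = bj * bj * zj  # decrease in residual sum of squares
            if drop > best_drop:
                best_j, best_b, best_drop = j, bj, drop
        if best_j is None:
            break
        coef[best_j] += nu * best_b
        for i in range(n):
            r[i] -= nu * best_b * X[i][best_j]
    return coef
```

Because each iteration updates only one coordinate, variables that never win the greedy comparison keep a coefficient of exactly zero, which is the source of boosting's built-in (if limited) sparsity.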
Variable Selection and Model Choice in Geoadditive Regression Models
Cited by 7 (5 self)
Abstract:
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying-coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom, to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health. Key words: bivariate smoothing, boosting, functional gradient, penalised splines, random effects, space-varying effects
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
Neural Computation, 2006
Cited by 6 (0 self)
Abstract:
Bayesian variable selection has recently gained much empirical success in a variety of applications where the number K of explanatory variables (x1, ..., xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj's have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj's to model y, so that the posterior will propose probability densities p that are "often close" to the true density p∗ in some sense. The closeness can be described by a Hellinger distance between p and p∗ that scales at a power very close to n^{−1/2}, which is the "finite-dimensional rate" corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 0502 (2005), Dept. Statistics, Northwestern Univ.] on the consistency of Bayesian variable selection for binary classification.
Boosting for high-multivariate responses in high-dimensional linear regression
Statist. Sin., 2006
Cited by 5 (1 self)
Abstract:
We propose a boosting method, multivariate L2Boosting, for multivariate linear regression based on a squared error loss for multivariate data. It can be applied to multivariate linear regression with continuous responses and to vector autoregressive time series. We prove, for i.i.d. as well as time series data, that multivariate L2Boosting can consistently recover sparse high-dimensional multivariate linear functions, even when the number of predictor variables pn and the dimension of the response qn grow almost exponentially with sample size n, pn = qn = O(exp(Cn^{1−ξ})) (0 < ξ < 1, 0 < C < ∞), but the ℓ1-norm of the true underlying function is finite. Our theory seems to be among the first to address the issue of large dimension of the response variable; the relevance of such settings is briefly outlined. We also identify empirically some cases where our multivariate L2Boosting is better than multiple applications of univariate methods to single response components, thus demonstrating that the multivariate approach can be very useful. Key words and phrases: high-multivariate high-dimensional linear regression, L2Boosting, vector AR time series.