Results 1–10 of 35
Adaptive forward-backward greedy algorithm for learning sparse representations
IEEE Trans. Inform. Theory, 2011
"... Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that ..."
Abstract

Cited by 51 (8 self)
 Add to MetaCart
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
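The forward-backward scheme the abstract describes can be sketched in a few lines. This is a simplified illustration, not the paper's exact procedure: the stopping threshold `eps`, the backward tolerance `nu`, and the full least-squares refit at every step are all illustrative assumptions.

```python
import numpy as np

def foba(X, y, eps=1e-3, nu=0.5, max_steps=50):
    """Simplified adaptive forward-backward greedy selection (a sketch:
    eps, nu and the refitting scheme are illustrative choices)."""
    n, p = X.shape

    def refit(S):
        # least-squares refit on the current support; mean squared residual
        beta = np.zeros(p)
        if S:
            coef, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            beta[S] = coef
        r = y - X @ beta
        return beta, float(r @ r) / n

    S, (beta, loss) = [], refit([])
    for _ in range(max_steps):
        # forward step: add the feature giving the largest loss reduction
        cand = [(refit(S + [j])[1], j) for j in range(p) if j not in S]
        if not cand:
            break
        best_loss, best_j = min(cand)
        gain = loss - best_loss
        if gain < eps:                       # no useful feature left
            break
        S.append(best_j)
        beta, loss = refit(S)
        # backward steps: drop a feature whenever removing it costs less
        # than a fraction nu of the last forward gain
        while len(S) > 1:
            drop_loss, drop_j = min((refit([k for k in S if k != j])[1], j)
                                    for j in S)
            if drop_loss - loss > nu * gain:
                break
            S.remove(drop_j)
            beta, loss = refit(S)
    return sorted(S), beta
```

The backward check is what distinguishes this from plain forward selection: a feature that looked useful early on can be deleted cheaply once better features have entered.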
Boosting algorithms: Regularization, prediction and model fitting
Statistical Science, 2007
"... Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and correspo ..."
Abstract

Cited by 38 (5 self)
 Add to MetaCart
Abstract. We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
Bolasso: model consistent lasso estimation through the bootstrap
In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML), 2008
"... We consider the leastsquare linear regression problem with regularization by the ℓ1norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic ..."
Abstract

Cited by 36 (12 self)
 Add to MetaCart
We consider the least-squares linear regression problem with regularization by the ℓ1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate of decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, compares favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
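The intersect-the-supports idea can be sketched directly. The Lasso solver below is a plain coordinate-descent implementation kept self-contained for illustration; the penalty level `lam` and the number of bootstrap replicates `n_boot` are illustrative choices, not the paper's tuning.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso for the objective
    ||y - Xb||^2 / (2n) + lam * ||b||_1 (illustrative solver)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def bolasso_support(X, y, lam=0.1, n_boot=16, seed=0):
    """Sketch of the Bolasso idea: run the Lasso on bootstrap replicates
    and intersect the selected supports (lam, n_boot are assumptions)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    support = None
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        b = lasso_cd(X[idx], y[idx], lam)
        sel = set(np.flatnonzero(np.abs(b) > 1e-8))
        support = sel if support is None else support & sel
    return sorted(support)
```

A noise variable has to survive every replicate to survive the intersection, which is what drives the consistency result summarized above.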
Partial Correlation Estimation by Joint Sparse Regression Models
JASA, 2008
"... In this article, we propose a computationally efficient approach—space (Sparse PArtial Correlation Estimation)—for selecting nonzero partial correlations under the highdimensionlowsamplesize setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse re ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
In this article, we propose a computationally efficient approach—space (Sparse PArtial Correlation Estimation)—for selecting nonzero partial correlations under the high-dimension-low-sample-size setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of space by extensive simulation studies. It is shown that space performs well in both nonzero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply space to a microarray breast cancer dataset and identify a set of hub genes that may provide important insights into genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation.
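For orientation, the quantities space estimates are ordinary partial correlations. In the classical low-dimensional setting they can be read off the inverse sample covariance; space replaces this route with joint sparse regression precisely because the inverse is unreliable (or undefined) when p is large relative to n.

```python
import numpy as np

def partial_correlations(X):
    """Classical partial correlations from the inverse sample covariance:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). Only valid when the
    sample covariance is well-conditioned (n comfortably larger than p)."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho
```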
Statistical analysis of Bayes optimal subset ranking
IEEE Transactions on Information Theory, 2008
"... Abstract—The ranking problem has become increasingly important in modern applications of statistical methods in automated decision making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
The ranking problem has become increasingly important in modern applications of statistical methods in automated decision-making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain) criterion that measures the quality of items near the top of the ranked list. Similar to error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to nonconvex optimization problems that can be NP-hard. Therefore a computationally more tractable approach is needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the ranked list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
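The DCG criterion at the centre of the abstract is simple to compute. The snippet uses one common discount convention, 1/log2(rank + 1); the paper's exact discount function may differ.

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """DCG truncated at rank k, with the common 1/log2(rank + 1) discount
    (one convention among several; illustrative only)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    ranks = np.arange(1, rel.size + 1)
    return float(np.sum(rel / np.log2(ranks + 1)))
```

Because the discount decays with rank, swapping a relevant item toward the top strictly increases DCG, which is why the criterion emphasizes the top of the ranked list.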
Sparse Boosting
Journal of Machine Learning Research, 2006
"... We propose Sparse Boosting (the SparseL 2 Boost algorithm), a variant on boosting with the squared error loss. SparseL 2 Boost yields sparser solutions than the previously proposed L 2 Boosting by minimizing some penalized L 2 loss functions, the FPE model selection criteria, through smallstep g ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant of boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing penalized L2 loss functions, the FPE model selection criteria, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness, to increase prediction accuracy and to enable better variable selection: such goals can be achieved with SparseL2Boost.
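The baseline that SparseL2Boost sparsifies is componentwise L2Boosting: at each iteration, fit the single predictor that best explains the current residual and take a small step toward it. The sketch below shows that baseline only (step size and iteration count are illustrative); SparseL2Boost additionally stops or selects steps via a penalized criterion, which is not reproduced here.

```python
import numpy as np

def l2boost(X, y, step=0.1, n_steps=200):
    """Componentwise L2Boosting sketch: each iteration fits the single
    column best explaining the current residual, then takes a small step
    (step, n_steps are illustrative choices)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        corr = X.T @ r
        gains = corr ** 2 / col_sq          # residual-SS reduction per column
        j = int(np.argmax(gains))
        coef = corr[j] / col_sq[j]          # least-squares fit of r on X[:, j]
        beta[j] += step * coef
        r -= step * coef * X[:, j]
    return beta
```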
P-values for high-dimensional regression
2009
"... Assigning significance in highdimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid pvalues are not available. An exception is a recent proposal by Wasserman and Roeder (2008) which splits ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data. Results are sensitive to this arbitrary choice: it amounts to a "p-value lottery" and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated, while keeping asymptotic control over the inclusion of noise variables. In addition, the proposed aggregation is shown to improve power, while substantially reducing the number of falsely selected variables. Keywords: High-dimensional variable selection, data splitting, multiple comparisons.
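The split-screen-test-aggregate recipe can be sketched as follows. Everything concrete here is an illustrative simplification: screening by marginal correlation, a normal approximation in place of the exact t-test, and twice-the-median aggregation across splits (the paper studies more general quantile-based rules).

```python
import numpy as np
from math import erf

def norm_sf(z):
    # survival function of the standard normal
    return 0.5 * (1.0 - erf(z / np.sqrt(2.0)))

def multisplit_pvalues(X, y, k=5, n_splits=20, seed=0):
    """Sketch of multi-split inference: screen on one half, test the
    survivors on the other half, aggregate p-values across random splits
    (screening rule, t-test approximation and aggregation are simplified)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvals = np.ones((n_splits, p))          # unselected variables get p = 1
    for b in range(n_splits):
        perm = rng.permutation(n)
        i1, i2 = perm[: n // 2], perm[n // 2:]
        # screening on half 1: keep the k columns most correlated with y
        score = np.abs(X[i1].T @ (y[i1] - y[i1].mean()))
        S = np.argsort(score)[-k:]
        # low-dimensional least squares on half 2
        Xs = X[i2][:, S]
        beta, *_ = np.linalg.lstsq(Xs, y[i2], rcond=None)
        r = y[i2] - Xs @ beta
        sigma2 = (r @ r) / (len(i2) - k)
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
        p_raw = 2.0 * np.array([norm_sf(abs(z)) for z in beta / se])
        pvals[b, S] = np.minimum(p_raw * k, 1.0)   # Bonferroni over |S|
    # aggregate across splits: twice the median, capped at one
    return np.minimum(2.0 * np.median(pvals, axis=0), 1.0)
```

Aggregating over many splits removes the dependence on any single arbitrary split, which is exactly the "p-value lottery" the abstract warns about.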
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
Neural Comput., 2006
"... Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj's have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj's to model y, so that the posterior will propose probability densities p that are "often close" to the true density p∗ in some sense. The closeness can be described by a Hellinger distance between p and p∗ that scales at a power very close to n^(−1/2), which is the "finite-dimensional rate" corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 0502 (2005), Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
Variable Selection and Model Choice in Geoadditive Regression Models
"... Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction s ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying-coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom, to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health. Key words: bivariate smoothing, boosting, functional gradient, penalised splines, random effects, space-varying effects
Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples
Statistical Applications in Genetics and Molecular Biology, 2008
"... The bootstrap is a tool that allows for efficient evaluation of prediction performance of statistical techniques without having to set aside data for validation. This is especially important for highdimensional data, e.g. arising from microarrays, because there the number of observations is often l ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
The bootstrap is a tool that allows for efficient evaluation of the prediction performance of statistical techniques without having to set aside data for validation. This is especially important for high-dimensional data, e.g. arising from microarrays, because there the number of observations is often limited. The statistical technique to be evaluated has to be applied to every bootstrap sample in the same manner it would be used on new data. This includes selection of complexity, e.g. the number of boosting steps for gradient boosting algorithms. Using the latter, we demonstrate in a simulation study that complexity selection in conventional bootstrap samples, drawn with replacement, is biased in many scenarios. This results in models that are much more complex than models selected in the original data. Potential remedies for this complexity selection bias, such as using a fixed level of complexity or sampling without replacement, are investigated, and it is shown that the latter works well in many settings. We focus on high-dimensional binary response data, with bootstrap .632+ estimates of the Brier score for performance evaluation, and on censored time-to-event data with .632+ prediction error curve estimates. The latter, with the modified bootstrap procedure, is then applied to an example with microarray data from patients with diffuse large B-cell lymphoma.
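The two resampling schemes being contrasted differ in one line of code. A bootstrap sample of size n contains duplicated observations and only about 0.632·n distinct ones, which is what inflates the complexity selected on it; sampling without replacement at that same effective size avoids the duplicates. The 0.632·n subsample size below is an illustrative choice motivated by that expected overlap, not necessarily the paper's exact setting.

```python
import numpy as np

def resample_indices(n, scheme="subsample", rng=None):
    """Bootstrap (with replacement) vs. subsampling (without replacement)
    at roughly the bootstrap's expected number of distinct observations."""
    rng = rng or np.random.default_rng(0)
    if scheme == "bootstrap":
        return rng.integers(0, n, size=n)              # duplicates occur
    return rng.choice(n, size=int(round(0.632 * n)), replace=False)
```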