
Boosting Algorithms: Regularization, Prediction and Model Fitting

by P. Bühlmann, T. Hothorn
Venue: Statistical Science

Results 1 - 10 of 12

Variable Selection and Model Choice in Geoadditive Regression Models

by Thomas Kneib, Torsten Hothorn, Gerhard Tutz
Cited by 3 (1 self)
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health. Key words: bivariate smoothing, boosting, functional gradient, penalised splines, random effects, space-varying effects

Boosting Additive Models using Component-wise P-Splines

by Matthias Schmid, Torsten Hothorn
Cited by 1 (1 self)
We consider an efficient approximation of Bühlmann & Yu’s L2Boosting algorithm with component-wise smoothing splines. Smoothing spline base-learners are replaced by P-spline base-learners, which yield similar prediction errors but are more advantageous from a computational point of view. In particular, we give a detailed analysis of the effect of various P-spline hyper-parameters on the boosting fit. In addition, we derive a new theoretical result on the relationship between the boosting stopping iteration and the step length factor used for shrinking the boosting estimates. Key words: L2Boosting, P-splines, smoothing splines, additive models, variable selection, component-wise base-learners

Incorporating pathway information into boosting estimation of high-dimensional risk prediction models

by Harald Binder, Martin Schumacher, 2009
Venue: BMC Bioinformatics (BioMed Central), Methodology article, Open Access

Additive Models: The Men’s Olympic 1500m, Air Pollution in the USA, and Risk Factors for Kyphosis

by Brian S. Everitt, Torsten Hothorn (chapter: Scatterplot Smoothers)
To begin we will construct a scatterplot of winning time against year the games

Flexible boosting of accelerated failure time models

by Matthias Schmid, Torsten Hothorn, 2008
Venue: BMC Bioinformatics (BioMed Central), Methodology article
© 2008 Schmid and Hothorn; licensee BioMed Central Ltd.

Model-Based Boosting: Unbiased Variable Selection and Model Choice

by Benjamin Hofner
Variable selection and model choice are of major concern in many applications, especially in high-dimensional settings. Boosting (for an overview see Bühlmann and Hothorn (2007)) is a useful method for model fitting with intrinsic variable selection and model choice. However, a central problem remains: variable selection is biased if the covariates are of very different natures. An important example is given by models that try to make use of continuous and categorical covariates at the same time. Especially as the number of categories increases, categorical covariates offer increased flexibility and are thus preferred over continuous covariates (with linear effects). A closely related problem is model choice, where one tries to choose between different modeling alternatives for one covariate. The choice between linear and smooth effects is a classical example. The two competitors have different degrees of freedom (1 df for the linear effect and considerably more than 1 df for the smooth effect). Hence, smooth effects are preferentially selected. To make categorical covariates comparable to linear effects in the boosting framework, one could use ridge-penalized base-learners (i.e., modeling components) with 1 df in this case. To overcome the problem of different degrees of freedom of, e.g., linear and smooth effects Kneib

FIRST: Combining forward iterative selection and shrinkage in high dimensional sparse linear regression

by Subhashis Ghosal
Venue: Statistics and Its Interface, Volume 2 (2009) 341–348

STOCHASTIC BOOSTING ALGORITHMS

by Ajay Jasra, Chris Holmes
In this article, we discuss a class of stochastic boosting algorithms, which corrects and develops the work of [23], showing how to perform statistical inference in a computationally efficient manner. Sequential Monte Carlo (SMC) methods are used to illustrate that the stochastic boosting methods can provide better predictions, for a higher computational cost, than the corresponding boosting algorithm. A theoretical result is also given, which expresses an upper bound on the posterior-predictive test error in terms of that of boosting. The result shows that the averaged predictions used are relatively stable with respect to boosting when the latter provides the single best prediction. We also investigate the method on a real case study from machine learning and in a regression context, showing that it can be a useful tool for data exploration.

This is an extended and slightly modified version of the manuscript

by Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov, Matthias Schmid, 2012
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat based on anthropometric measurements.

Reprints and permission: sagepub.com/journalsPermissions.nav

by Jocelyn E. Holden, W. Holmes Finch, Ken Kelley
The statistical classification of N individuals into G mutually exclusive groups when the actual group membership is unknown is common in the social and behavioral sciences. The results of such classification methods often have important consequences. Among the most common methods of statistical classification are linear discriminant analysis, quadratic discriminant analysis, and logistic regression. However, recent developments in the statistics literature have brought new and potentially more flexible classification models to the forefront. Although these new models are increasingly being used in the physical sciences and marketing research, they are still relatively little used in the social and behavioral sciences. The purpose of this article is to provide a comparison of these modern methods with the classical methods widely used in situations that are relevant in the social and behavioral sciences. This study uses a large-scale Monte Carlo simulation study for the comparisons, as analytic comparisons are often not tractable. Results indicate that classification and regression trees generally produced the highest classification accuracy of all techniques tested, though study design characteristics such as sample size and model complexity can greatly influence the optimal choice or effectiveness of a statistical classification method. Keywords: discriminant analysis, logistic regression, multivariate adaptive regression splines, classification and regression trees, boosting, generalized additive models, neural