Variable Selection and Model Choice in Geoadditive Regression Models
Cited by 3 (1 self)
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health. Key words: bivariate smoothing, boosting, functional gradient, penalised splines, random effects, space-varying effects
Boosting Additive Models using Component-wise P-Splines
Cited by 1 (1 self)
We consider an efficient approximation of Bühlmann & Yu’s L2Boosting algorithm with component-wise smoothing splines. Smoothing spline base-learners are replaced by P-spline base-learners which yield similar prediction errors but are more advantageous from a computational point of view. In particular, we give a detailed analysis on the effect of various P-spline hyper-parameters on the boosting fit. In addition, we derive a new theoretical result on the relationship between the boosting stopping iteration and the step length factor used for shrinking the boosting estimates. Key words: L2Boosting, P-splines, smoothing splines, additive models, variable selection, component-wise base-learners
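The component-wise idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes simple linear base-learners in place of P-splines, and the function name `componentwise_l2_boost` and the parameters `n_iter` (stopping iteration) and `nu` (step length factor) are hypothetical names chosen for illustration.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2Boosting sketch with linear base-learners.

    Assumption: every column of X is non-constant, so the centred
    column sums of squares are strictly positive.
    """
    X_mean = X.mean(axis=0)
    Xc = X - X_mean                      # centre covariates
    offset = y.mean()                    # start from the mean of y
    coef = np.zeros(X.shape[1])
    resid = y - offset                   # negative gradient of the L2 loss
    col_ss = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        # Fit each single-covariate least squares base-learner to the residuals.
        beta = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares achieved by each candidate.
        gain = beta ** 2 * col_ss
        j = int(np.argmax(gain))         # select the best-fitting component
        coef[j] += nu * beta[j]          # shrunken update of that component only
        resid -= nu * beta[j] * Xc[:, j]
    intercept = offset - X_mean @ coef   # intercept on the original scale
    return intercept, coef
```

Covariates that are never selected keep a coefficient of exactly zero, which is how early stopping yields variable selection. In this sketch a smaller `nu` generally needs a larger `n_iter` to reach a comparable fit.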
Incorporating pathway information into boosting estimation of high-dimensional risk prediction models
BMC Bioinformatics (BioMed Central), Methodology article, 2009
Additive Models: The Men’s Olympic 1500m, Air Pollution in the USA, and
To begin we will construct a scatterplot of winning time against year the games
Flexible boosting of accelerated failure time models
BMC Bioinformatics (BioMed Central), Methodology article, 2008
© 2008 Schmid and Hothorn; licensee BioMed Central Ltd.
Model-Based Boosting: Unbiased Variable Selection and Model Choice
Variable selection and model choice are of major concern in many applications, especially in high-dimensional settings. Boosting (for an overview see Bühlmann and Hothorn (2007)) is a useful method for model fitting with intrinsic variable selection and model choice. However, a central problem remains: variable selection is biased if the covariates are of very different nature. An important example is given by models that try to make use of continuous and categorical covariates at the same time. Especially if the number of categories increases, categorical covariates offer increased flexibility and are thus preferred over continuous covariates (with linear effects). A closely related problem is model choice, where one tries to choose between different modeling alternatives for one covariate. The choice between linear and smooth effects is a classical example. The two competitors have different degrees of freedom (1 df for the linear effect and considerably more than 1 df for the smooth effect). Hence, smooth effects are preferably selected. To make categorical covariates comparable to linear effects in the boosting framework, one could use ridge penalized base-learners (i.e., modeling components) with 1 df in this case. To overcome the problem of different degrees of freedom of, e.g., linear and smooth effects Kneib
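The notion of a ridge penalized base-learner with 1 df can be made concrete with a small Python sketch. It rests on an assumption not stated in the abstract: degrees of freedom are taken to be the trace of the ridge hat matrix, and the paper may use a different definition. The function names `ridge_df` and `lambda_for_df` are hypothetical.

```python
import numpy as np

def ridge_df(X, lam):
    """Degrees of freedom of a ridge fit, assumed here to be
    trace(X (X'X + lam I)^{-1} X') = sum_k d_k^2 / (d_k^2 + lam),
    where d_k are the singular values of X."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d ** 2 / (d ** 2 + lam)))

def lambda_for_df(X, target_df=1.0, tol=1e-10):
    """Find the ridge penalty that gives the target df by bisection.

    Assumption: target_df is below the unpenalized df (the rank of X),
    so a positive solution exists. df decreases monotonically in lam.
    """
    lo, hi = 0.0, 1.0
    while ridge_df(X, hi) > target_df:   # grow the bracket until df(hi) <= target
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ridge_df(X, mid) > target_df:
            lo = mid                     # penalty too small, df still too large
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)
```

Applied to the dummy-coded design matrix of a categorical covariate, the returned penalty shrinks the base-learner until, under the trace definition, it is no more flexible than a linear effect of a single continuous covariate.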
FIRST: Combining forward iterative selection and shrinkage in high dimensional sparse linear regression
Statistics and Its Interface, Volume 2 (2009), 341–348
STOCHASTIC BOOSTING ALGORITHMS
In this article, we discuss a class of stochastic boosting algorithms, which corrects and develops the work of [23], showing how to perform statistical inference in a computationally efficient manner. Sequential Monte Carlo (SMC) methods are used to illustrate that the stochastic boosting methods can provide better predictions, for a higher computational cost, than the corresponding boosting algorithm. A theoretical result is also given, which expresses an upper bound on the posterior-predictive test error in terms of that of boosting. The result shows that the averaged predictions used are relatively stable with respect to boosting when the latter provides the single best prediction. We also investigate the method on a real case study from machine learning and in a regression context, showing that it can be a useful tool for data exploration.
This is an extended and slightly modified version of the manuscript
, 2012
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat based on anthropometric measurements.
Reprints and permission: sagepub.com/journalsPermissions.nav
The statistical classification of N individuals into G mutually exclusive groups when the actual group membership is unknown is common in the social and behavioral sciences. The results of such classification methods often have important consequences. Among the most common methods of statistical classification are linear discriminant analysis, quadratic discriminant analysis, and logistic regression. However, recent developments in the statistics literature have brought new and potentially more flexible classification models to the forefront. Although these new models are increasingly being used in the physical sciences and marketing research, they are still relatively little used in the social and behavioral sciences. The purpose of this article is to provide a comparison of these modern methods with the classical methods widely used in situations that are relevant in the social and behavioral sciences. This study uses a large-scale Monte Carlo simulation study for the comparisons, as analytic comparisons are often not tractable. Results indicate that classification and regression trees generally produced the highest classification accuracy of all techniques tested, though study design characteristics such as sample size and model complexity can greatly influence the optimal choice or effectiveness of a statistical classification method. Keywords: discriminant analysis, logistic regression, multivariate adaptive regression splines, classification and regression trees, boosting, generalized additive models, neural

