Boosting with the L2-loss: Regression and classification (2003)

by P Bühlmann, B Yu

Results 11 - 20 of 65

Propensity Score Estimation with Boosted Regression for Evaluating Causal Effects in Observational Studies

by Daniel F. McCaffrey, Greg Ridgeway, Andrew R. Morral - Psychological Methods, 2004
Cited by 15 (2 self)
Causal effect modeling with naturalistic rather than experimental data is challenging. In observational studies participants in different treatment conditions may also differ on pretreatment characteristics that influence outcomes. Propensity score methods can theoretically eliminate these confounds for all observed covariates, but accurate estimation of propensity scores is impeded by large numbers of covariates, uncertain functional forms for their associations with treatment selection, and other problems. This paper demonstrates that boosting, a modern statistical technique, can overcome many of these obstacles. We illustrate this approach with a study of adolescent probationers in substance abuse treatment programs. Propensity score weights estimated using boosting eliminate most pretreatment group differences, and substantially alter the apparent relative effects of adolescent substance abuse treatment. Experimental studies offer the most rigorous evidence with which to establish treatment efficacy, but they are not always practical or feasible. Experimental treatment evaluations can be expensive to field and may be too slow to produce answers to pressing questions. In some cases

High dimensional classification using features annealed independence rules

by Jianqing Fan, Yingying Fan - Ann. Statist., 2008
Cited by 14 (4 self)
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
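
The idea in this abstract (rank features by the two-sample t-statistic, keep the strongest ones, then apply the independence rule) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names are hypothetical, the number of kept features `m` is chosen by hand (the paper derives it from an error bound), a pooled per-feature variance is assumed, and every feature is assumed to have nonzero sample variance.

```python
import math


def column(rows, j):
    """Return feature j across all rows."""
    return [r[j] for r in rows]


def mean(v):
    return sum(v) / len(v)


def two_sample_t(a, b):
    """Two-sample t-statistic (unequal-variance form) for one feature."""
    ma, mb = mean(a), mean(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def select_features(X0, X1, m):
    """Indices of the m features with the largest |t| between the classes."""
    p = len(X0[0])
    t = [two_sample_t(column(X1, j), column(X0, j)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: abs(t[j]), reverse=True)
    return sorted(ranked[:m])


def fit_independence_rule(X0, X1, features):
    """Per-feature centroids and pooled variance; correlations are ignored."""
    rule = []
    for j in features:
        a, b = column(X0, j), column(X1, j)
        m0, m1 = mean(a), mean(b)
        ss = sum((x - m0) ** 2 for x in a) + sum((x - m1) ** 2 for x in b)
        rule.append((j, m0, m1, ss / (len(a) + len(b) - 2)))
    return rule


def predict(rule, x):
    """Return 1 if x is closer to the class-1 centroid, else 0."""
    score = sum((m1 - m0) / s2 * (x[j] - (m0 + m1) / 2) for j, m0, m1, s2 in rule)
    return 1 if score > 0 else 0
```

Restricting the rule to the selected features is what keeps noise from the uninformative coordinates out of the centroid estimates.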

Multi-class AdaBoost

by Ji Zhu, Hui Zou, Saharon Rosset, Trevor Hastie - Statistics and Its Interface, 2009
Cited by 13 (0 self)
Boosting has been a very successful technique for solving the two-class classification problem. In going from two-class to multi-class classification, most algorithms have been restricted to reducing the multi-class classification problem to multiple two-class problems. In this paper, we develop a new algorithm that directly extends the AdaBoost algorithm to the multi-class case without reducing it to multiple two-class problems. We show that the proposed multi-class AdaBoost algorithm is equivalent to a forward stagewise additive modeling algorithm that minimizes a novel exponential loss for multi-class classification. Furthermore, we show that the exponential loss is a member of a class of Fisher-consistent loss functions for multi-class classification. As shown in the paper, the new algorithm is extremely easy to implement and is highly competitive in terms of misclassification error rate.
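
As a rough illustration of how a direct multi-class extension of AdaBoost can differ from the two-class version, the sketch below performs one reweighting round. It assumes the classifier weight commonly attributed to this paper's algorithm (often called SAMME), namely the usual AdaBoost log-odds term plus log(K - 1) for K classes. The function name and argument layout are hypothetical, and the weak learner itself is left out.

```python
import math


def samme_round(weights, y_true, y_pred, n_classes):
    """One boosting round: return (alpha, renormalized example weights).

    Assumes alpha = log((1 - err) / err) + log(K - 1), where err is the
    weighted error of the current weak classifier and K = n_classes.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1.0)
    # Up-weight only the misclassified examples, then renormalize.
    raised = [w * math.exp(alpha) if t != p else w
              for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(raised)
    return alpha, [w / norm for w in raised]
```

Under this assumption, alpha stays positive whenever the weak classifier beats random guessing among K classes (err < 1 - 1/K), rather than requiring err < 1/2 as in the two-class case.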

Evidence Contrary to the Statistical View of Boosting

by David Mease, Yoav Freund - Machine Learning Research (forthcoming), 2006
Cited by 13 (1 self)
The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.

New Multicategory Boosting Algorithms Based on Multicategory Fisher-Consistent Losses

by Hui Zou, Ji Zhu, Trevor Hastie - submitted to The Annals of Applied Statistics
Cited by 9 (0 self)
Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin vector concept which can be regarded as a multicategory generalization of the binary margin. We characterize a wide class of smooth convex loss functions that are Fisher-consistent for multicategory classification. We then consider using the margin-vector-based loss functions to derive multicategory boosting algorithms. In particular, we derive two new multicategory boosting algorithms by using the exponential and logistic regression losses.

Sparse Boosting

by Peter Bühlmann, Bin Yu, Yoram Singer, Larry Wasserman - Journal of Machine Learning Research, 2006
Cited by 7 (2 self)
We propose Sparse Boosting (the SparseL2Boost algorithm), a variant on boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing some penalized L2-loss functions, the FPE model selection criteria, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and improve variable selection; such goals can be achieved with SparseL2Boost.
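
For orientation, here is a minimal sketch of plain boosting with squared error loss using componentwise linear least squares and a small step size. It is not SparseL2Boost: no penalized criterion is minimized and the number of steps is fixed by hand. The function name `l2boost`, the step size `nu`, and the assumption that predictors and response are already centered (so no intercept is fitted) are all choices made for this illustration.

```python
def l2boost(X, y, steps=100, nu=0.1):
    """Componentwise L2 boosting; returns one coefficient per predictor."""
    p = len(X[0])
    coef = [0.0] * p
    resid = list(y)  # residuals of the current fit (the fit starts at zero)
    for _ in range(steps):
        best = None
        for j in range(p):
            sxx = sum(r[j] ** 2 for r in X)
            if sxx == 0.0:
                continue  # constant-zero predictor carries no information
            b = sum(r[j] * e for r, e in zip(X, resid)) / sxx
            gain = b * b * sxx  # drop in residual sum of squares if j is used
            if best is None or gain > best[0]:
                best = (gain, j, b)
        if best is None:
            break
        _, j, b = best
        coef[j] += nu * b  # small step toward the least-squares fit
        resid = [e - nu * b * r[j] for r, e in zip(X, resid)]
    return coef
```

Because only one coefficient moves per step, predictors that never win the selection keep a coefficient of exactly zero, which is the source of the relative sparsity mentioned in the abstract.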

Kernel density classification and boosting: an L2 analysis

by M. Di Marzio - Statistics and Computing, 2005
Cited by 6 (1 self)
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.

A note on margin-based loss functions in classification

by Yi Lin - Statistics and Probability Letters, 2004
Cited by 6 (0 self)

Spectral Algorithms for Supervised Learning

by L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, A. Verri, 2007
Cited by 6 (3 self)
We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. All these algorithms are consistent kernel methods which can be easily implemented. The intuition behind their derivation is that the same principle allowing to numerically stabilize a matrix inversion problem

Volatility Estimation with Functional Gradient Descent for Very High-Dimensional Financial Time Series

by Francesco Audrino, Peter Bühlmann - Journal of Computational Finance, 2002
Cited by 4 (3 self)
(Revised Version, forthcoming in Journal of Computational Finance) We propose a functional gradient descent algorithm (FGD) for estimating volatility and conditional covariances (given the past) for very high-dimensional financial time series of asset price returns. FGD is a kind of hybrid of nonparametric statistical function estimation and numerical optimization. Our FGD algorithm is computationally feasible in multivariate problems with dozens up to thousands of individual return series. Moreover, we demonstrate on some synthetic and real data sets with dimensions up to 100 that it yields significantly better predictions than more classical approaches such as a constant conditional correlation GARCH-type model. Since our FGD algorithm is constructed from a generic algorithm, the technique can be adapted to other problems of learning in very high dimensions.