Statistical performance of Support Vector Machines
, 2004
Cited by 30 (6 self)
The "Support Vector Machine" algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes.
Statistical analysis of some multi-category large margin classification methods
- Journal of Machine Learning Research
, 2004
Cited by 29 (1 self)
The purpose of this paper is to investigate statistical properties of risk-minimization-based multicategory classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the consistency of classifiers obtained in the risk minimization framework with respect to the classification error. Examples are provided for four specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information can be useful for statistical inference tasks beyond classification.
Boosting with early stopping: convergence and consistency
- Annals of Statistics
, 2003
Cited by 28 (4 self)
Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early stopping strategies under which boosting is shown to be consistent based on iid samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step sizes, as known in practice through the works of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ε → 0 step size becomes an L1-margin maximizer when left to run to convergence.
1 Introduction
In this paper we consider boosting algorithms for classification and regression. These algorithms represent one of the major advances in machine learning. In their original version, the computational aspect is explicitly specified as part of the estimator/algorithm. That is, the empirical minimization of an appropriate loss function is carried out in a greedy fashion, which means that at each step, a basis function that leads to the largest reduction of empirical risk is added into the estimator. This specification distinguishes boosting from other statistical procedures which are defined by an empirical minimization of a loss function without the numerical optimization details.
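The greedy step and the early stopping rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure analysed in the paper: it assumes squared-error loss, one-dimensional threshold stumps as the basis functions, a fixed small step size in place of a line search, and a held-out validation set with a hypothetical `patience` parameter as the stopping criterion.

```python
import random

def fit_stump(xs, residuals):
    """Greedy step: the threshold stump (t, left, right) with the smallest squared error on the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def predict(model, x):
    """Additive model: a sum of shrunken stumps."""
    return sum(step * (lv if x <= t else rv) for step, t, lv, rv in model)

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def boost(train, valid, step=0.1, max_rounds=200, patience=10):
    """Boost with a restricted step size; keep the prefix with the lowest validation error."""
    xs, ys = train
    vx, vy = valid
    model, best_err, best_len = [], mse([], vx, vy), 0
    for _ in range(max_rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        model.append((step, *fit_stump(xs, residuals)))
        err = mse(model, vx, vy)
        if err < best_err:
            best_err, best_len = err, len(model)
        elif len(model) - best_len >= patience:
            break  # early stopping: no validation improvement for `patience` rounds
    return model[:best_len], best_err

# Synthetic data (an assumption for the demo): y = 2x + Gaussian noise.
rng = random.Random(0)

def sample(n):
    xs = [rng.random() for _ in range(n)]
    return xs, [2 * x + rng.gauss(0, 0.3) for x in xs]

train, valid = sample(60), sample(60)
model, valid_err = boost(train, valid)
baseline = mse([], *valid)
```

Comparing `valid_err` with `baseline` (the error of the empty model) shows how much the stopped ensemble gains; shrinking `step` typically lengthens the ensemble before the stopping rule fires.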
Minimax-optimal classification with dyadic decision trees
- IEEE Transactions on Information Theory
, 2006
Cited by 20 (4 self)
Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper it is shown that a new family of decision trees, dyadic decision trees (DDTs), attains nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; and (3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs are the first classifiers known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
A Neyman-Pearson approach to statistical learning
- IEEE Trans. Inform. Theory
, 2005
Cited by 19 (8 self)
The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any α > 0, the Neyman-Pearson lemma specifies the most powerful test of size α, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed training examples from each hypothesis. Building on a “fundamental lemma” of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.
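A constrained empirical risk minimizer of the kind the abstract calls NP-ERM might look like the sketch below. It is an illustration only: the classifier family (one-dimensional thresholds), the function name `np_erm_threshold`, the `tolerance` added to α, and the Gaussian training samples are all assumptions made for the example, not details taken from the paper.

```python
import random

def np_erm_threshold(neg_scores, pos_scores, alpha, tolerance):
    """Choose t (predict positive when score > t) that minimizes the empirical
    miss rate subject to empirical false-alarm rate <= alpha + tolerance."""
    candidates = sorted(set(neg_scores) | set(pos_scores)) + [float("inf")]
    best = None
    for t in candidates:
        false_alarm = sum(s > t for s in neg_scores) / len(neg_scores)
        miss = sum(s <= t for s in pos_scores) / len(pos_scores)
        feasible = false_alarm <= alpha + tolerance
        if feasible and (best is None or miss < best[1]):
            best = (t, miss, false_alarm)
    return best  # t = inf is always feasible, so this is never None

# Hypothetical training samples, one per hypothesis.
rng = random.Random(1)
neg = [rng.gauss(0.0, 1.0) for _ in range(200)]  # null hypothesis
pos = [rng.gauss(2.0, 1.0) for _ in range(200)]  # alternative hypothesis

threshold, miss_rate, false_alarm_rate = np_erm_threshold(neg, pos, alpha=0.1, tolerance=0.05)
```

The asymmetry is the point of the sketch: the false-alarm rate is a hard constraint, and only the miss rate is minimized.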
Boosting algorithms: regularization, prediction and model fitting
- Submitted to Statistical Science
Cited by 18 (4 self)
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
Oracle Bounds and Exact Algorithm for Dyadic Classification Trees
, 2004
Cited by 16 (1 self)
This paper introduces a new method using dyadic decision trees for estimating a classification or a regression function in a multiclass classification problem. The estimator is based on model selection by penalized empirical loss minimization. Our work consists of two complementary parts: first, a theoretical analysis of the method leads to deriving oracle-type inequalities for three different possible loss functions.
Ranking and scoring using empirical risk minimization
- Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT)
, 2005
Cited by 15 (2 self)
A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates are of the form of a U-statistic. Inequalities from the theory of U-statistics and U-processes are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just like in binary classification, fast rates of convergence are achieved under a certain noise assumption. General sufficient conditions are proposed in several special cases that guarantee fast rates of convergence.
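To make the U-statistic form of the empirical ranking risk concrete, the sketch below averages a misranking indicator over all pairs of observations. It is an assumption-laden illustration: the ranking rule is induced by a scoring function `score` (a hypothetical name), pairs with equal labels or tied scores are counted as correctly ranked, and the four data points are invented for the example.

```python
from itertools import combinations

def empirical_ranking_risk(xs, ys, score):
    """Fraction of pairs (i, j) whose score order disagrees with their label order.
    Averaging over unordered pairs gives a U-statistic of order two."""
    pairs = list(combinations(range(len(xs)), 2))
    misranked = sum(
        (ys[i] - ys[j]) * (score(xs[i]) - score(xs[j])) < 0
        for i, j in pairs
    )
    return misranked / len(pairs)

# Invented toy data: the observation at 0.4 (label 0) outranks the one at 0.35 (label 1).
xs = [0.1, 0.4, 0.35, 0.8]
ys = [0, 0, 1, 1]
risk = empirical_ranking_risk(xs, ys, score=lambda x: x)  # 1 misranked pair out of 6
```

Empirical risk minimization for ranking then amounts to choosing, within some class, the scoring function with the smallest value of this quantity.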
AdaBoost is consistent
- In Advances in Neural Information Processing Systems
, 2006
Cited by 13 (0 self)
The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is stopped after n^(1−ε) iterations (for sample size n and ε ∈ (0,1)), the sequence of risks of the classifiers it produces approaches the Bayes risk.
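The stopping rule can be illustrated with a small AdaBoost loop that runs for the integer part of n^(1−ε) rounds. This is a sketch under assumptions, not the setting of the paper: the base learners are one-dimensional decision stumps, the data are a noise-free toy problem, and the helper names (`stopping_time`, `best_stump`, `adaboost`, `classify`) are hypothetical.

```python
import math

def stopping_time(n, eps):
    """Number of boosting rounds: the integer part of n^(1 - eps), at least 1."""
    return max(1, math.floor(n ** (1 - eps)))

def best_stump(xs, ys, w):
    """Stump (t, sign) predicting `sign` when x > t and `-sign` otherwise, with minimal weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if (sign if x > t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, eps=0.5):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(stopping_time(n, eps)):
        err, t, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0) and division by zero
        a = 0.5 * math.log((1 - err) / err)
        ensemble.append((a, t, sign))
        # Up-weight misclassified points, down-weight the rest, then renormalize.
        w = [wi * math.exp(-a * y * (sign if x > t else -sign)) for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, x):
    vote = sum(a * (sign if x > t else -sign) for a, t, sign in ensemble)
    return 1 if vote >= 0 else -1

# Toy problem (an assumption): the positive class is the interval [0.3, 0.7].
xs = [i / 100 for i in range(100)]
ys = [1 if 0.3 <= x <= 0.7 else -1 for x in xs]
ensemble = adaboost(xs, ys, eps=0.5)
preds = [classify(ensemble, x) for x in xs]
```

Larger values of `eps` stop the procedure earlier relative to the sample size; the consistency statement in the abstract concerns the behaviour as n grows, which a single run like this cannot demonstrate.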

