Results 1 - 10 of 35
On the Rate of Convergence of Regularized Boosting Classifiers
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
Cited by 46 (10 self)
Abstract:
A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n^{-(V+2)/(4(V+1))}, where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.
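The decision-stump case discussed in the abstract can be illustrated with a minimal, self-contained AdaBoost run over 1-D stumps. This is a sketch of plain (unregularized) boosting, not the paper's penalized method; the data, the round count T, and the exhaustive threshold search are illustrative choices.

```python
import math

def stump_predict(s, t, x):
    # decision stump on the real line: predict +s if x > t, else -s
    return s if x > t else -s

def adaboost_stumps(X, y, T=10):
    """AdaBoost over 1-D decision stumps (the 'base' class of the abstract)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []                 # list of (alpha, s, t) weighted stumps
    thresholds = sorted(set(X))
    for _ in range(T):
        best = None               # (weighted error, sign, threshold)
        for t in thresholds:
            for s in (+1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(s, t, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, s, t)
        err, s, t = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, s, t))
        # reweight: misclassified points gain weight, then normalize
        w = [wi * math.exp(-alpha * yi * stump_predict(s, t, xi))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # sign of the aggregated (boosted) classifier
    return 1 if sum(a * stump_predict(s, t, x) for a, s, t in ensemble) >= 0 else -1
```

The aggregated classifier is the sign of a weighted vote of stumps, which is exactly the kind of aggregate whose misclassification rate the paper bounds.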
Statistical performance of support vector machines
 ANN. STATIST
, 2008
Cited by 42 (8 self)
Abstract:
The support vector machine (SVM) algorithm is well known to the machine learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle using a penalized criterion. It is then possible to adapt general methods related to model selection in this framework to study two important points: (1) what is the minimum penalty and how does it compare to the penalty actually used in the SVM algorithm; (2) is it possible to obtain “oracle inequalities” in that setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive and provides relevant insight into the former. Our result shows that it is possible to obtain fast rates of convergence for SVMs.
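The "statistical regularization" view described above can be made concrete: a linear SVM minimizes the empirical hinge loss plus a quadratic penalty on the weights. A minimal subgradient-descent sketch follows; the learning rate, the penalty constant lam, and the epoch count are arbitrary illustrative values, not anything prescribed by the paper.

```python
def hinge_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM as penalized empirical risk minimization:
    minimize (1/n) * sum_i hinge(y_i * (w.x_i + b)) + lam * ||w||^2,
    optimized by plain (sub)gradient descent. Illustrative only."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # gradient of the penalty term lam * ||w||^2
        gw, gb = [2 * lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # hinge loss active: subgradient is -y_i * x_i
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b
```

The penalty constant lam is exactly the knob the paper's question (1) asks about: how small can it be while oracle inequalities still hold.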
Fast learning rates for plug-in classifiers
 Ann. Statist
, 2007
Cited by 26 (3 self)
Abstract:
It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than n^{-1/2}. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order n^{-1}, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are false. In particular, we construct plug-in classifiers that can achieve not only fast, but also superfast rates, that is, rates faster than n^{-1}. We establish minimax lower bounds showing that the obtained rates cannot be improved.
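A plug-in classifier in the sense above first estimates the regression function eta(x) = P(Y=1 | X=x) and then plugs the estimate into the Bayes rule, predicting 1 whenever the estimate is at least 1/2. A toy 1-D sketch using a k-nearest-neighbour estimate of eta (the choice of k and the data are illustrative; the paper analyzes more refined estimators):

```python
def eta_hat(x, X, Y, k=3):
    """Estimate eta(x) = P(Y=1 | X=x) by averaging the labels of the
    k nearest training points (a simple nonparametric regression step)."""
    neighbours = sorted(zip(X, Y), key=lambda p: abs(p[0] - x))[:k]
    return sum(yi for _, yi in neighbours) / k

def plugin_classify(x, X, Y, k=3):
    # plug the estimate into the Bayes rule: predict 1 iff eta_hat >= 1/2
    return 1 if eta_hat(x, X, Y, k) >= 0.5 else 0
```

The quality of such a classifier is driven by how well eta_hat approximates eta near the decision boundary, which is where the margin assumption enters.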
Ranking and scoring using empirical risk minimization
 Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT)
, 2005
Cited by 24 (7 self)
Abstract:
A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates take the form of a U-statistic. Inequalities from the theory of U-statistics and U-processes are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just as in binary classification, fast rates of convergence are achieved under certain noise assumptions. General sufficient conditions are proposed in several special cases that guarantee fast rates of convergence.
RATES OF CONVERGENCE IN ACTIVE LEARNING
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 22 (4 self)
Abstract:
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
Oracle Bounds and Exact Algorithm for Dyadic Classification Trees
, 2004
Cited by 20 (1 self)
Abstract:
This paper introduces a new method using dyadic decision trees for estimating a classification or a regression function in a multiclass classification problem. The estimator is based on model selection by penalized empirical loss minimization. Our work consists of two complementary parts: first, a theoretical analysis of the method leads to oracle-type inequalities for three different possible loss functions.
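Model selection by penalized empirical loss can be sketched in one dimension: at depth d the unit interval splits into 2^d equal dyadic cells, each cell predicts its majority label, and the selected depth minimizes empirical error plus a penalty proportional to model size. This is a toy analogue of the dyadic-tree estimator, not the paper's algorithm, and lam is a hypothetical penalty constant.

```python
def dyadic_histogram_classifier(X, Y, max_depth=4, lam=0.05):
    """Penalized model selection over dyadic partitions of [0, 1]:
    pick the depth d minimizing  empirical error + lam * 2^d."""
    n = len(X)
    best = None
    for d in range(max_depth + 1):
        m = 2 ** d
        cells = [[] for _ in range(m)]
        for x, y in zip(X, Y):
            cells[min(int(x * m), m - 1)].append(y)
        # majority label per cell (empty cells default to 0)
        preds = [1 if c and 2 * sum(c) >= len(c) else 0 for c in cells]
        errors = sum(1 for x, y in zip(X, Y)
                     if preds[min(int(x * m), m - 1)] != y)
        score = errors / n + lam * m      # penalized empirical loss
        if best is None or score < best[0]:
            best = (score, d, preds)
    return best[1], best[2]               # selected depth, cell predictions
```

The penalty trades off fit against partition size, which is what makes the selected model adapt to the unknown complexity of the target.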
RANKING AND EMPIRICAL MINIMIZATION OF U-STATISTICS
Cited by 17 (2 self)
Abstract:
The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is “better,” with minimum ranking risk. Since the natural estimates of the risk take the form of a U-statistic, results from the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish, in particular, a tail inequality for degenerate U-processes, and apply it to show that fast rates of convergence may be achieved under specific noise assumptions, just as in classification. Convex risk minimization methods are also studied.
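The "natural estimate" of the ranking risk mentioned above is a U-statistic of order two: an average over all pairs of observations of an indicator that the scoring rule mis-orders the pair. A minimal sketch (the scoring-rule interface is an illustrative choice):

```python
from itertools import combinations

def empirical_ranking_risk(score, data):
    """Empirical ranking risk as an order-2 U-statistic:
    R_n(s) = 2 / (n(n-1)) * sum_{i<j} 1{ (s(X_i)-s(X_j)) * (Y_i-Y_j) < 0 },
    i.e. the fraction of pairs the scoring rule s puts in the wrong order.
    `data` is a list of (x, y) pairs; ties count as correctly ordered."""
    n = len(data)
    bad = sum(1 for (xi, yi), (xj, yj) in combinations(data, 2)
              if (score(xi) - score(xj)) * (yi - yj) < 0)
    return 2.0 * bad / (n * (n - 1))
```

Because every observation appears in n-1 of the pairs, the summands are dependent, which is exactly why U-process (rather than standard empirical-process) tools are needed to control the minimizer.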
Concentration inequalities and asymptotic results for ratio type empirical processes
 Ann. Probab
, 2006
Cited by 16 (4 self)
Abstract:
Let F be a class of measurable functions on a measurable space (S, S) with values in [0, 1] and let P_n = n^{-1} ∑_{i=1}^{n} δ_{X_i} be the empirical measure based on an i.i.d. sample (X_1, ..., X_n) from a probability distribution P on (S, S). We study the behavior of suprema of the following type: sup_{r_n < σ_P f ≤ δ_n} |P_n f − P f| / φ(σ_P f), where σ_P f ≥ Var_P^{1/2}(f) and φ is a continuous, strictly increasing function with φ(0) = 0. Using Talagrand's concentration inequality for empirical processes, we establish concentration inequalities for such suprema and use them to derive several results about their asymptotic behavior, expressing the conditions in terms of expectations of localized suprema of empirical processes. We also prove new bounds for expected values of sup-norms of empirical processes in terms of the largest σ_P f and the L2(P) norm of the envelope of the function class, which are especially suited for estimating localized suprema. With this technique, we extend to function classes most of the known results on ratio-type suprema of empirical processes, including some of Alexander's results for VC classes of sets. We also consider applications of these results to several important problems in nonparametric statistics and in learning theory (including general excess risk bounds in empirical risk minimization and their versions for L2-regression and classification, and ratio-type bounds for margin distributions in classification).
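The localized ratio-type supremum can be illustrated by Monte Carlo for a concrete class: take f_t = 1_{[0, t]} under P = Uniform[0, 1], so that P f_t = t and σ_P f_t = sqrt(t(1-t)). The grid of thresholds, the localization level r_n, and the choice φ(u) = u below are all illustrative assumptions.

```python
import math
import random

def ratio_sup(sample, ts, r_n, phi=lambda u: u):
    """Evaluate  sup_{sigma_P f > r_n} |P_n f - P f| / phi(sigma_P f)
    over the class f_t = 1_{[0, t]} with P = Uniform[0, 1], where
    P f_t = t and sigma_P f_t = sqrt(t * (1 - t))."""
    n = len(sample)
    best = 0.0
    for t in ts:
        sigma = math.sqrt(t * (1 - t))
        if sigma <= r_n:
            continue                       # localization: drop tiny-variance f
        pn = sum(1 for x in sample if x <= t) / n   # empirical measure P_n f_t
        best = max(best, abs(pn - t) / phi(sigma))
    return best
```

Dividing the deviation |P_n f − P f| by φ(σ_P f) is what makes the bound "ratio type": functions with small variance are held to a proportionally tighter standard.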
Adaptive Rates of Convergence in Active Learning
Cited by 16 (2 self)
Abstract:
We study the rates of convergence in classification error achievable by active learning in the presence of label noise. Additionally, we study the more general problem of active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.