Results 1–3 of 3
Smoothness, low noise and fast rates
In NIPS, 2010
Abstract
Cited by 15 (4 self)
We establish an excess risk bound of Õ(HR_n² + √(HL*)·R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L*RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective.
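The step from the general bound to the two stated learning rates is a direct substitution; spelled out in terms of the quantities defined in the abstract, it reads:

```latex
% Substitute R_n = \sqrt{R/n} into the excess-risk bound
% \tilde{O}\!\left(H R_n^2 + \sqrt{H L^*}\, R_n\right):
\tilde{O}\!\left(H\,\frac{R}{n} + \sqrt{H L^*}\,\sqrt{\frac{R}{n}}\right)
  = \tilde{O}\!\left(\frac{RH}{n} + \sqrt{\frac{L^{*} R H}{n}}\right),
% which collapses to \tilde{O}(RH/n) in the separable case L^* = 0.
```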
Suboptimality of penalized empirical risk minimization in classification
In Proceedings of the 20th Annual Conference on Computational Learning Theory (COLT), Lecture Notes in Computer Science 4539, 142–156, 2007
Abstract
Cited by 11 (3 self)
Let F be a set of M classification procedures with values in [−1, 1]. Given a loss function, we want to construct a procedure which mimics at the best possible rate the best procedure in F. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all the M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low-noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
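The exponential-weights aggregation scheme mentioned in the abstract can be sketched in a few lines. This is a generic illustration, not the paper's exact procedure: the function name, the cumulative losses, and the temperature `eta` are all hypothetical choices made for the example.

```python
import numpy as np

def exponential_weights_aggregate(losses, eta):
    """Weight M procedures by exponentiated negative cumulative loss.

    losses: shape-(M,) array of cumulative empirical losses, one per procedure.
    eta: temperature parameter controlling how sharply weights concentrate.
    Returns a weight vector summing to 1; lower-loss procedures get more mass.
    """
    # Subtract the minimum loss before exponentiating for numerical stability;
    # this shift cancels in the normalization.
    shifted = losses - losses.min()
    w = np.exp(-eta * shifted)
    return w / w.sum()

# Illustrative cumulative losses for three procedures.
weights = exponential_weights_aggregate(np.array([0.2, 0.5, 0.9]), eta=5.0)
# The aggregate predicts with the weighted average of the procedures'
# outputs; the lowest-loss procedure receives the largest weight.
```

Averaging with these soft weights, rather than selecting the single empirical-risk minimizer, is what lets such aggregates attain the (log M)/n rate in the regimes where penalized ERM is shown to be suboptimal.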
Optimal rates of aggregation in classification under low noise assumption
2007
Abstract
Cited by 4 (0 self)
In the same spirit as Tsybakov, we define the optimality of an aggregation procedure in the problem of classification. Using an aggregate with exponential weights, we obtain an optimal rate of convex aggregation for the hinge risk under the margin assumption. Moreover, we obtain an optimal rate of model selection aggregation under the margin assumption for the excess Bayes risk.