Results 1-10 of 31
Learning by mirror averaging
The Annals of Statistics
Cited by 33 (3 self)
Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.
Bayesian non-local means filter, image redundancy and adaptive dictionaries for noise removal
In Proc. Conf. Scale-Space and Variational Meth. (SSVM '07), 2007
Cited by 24 (6 self)
Partial differential equations (PDE), wavelet-based methods and neighborhood filters were proposed as locally adaptive machines for noise removal. Recently, Buades, Coll and Morel proposed the Non-Local (NL) means filter for image denoising. This method replaces a noisy pixel by the weighted average of other image pixels, with weights reflecting the similarity between local neighborhoods of the pixel being processed and the other pixels. The NL-means filter was proposed as an intuitive neighborhood filter, but theoretical connections to diffusion and nonparametric estimation approaches are also given by the authors. In this paper we propose another bridge, and show that the NL-means filter also emerges from the Bayesian approach with new arguments. Based on this observation, we show how the performance of this filter can be significantly improved by introducing adaptive local dictionaries and a new statistical distance measure to compare patches. The new Bayesian NL-means filter is better parametrized and the amount of smoothing is directly determined by the noise variance (estimated from image data) given the patch size. Experimental results are given for real images with artificial Gaussian noise added, and for images with real image-dependent noise.
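The weighted-average step described in this abstract can be sketched on a 1-D signal as follows. This is only an illustration of the classical NL-means idea, not the Bayesian variant the paper proposes; the function name `nl_means_1d`, the Gaussian weight on squared patch distance, and the parameters `patch_radius` and `h` are assumptions made for the example.

```python
import math


def nl_means_1d(signal, patch_radius=1, h=0.5):
    """Classical NL-means on a 1-D signal (illustrative sketch).

    Each sample is replaced by a weighted average of all samples, where the
    weight of sample j depends on how similar its patch is to the patch of i.
    `h` is an assumed smoothing parameter, not taken from the paper.
    """
    n = len(signal)

    def patch(i):
        # Patch centred at i; border samples are replicated at the edges.
        return [signal[min(max(i + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    patches = [patch(i) for i in range(n)]
    denoised = []
    for i in range(n):
        weights = []
        for j in range(n):
            # Squared Euclidean distance between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(patches[i], patches[j]))
            weights.append(math.exp(-d2 / (h * h)))
        total = sum(weights)
        denoised.append(sum(w * s for w, s in zip(weights, signal)) / total)
    return denoised
```

On a clean step signal such as `[0, 0, 0, 0, 10, 10, 10, 10]`, patches on either side of the jump are very dissimilar, so the edge is left essentially untouched while flat regions are averaged.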
Fast learning rates in statistical inference through aggregation
Submitted to The Annals of Statistics, 2008
Cited by 23 (5 self)
We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set G up to the smallest possible additive term, called the convergence rate. When the reference set is finite and when n denotes the size of the training data, we provide minimax convergence rates of the form C (log|G| / n)^v with tight evaluation of the positive constant C and with exact 0 < v ≤ 1, the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows one to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of the Assouad lemma taking particularly into account the properties of the loss function. Our key example to illustrate the upper and lower bounds is the Lq-regression setting, for which an exhaustive analysis of the convergence rates is given while q ranges in [1; +∞[.
Aggregation by exponential weighting and sharp oracle inequalities
Cited by 22 (3 self)
In the present paper, we study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp oracle inequalities for convex aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We show how these results can be applied to derive a sparsity oracle inequality.
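A convex aggregate defined via exponential weights can be sketched as below for a finite set of candidate functions evaluated at fixed design points. This is a minimal illustration under assumptions: the names `exponential_weights` and `aggregate`, the temperature parameter `beta`, and the default uniform prior are chosen for the example and are not taken from the paper.

```python
import math


def exponential_weights(y, predictions, beta=1.0, prior=None):
    """Weights proportional to prior_j * exp(-squared_loss_j / beta).

    y           -- observed responses at the design points
    predictions -- predictions[j][i] is candidate j evaluated at design point i
    beta        -- assumed temperature parameter (larger means flatter weights)
    """
    m = len(predictions)
    prior = prior or [1.0 / m] * m
    losses = [sum((yi - fi) ** 2 for yi, fi in zip(y, f)) for f in predictions]
    # Subtract the smallest loss before exponentiating for numerical stability;
    # the common factor cancels in the normalisation.
    lowest = min(losses)
    raw = [p * math.exp(-(loss - lowest) / beta) for p, loss in zip(prior, losses)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(y, predictions, beta=1.0):
    """Convex combination of the candidates under the exponential weights."""
    w = exponential_weights(y, predictions, beta)
    return [sum(wj * f[i] for wj, f in zip(w, predictions)) for i in range(len(y))]
```

Because the weights are non-negative and sum to one, the aggregate always lies in the convex hull of the candidates; `beta` controls how sharply it concentrates on the candidate with the smallest empirical squared loss.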
Simultaneous adaptation to the margin and to complexity in classification
2005
Cited by 15 (6 self)
We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest a learning method with a numerically easy aggregation step. Adaptivity to both the margin and complexity in classification usually involves empirical risk minimization or Rademacher complexities, which lead to numerical difficulties. On the other hand, there exist classifiers that are easy to compute and that converge with fast rates but are not adaptive. Combining these classifiers by our aggregation procedure, we get numerically realizable adaptive classifiers that converge with fast rates.
Progressive mixture rules are deviation suboptimal
Advances in Neural Information Processing Systems
Cited by 12 (3 self)
We consider the learning task consisting in predicting as well as the best function in a finite reference set G up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, under reasonable assumptions on the loss function (typically satisfied by the least square loss when the output is bounded), it is known that the progressive mixture rule ĝ satisfies E R(ĝ) ≤ min_{g∈G} R(g) + Cst · log|G| / n, (1) where n denotes the size of the training set, and E denotes the expectation w.r.t. the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst/√n: it fails to achieve the expected Cst/n. We also provide an algorithm which does not suffer from this drawback, and which is optimal in both deviation and expectation convergence rates.
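For readers unfamiliar with the progressive mixture rule, the sketch below shows one common way it is written: the mixture weights are the average, over i = 0, ..., n, of the Gibbs (exponential) weights computed from the cumulative loss on the first i training examples. The function name, the inverse-temperature parameter `lam`, and the uniform prior over G are assumptions made for this illustration; the paper's exact definition and constants may differ.

```python
import math


def progressive_mixture_weights(losses, lam=1.0):
    """Mixture weights of a progressive mixture rule (illustrative sketch).

    losses -- losses[j][i] is the loss of candidate j on training example i
    lam    -- assumed inverse-temperature parameter of the Gibbs weights
    Returns one weight per candidate; the weights sum to one.
    """
    m = len(losses)
    n = len(losses[0])
    cumulative = [0.0] * m   # loss of each candidate on the first i examples
    averaged = [0.0] * m     # running average of the Gibbs weights
    for i in range(n + 1):
        # Gibbs weights for the current cumulative losses (uniform prior),
        # shifted by the minimum for numerical stability.
        lowest = min(cumulative)
        raw = [math.exp(-lam * (c - lowest)) for c in cumulative]
        total = sum(raw)
        for j in range(m):
            averaged[j] += raw[j] / total / (n + 1)
        # Reveal the next training example, if any remain.
        if i < n:
            for j in range(m):
                cumulative[j] += losses[j][i]
    return averaged
```

The resulting prediction function is the convex combination of the candidates in G under these weights; the step i = 0 contributes the uniform prior, which is why the rule never puts all of its mass on a single candidate.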
Supervised Aggregation of Classifiers using Artificial Prediction Markets
Cited by 7 (2 self)
Prediction markets are used in real life to predict outcomes of interest such as presidential elections. In this work we introduce a mathematical theory for Artificial Prediction Markets for supervised classifier aggregation and probability estimation. We introduce the artificial prediction market as a novel way to aggregate classifiers. We derive the market equations to enforce total budget conservation, show the market price uniqueness and give efficient algorithms for computing it. We show how to train the market participants by updating their budgets using training examples. We introduce classifier specialization as a new differentiating characteristic between classifiers. Finally, we present experiments using random decision rules as specialized classifiers and show that the prediction market consistently outperforms Random Forest on real and synthetic data of varying degrees of difficulty.
Sparsity regret bounds for individual sequences in online linear regression
JMLR Workshop and Conference Proceedings, 19 (COLT 2011 Proceedings):377–396, 2011
Cited by 6 (0 self)
We consider the problem of online linear regression on arbitrary deterministic sequences when the ambient dimension d can be much larger than the number of time rounds T. We introduce the notion of sparsity regret bound, which is a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario. We prove such regret bounds for an online-learning algorithm called SeqSEW and based on exponential weighting and data-driven truncation. In a second part we apply a parameter-free version of this algorithm to the stochastic setting (regression model with random design). This yields risk bounds of the same flavor as in Dalalyan and Tsybakov (2012a) but which solve two questions left open therein. In particular our risk bounds are adaptive (up to a logarithmic factor) to the unknown variance of the noise if the latter is Gaussian. We also address the regression model with fixed design.
On the optimality of the empirical risk minimization procedure for the Convex Aggregation problem
2011
Cited by 2 (2 self)
We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the best function in the convex hull of an arbitrary finite class F. We show that ERM performed in the convex hull of F is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in F itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.