Results 1–10 of 12
Transductive Rademacher complexity and its applications
in Proceedings of COLT'07, 20th Annual Conference on Learning Theory, 2007
Cited by 22 (2 self)
Abstract We develop a technique for deriving data-dependent error bounds for transductive learning algorithms based on transductive Rademacher complexity. Our technique is based on a novel general error bound for transduction in terms of transductive Rademacher complexity, together with a novel bounding technique for Rademacher averages for particular algorithms, in terms of their "unlabeled-labeled" representation. This technique is relevant to many advanced graph-based transductive algorithms and we demonstrate its effectiveness by deriving error bounds for three well-known algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds.
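As background for the complexity notion the abstract builds on, here is a minimal Monte-Carlo sketch of an empirical Rademacher average, sup over hypotheses of (1/n) Σ σᵢ h(xᵢ), for a small finite hypothesis class. This is the standard inductive version, not the transductive variant the paper defines; the class H and all sizes are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
# H: predictions of 3 fixed hypotheses on n points (illustrative values)
H = rng.choice([-1.0, 1.0], size=(3, n))

def rademacher_estimate(H: np.ndarray, n_samples: int = 2000) -> float:
    """Monte-Carlo estimate of the empirical Rademacher average of H."""
    n = H.shape[1]
    total = 0.0
    for _ in range(n_samples):
        sigma = rng.choice([-1.0, 1.0], size=n)   # random Rademacher signs
        total += np.max(H @ sigma) / n            # sup over the hypotheses
    return total / n_samples

print(f"estimated complexity: {rademacher_estimate(H):.3f}")
```

For a class of 3 random ±1 hypotheses on 100 points, the estimate lands near the usual √(2 ln|H| / n) scale, illustrating how the average shrinks with sample size.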
Combining PAC-Bayesian and Generic Chaining Bounds
, 2007
Cited by 9 (1 self)
There exist many different generalization error bounds in statistical learning theory. Each of these bounds contains an improvement over the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds, and second, to combine the different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester (1998), which is interesting for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher-based bounds.
Mixability in statistical learning
 In Advances in Neural Information Processing Systems
Cited by 4 (3 self)
Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability.
Model selection type aggregation with better variance control or Fast learning rates in statistical inference through aggregation
, 2008
Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation
, 2016
Abstract This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V − 1), at least in some particular cases, suggesting that the performance improves substantially from V = 2 to V = 5 or 10, and then is almost constant. Overall, this can explain the common advice to take V = 5 at least, in our setting and when computational power is limited, as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
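The diminishing-returns behavior the abstract describes can be read directly off the stated variance factor 1 + 4/(V − 1). A short sketch evaluating it at common choices of V (the function name is ours, for illustration):

```python
def vfold_variance_factor(V: int) -> float:
    """Relative variance factor 1 + 4/(V - 1) quoted in the abstract."""
    if V < 2:
        raise ValueError("V-fold cross-validation requires V >= 2")
    return 1 + 4 / (V - 1)

# Large gain from V = 2 to V = 5 or 10, then nearly constant:
for V in (2, 5, 10, 20, 100):
    print(f"V = {V:3d}: factor = {vfold_variance_factor(V):.3f}")
```

The factor drops from 5.0 at V = 2 to 2.0 at V = 5 and about 1.44 at V = 10, while going all the way to V = 100 only brings it to about 1.04, consistent with the advice to take V = 5 when computational power is limited.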
REQUIREMENTS
This thesis would have been impossible without the wise guidance of my advisor, Prof. Ran El-Yaniv. The path to the results presented in this thesis was long, and I thank Ran for never losing faith in the final success. In both peaceful and stressful times, Ran constantly supported and navigated me towards stronger results. Many thanks go to my co-authors during the thesis period: Ron ...
Randomized estimators, Statistical learning
, 2009
ABSTRACT: We consider the problem of predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. When the input distribution is known, there already exists an algorithm having an expected excess risk of order d/n, where n is the size of the training data. Without this strong assumption, standard results often contain a multiplicative log n factor and require additional assumptions such as uniform boundedness of the d-dimensional input representation and exponential moments of the output. This work provides new risk bounds for the ridge estimator and the ordinary least squares estimator, and their variants. It also provides shrinkage procedures with convergence rate d/n (i.e., without the logarithmic factor) in expectation and in deviations, under various assumptions. The key surprising feature common to these results is the absence of an exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Finally, we show that some of these results are not particular to the ...
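For reference, a minimal sketch of the two classical estimators the abstract compares, ordinary least squares and ridge, on synthetic data. This is not the paper's shrinkage procedure; all data sizes, the noise level, and the penalty lam are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5                        # n training points, d functions/features
X = rng.normal(size=(n, d))          # design: values of the d functions
theta_true = np.arange(1, d + 1, dtype=float)
y = X @ theta_true + rng.normal(scale=0.1, size=n)

# Ordinary least squares: minimize ||y - X theta||^2
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge estimator: minimize ||y - X theta||^2 + lam * ||theta||^2
lam = 1.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With n much larger than d and small noise, both estimators recover the true linear combination closely; the abstract's contribution concerns finite-sample risk bounds of order d/n for such estimators under weak moment conditions, not the estimators themselves.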