Results 1 – 9 of 9
Estimating the Support of a High-Dimensional Distribution
, 1999
Abstract

Cited by 501 (32 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
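The role of the parameter ν described in this abstract can be illustrated with a toy sketch. This is not the paper's QP-based method: it substitutes a simple Parzen-window score for the sparse kernel expansion, and sets the threshold ρ so that a fraction ν of the training points fall outside the estimated support. All names and choices here are illustrative.

```python
import math
import random

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * (x - y) ** 2)

def fit_support(train, nu=0.1, gamma=0.5):
    # Parzen-window score: average kernel similarity to the training set
    # (a stand-in for the paper's QP-derived sparse kernel expansion).
    def score(x):
        return sum(rbf(x, t, gamma) for t in train) / len(train)
    # Choose the threshold rho so that a fraction nu of the training
    # points score below it, mimicking the role of the parameter nu.
    scores = sorted(score(t) for t in train)
    rho = scores[int(nu * len(train))]
    return lambda x: score(x) - rho  # positive on S, negative outside

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
f = fit_support(data, nu=0.1)
outside = sum(1 for x in data if f(x) < 0) / len(data)
print(outside)  # close to nu = 0.1 by construction
```

The decision function has the same sign structure as the paper's f (positive on the estimated support S, negative on its complement), but none of the sparsity or generalization properties of the QP solution.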
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
Abstract

Cited by 133 (8 self)
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable to or better than the best known bounds.
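The quantity that online-to-batch analyses of this kind work with is the cumulative mistake count of an online learner. A minimal sketch, using the classical perceptron on a synthetic margin-separated stream (all data and constants here are illustrative, not from the paper):

```python
import random

def perceptron_online(stream):
    """Run the perceptron over the stream once; return the final
    hypothesis and the mistake count M. Online-to-batch conversions
    turn M into a data-dependent tail bound on the risk of the
    hypotheses produced along the run."""
    w, b, mistakes = 0.0, 0.0, 0
    for x, y in stream:
        if y * (w * x + b) <= 0:   # prediction error: update
            w += y * x
            b += y
            mistakes += 1
    return (w, b), mistakes

random.seed(1)
# 1-d stream, separable with margin 0.5 around the threshold x = 0.5
stream = [(x, 1 if x > 0.5 else -1)
          for x in (random.uniform(-2.0, 2.0) for _ in range(500))
          if abs(x - 0.5) > 0.5]
(w, b), m = perceptron_online(stream)
print(m)  # bounded by (R / margin)^2, independent of the stream length
```

The point of the abstract's result is that the small mistake count m, not any property of the distribution, drives the resulting risk bound.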
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
A PAC-Bayesian approach to adaptive classification
 Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7
, 2003
Abstract

Cited by 21 (5 self)
This is meant to be a self-contained presentation of adaptive classification seen from the PAC-Bayesian point of view. Although most of the results are original, some review material about the VC dimension and support vector machines is also included. This study falls in the field of statistical learning theory, where complex data have to be analyzed from a limited amount of information, drawn from a finite sample. It relies on non-asymptotic deviation inequalities, where the complexity of models is captured through the use of prior measures. The main improvements brought here are more localized bounds and the use of exchangeable prior distributions. Interesting consequences are drawn for the generalization properties of support vector machines and the design of new classification algorithms.
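For orientation, the non-localized starting point of this line of work is the classical PAC-Bayesian bound of McAllester, which the localized bounds mentioned in the abstract refine. With π a prior and ρ any posterior over the parameter space, r_n the empirical risk on n samples, and probability at least 1 − δ over the sample:

```latex
% Classical (non-localized) PAC-Bayesian deviation bound, holding
% simultaneously for all posteriors \rho:
\[
\mathbb{E}_{\theta\sim\rho}\, R(\theta)
\;\le\;
\mathbb{E}_{\theta\sim\rho}\, r_n(\theta)
\;+\;
\sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\]
```

The localized improvements replace the fixed prior π in the KL term with data- or distribution-dependent (e.g. exchangeable) priors, shrinking the complexity term.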
Machine Learning with Data Dependent Hypothesis Classes
 Journal of Machine Learning Research
, 2002
Abstract

Cited by 8 (0 self)
We extend the VC theory of statistical learning to data dependent spaces of classifiers.
Tree Decomposition for Large-Scale SVM Problems: Experimental and Theoretical Results
, 2009
Abstract

Cited by 1 (0 self)
To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient for seeking the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, we can provide a generalization error bound for the classifier derived by the tree decomposition method. For experimental data sets whose size can be handled by current nonlinear, kernel-based SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy.
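The first merit mentioned in the abstract (the tree classifying some points on its own, so that region-local training only runs on the rest) can be sketched in miniature. This toy uses a single median split on a 1-d feature and a trivial threshold rule as a stand-in for the per-region SVM; it is not the paper's algorithm, and all names are illustrative.

```python
import statistics

def train_tree_decomposed(data, labels):
    """One median split, then a separate local rule per region.
    A pure region is classified by the split itself, so no per-region
    training is needed there (the paper's first merit)."""
    t = statistics.median(data)
    regions = {"left": [], "right": []}
    for x, y in zip(data, labels):
        regions["left" if x < t else "right"].append((x, y))

    def fit_region(pts):
        ys = {y for _, y in pts}
        if len(ys) == 1:              # pure: the tree decides alone
            return ("const", ys.pop())
        # impure: midpoint-of-class-means threshold as an SVM stand-in
        m_pos = statistics.mean(x for x, y in pts if y == 1)
        m_neg = statistics.mean(x for x, y in pts if y == -1)
        return ("thresh", (m_pos + m_neg) / 2, 1 if m_pos > m_neg else -1)

    models = {k: fit_region(v) for k, v in regions.items()}

    def predict(x):
        m = models["left" if x < t else "right"]
        if m[0] == "const":
            return m[1]
        _, mid, sign = m
        return sign if x > mid else -sign
    return predict

data = [-3, -2, -1, 0.5, 1, 2.5, 3, 4]
labels = [-1, -1, -1, -1, -1, 1, 1, 1]
clf = train_tree_decomposed(data, labels)
acc = sum(clf(x) == y for x, y in zip(data, labels)) / len(data)
print(acc)
```

Here the left region is pure and needs no local model at all, while only the right region requires training, mirroring the cost reduction the abstract describes.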
Margin Driven Separate and Conquer by Working Set Expansion
Abstract
Covering algorithms for binary classification build a list of one-sided partial models in a greedy manner. The original motivation stems from the context of rule learning, where the expressiveness of a single rule is too limited to serve as a standalone model. If the model space is richer, the decomposition into subproblems is not strictly necessary, but separately solved subproblems might still lead to better models, especially when the subproblems are less demanding in terms of the input model. We investigate in this direction with an AQR-style covering algorithm that uses an SVM base learner for discovering the subproblems along with a corresponding output model. The experimental study covers ...
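The greedy separate-and-conquer loop behind such covering algorithms can be sketched on a 1-d toy problem. The interval-growing rule below is a crude stand-in for the SVM base learner the abstract describes; the data and rule representation are illustrative only.

```python
def covering(examples):
    """Greedy covering: each round learns a one-sided interval rule
    that covers only positives, appends it to the rule list, and
    removes the covered examples (the 'conquer' step)."""
    rules = []
    remaining = list(examples)
    while any(y == 1 for _, y in remaining):
        pos = sorted(x for x, y in remaining if y == 1)
        # grow an interval from the smallest positive until extending
        # it further would swallow a negative
        lo = hi = pos[0]
        for x in pos[1:]:
            if any(y == -1 and lo <= xn <= x for xn, y in remaining):
                break
            hi = x
        rules.append((lo, hi))
        remaining = [(x, y) for x, y in remaining if not (lo <= x <= hi)]
    return rules

def predict(rules, x):
    # the rule list is one-sided: any covering rule fires positive
    return 1 if any(lo <= x <= hi for lo, hi in rules) else -1

examples = [(0, 1), (1, 1), (2, -1), (3, 1), (4, 1), (5, -1)]
rules = covering(examples)
print(rules)  # two one-sided rules, one per positive cluster
```

Each iteration solves an easier residual subproblem, which is exactly the property the abstract argues can pay off even when a single richer model would suffice.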
Margin Driven Separate and Conquer by Asymmetric Loss
Abstract
Separate and Conquer training for binary classification greedily builds a linear decision tree from the root to the leaf. This training scheme stems from the context of rule learning, where a single rule is not expressive enough to serve as a full prediction model. When we apply kernel methods, a single model usually suffices. Nonetheless, it might be advantageous to use multiple models that apply to different parts of the input space in this context too. Single SVM models have the shortcoming that there is only one parameter to trade off regularization and error. This can be inappropriate if the classes are intermingled to different degrees in different parts of the input space. We try a margin-driven, kernel-based separate and conquer algorithm that iteratively builds a list of partial models by means of a C-SVM-like objective with asymmetric losses. False negatives can be undone by subsequent partial models; therefore we use a weaker loss on the class the model is to detect. Experimental results show that model lists obtained with this training scheme can indeed compensate for weak kernels, though only for some of the probed data sets. They also document the main flaw of the scheme: it is likely to happen that taking out parts of the input space produces hard residual problems instead of alleviating them as ...
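The asymmetric-loss idea, a weaker penalty on the class the partial model is meant to detect, amounts to per-class weights in a C-SVM-like hinge objective. A minimal sketch of such an objective (the weights c_pos, c_neg and the data are illustrative, not values from the paper):

```python
def asymmetric_hinge(w, b, data, c_pos=0.2, c_neg=1.0):
    """Hinge loss with per-class weights: the detected (positive) class
    gets the weaker weight c_pos, since its false negatives can be
    picked up by later partial models in the list; errors on the other
    class carry the full weight c_neg."""
    total = 0.0
    for x, y in data:
        margin = y * (w * x + b)
        cost = c_pos if y == 1 else c_neg
        total += cost * max(0.0, 1.0 - margin)
    return total + 0.5 * w * w  # C-SVM-style regularization term

data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (0.2, 1)]
# the positive point at 0.2 violates the margin of w=1, b=0; under the
# weak positive-class weight, leaving it for a later model is cheap
print(asymmetric_hinge(1.0, 0.0, data))
```

Minimizing this objective per iteration biases each partial model toward high precision on the detected class, which is what makes the greedy list construction sound.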
PAC-BAYESIAN INDUCTIVE AND TRANSDUCTIVE LEARNING
, 2006
Abstract
We present here a PAC-Bayesian point of view on adaptive supervised classification. Using convex analysis on the set of posterior probability measures on the parameter space, we show how to get local measures of the complexity of the classification model involving the relative entropy of posterior distributions with respect to Gibbs posterior measures. We then discuss relative bounds, comparing the generalization error of two classification rules, showing how the margin assumption of Mammen and Tsybakov can be replaced with some empirical measure of the covariance structure of the classification model. We also show how to associate to any posterior distribution an effective temperature relating it to the Gibbs prior distribution with the same level of expected error rate, and how to estimate this effective temperature from data, resulting in an estimator whose expected error rate converges according to the best possible power of the sample size, adaptively under any margin and parametric complexity assumptions. Then we introduce a PAC-Bayesian point of view on transductive learning and use it to improve on Vapnik's known generalization bounds, extending them to the case when the sample is made of independent, non-identically distributed pairs of patterns and labels. Finally, we briefly review the construction of Support Vector Machines and show how to derive generalization bounds for them, measuring the complexity either through the number of support vectors or through transductive or inductive margin estimates.
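The Gibbs posterior measures that anchor this abstract's complexity terms and its notion of effective temperature take the standard exponential-weights form. With π the prior, r_n the empirical error rate, and β an inverse temperature:

```latex
% Gibbs posterior at inverse temperature \beta relative to the prior \pi;
% the "effective temperature" of a posterior \rho is the \beta whose
% Gibbs measure attains the same expected error rate as \rho.
\[
\pi_{\beta}(\mathrm{d}\theta)
\;=\;
\frac{\exp\{-\beta\, r_n(\theta)\}\;\pi(\mathrm{d}\theta)}
     {\displaystyle\int \exp\{-\beta\, r_n(\theta')\}\;\pi(\mathrm{d}\theta')}
\]
```

Larger β concentrates the measure on low-empirical-error parameters, which is why matching a posterior to its Gibbs counterpart yields a data-driven complexity calibration.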