Results 1–10 of 19
PAC-Bayesian Learning of Linear Classifiers
Abstract

Cited by 59 (8 self)
We present a general PAC-Bayes theorem from which all known PAC-Bayes risk bounds are obtained as particular cases. We also propose different learning algorithms for finding linear classifiers that minimize these bounds. These learning algorithms are generally competitive with both AdaBoost and the SVM. 1. Introduction For the classification problem, we are given a training set of examples, each generated according to the same (but unknown) distribution D, and the goal is to find a classifier that minimizes the true risk (i.e., the generalization error or expected loss). Since the true risk is defined only with respect to the unknown distribution D, we are immediately confronted with the problem of specifying exactly what we should optimize on the training data to find a classifier having the smallest possible true risk. Many different specifications of what should be optimized on the training data have been provided by using different inductive principles, but the final guarantee on the true risk always comes with a so-called risk bound that holds uniformly over a set of classifiers. Hence, the formal justification of a learning strategy has always come a posteriori via a risk bound. Since a risk bound can be computed from what a classifier achieves on the training data, it automatically suggests the following optimization problem for learning algorithms: given a risk (upper) bound, find a classifier that minimizes it. Despite the enormous impact they have had on our understanding of learning, VC bounds are generally very loose. These bounds are characterized by the fact that ...
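For reference, one of the known PAC-Bayes risk bounds that such a general theorem recovers as a special case can be stated as follows (this is the standard Seeger/Langford form, written in our own notation, not copied from the paper): for any prior P over classifiers fixed before seeing the m training examples and any δ ∈ (0, 1], with probability at least 1 − δ over the draw of the sample, simultaneously for all posteriors Q,

```latex
\mathrm{kl}\!\left( R_S(G_Q) \,\middle\|\, R_D(G_Q) \right)
\;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{m}}{\delta}}{m},
```

where R_S(G_Q) and R_D(G_Q) are the empirical and true risks of the Gibbs classifier G_Q, KL(Q‖P) is the Kullback-Leibler divergence between posterior and prior, and kl(q‖p) is its binary version. Bound-minimizing algorithms of the kind described above search for a posterior Q that trades the empirical risk term against the KL term.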
VC Theory of Large Margin Multi-Category Classifiers
Abstract

Cited by 12 (4 self)
In the context of discriminant analysis, Vapnik’s statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself. The case of classes of vector-valued functions used to compute polytomies has seldom been considered independently, which is unsatisfactory, for three main reasons. First, this case encompasses the other ones. Second, it cannot be treated appropriately through a naïve extension of the results devoted to the computation of dichotomies. Third, most of the classification problems met in practice involve multiple categories. In this paper, a VC theory of large margin multi-category classifiers is introduced. Central in this theory are generalized VC dimensions called the γ-Ψ-dimensions. First, a uniform convergence bound on the risk of the classifiers of interest is derived. The capacity measure involved in this bound is a covering number. This covering number can be upper bounded in terms of the γ-Ψ-dimensions thanks to generalizations of Sauer’s lemma, as is illustrated in the specific case of the scale-sensitive Natarajan dimension. A bound on this latter dimension is then computed for the class of functions on which multi-class SVMs are based. This makes it possible to apply the structural risk minimization inductive principle to those machines.
Chromatic PAC-Bayes Bounds for Non-IID Data
 In Twelfth International Conference on Artificial Intelligence and Statistics. Omnipress
, 2009
Abstract

Cited by 4 (2 self)
PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from IID data, and this is particularly so for margin classifiers. However, there are many practical cases where the training data exhibit dependencies and the traditional IID assumption does not apply. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first (to the best of our knowledge) PAC-Bayes generalization bounds for classifiers trained on data exhibiting dependencies. Our approach is based on decomposing a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data points through the tool of graph fractional covers. Our bounds are very general: an upper bound on the (fractional) chromatic number of the dependency graph is sufficient to obtain new PAC-Bayes bounds for specific settings. We show how our results can be used to derive bounds for bipartite ranking and windowed prediction on sequential data.
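To make the graph-decomposition idea concrete, here is a small illustrative sketch (names and data invented, not from the paper): the dependency graph for windowed prediction over a sequence, and a proper colouring of it whose colour classes are sets of mutually independent examples, which is the kind of decomposition that fractional covers of the dependency graph provide in the chromatic bounds.

```python
# Sketch: sliding windows of width w over a sequence are dependent
# whenever they overlap; colouring window i with i mod w separates
# the examples into w independent sets.

def dependency_graph(n_windows, w):
    """Edges between sliding windows of width w that overlap in the sequence."""
    return {(i, j)
            for i in range(n_windows)
            for j in range(i + 1, n_windows)
            if j - i < w}

def colour_windows(n_windows, w):
    """Colour window i with i mod w; overlapping windows get distinct colours."""
    return [i % w for i in range(n_windows)]

n_windows, w = 10, 3
edges = dependency_graph(n_windows, w)
colours = colour_windows(n_windows, w)

# Proper colouring: no edge joins two windows of the same colour, so each
# colour class is an independent set. The chromatic number here is w, and
# in the bounds the sample size m is effectively divided by this quantity.
is_proper = all(colours[i] != colours[j] for i, j in edges)
n_colours = len(set(colours))
```

Any upper bound on the (fractional) chromatic number, such as the trivial one given by this colouring, is enough to instantiate a bound for this setting.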
A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers
, 2013
Abstract

Cited by 3 (0 self)
We provide a first PAC-Bayesian analysis for domain adaptation (DA), which arises when the learning and test distributions differ. It relies on a novel distribution pseudo-distance based on a disagreement averaging. Using this measure, we derive a PAC-Bayesian DA bound for the stochastic Gibbs classifier. This bound has the advantage of being directly optimizable for any hypothesis space. We specialize it to linear classifiers and design a learning algorithm which shows interesting results on a synthetic problem and on a popular sentiment annotation task. This opens the door to tackling DA tasks by making use of all the PAC-Bayesian tools.
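A disagreement-averaging pseudo-distance of the kind this abstract describes can be written schematically as follows (notation ours; the paper's exact definition may differ):

```latex
\mathrm{dis}_\rho(D_S, D_T)
= \left| \operatorname*{\mathbb{E}}_{h, h' \sim \rho}
  \big[ R_{D_T}(h, h') - R_{D_S}(h, h') \big] \right|,
\qquad
R_D(h, h') = \operatorname*{\mathbb{E}}_{x \sim D} \mathbf{1}[h(x) \neq h'(x)],
```

where ρ is the posterior over classifiers, D_S and D_T are the source (learning) and target (test) marginal distributions, and R_D(h, h′) is the expected disagreement of two classifiers under D. Averaging the disagreement over pairs drawn from ρ is what makes the resulting DA bound directly optimizable for any hypothesis space.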
Dimensionality-Dependent PAC-Bayes Margin Bound
Abstract

Cited by 2 (0 self)
Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independence is that it can explain the excellent performance of SVMs, whose feature spaces are often of high or infinite dimension. In this paper we address the question of whether such dimensionality independence is intrinsic to margin bounds. We prove a dimensionality-dependent PAC-Bayes margin bound. The bound is monotonically increasing with respect to the dimension when all other factors are kept fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension, and that the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than both the dimensionality-independent PAC-Bayes margin bound and the VC bound for linear classifiers.
PAC-Bayes Generalization Bounds for Randomized Structured Prediction
Abstract

Cited by 2 (0 self)
We present a new PAC-Bayes generalization bound for structured prediction that is applicable to perturbation-based probabilistic models. Our analysis explores the relationship between perturbation-based modeling and the PAC-Bayes framework, and connects to recently introduced generalization bounds for structured prediction. We obtain the first PAC-Bayes bounds that guarantee better generalization as the size of each structured example grows.
Orozco et al.: Head Pose Classification in Crowded Scenes
Abstract
We propose a novel technique for head pose classification in crowded public spaces under poor lighting and in low-resolution video images. Unlike previous approaches, we avoid the need for explicit segmentation of skin and hair regions from a head image, and we implicitly encode spatial information using a grid map for greater robustness given low-resolution images. Specifically, a new head pose descriptor is formulated using similarity distance maps, obtained by indexing each pixel of a head image to the mean appearance templates of head images at different poses. These distance feature maps are then used to train a multi-class Support Vector Machine for pose classification. Our approach is evaluated against established techniques [3, 13, 14] using the iLIDS underground scene dataset [9] under challenging lighting and viewing conditions. The results demonstrate that our model gives a significant improvement in head pose estimation accuracy, with over an 80% pose recognition rate against 32% from the best of the existing models.
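The descriptor construction described above can be sketched as follows (all shapes, labels and data invented): each pixel of a head image is indexed against the mean appearance template of every pose, yielding one per-pixel distance map per pose. The paper feeds these maps to a multi-class SVM; here a nearest-template decision is substituted purely to keep the sketch short.

```python
# Sketch of the distance-map descriptor on a tiny synthetic "image".
import random

random.seed(0)
H = W = 4          # tiny image for illustration
N_POSES = 3

def rand_image():
    return [[random.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)]

def distance_map(image, template):
    """Per-pixel absolute-difference map between an image and one template."""
    return [[abs(image[r][c] - template[r][c]) for c in range(W)]
            for r in range(H)]

templates = [rand_image() for _ in range(N_POSES)]   # mean templates per pose

# A synthetic head image close to the pose-1 template.
image = [[templates[1][r][c] + 0.05 * random.gauss(0.0, 1.0)
          for c in range(W)] for r in range(H)]

maps = [distance_map(image, t) for t in templates]   # one map per pose

# In the paper the concatenated maps are the SVM features; here we simply
# pick the pose whose map has the smallest total distance.
totals = [sum(sum(row) for row in m) for m in maps]
predicted_pose = min(range(N_POSES), key=totals.__getitem__)
```

The point of the per-pixel maps, rather than a single distance per template, is that spatial structure survives into the feature vector even at low resolution.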
Text classification with a Primal SVM endowed with domain knowledge
Abstract
In this paper we solve a document classification task by incorporating prior/domain knowledge into the SVM. The algorithm consists in learning a prior classifier in the primal space (words) from a source of information external to the text classification itself: patterns of readers’ eye movements when reading words that are relevant for discriminating texts. This prior weight vector is then plugged into the SVM optimisation in the primal space. Experimental results include a comparison of the proposed algorithm with plain SVM classifiers and with an alternative way of mixing textual and eye-movement information based on the SVM2K.
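One plausible reading of "plugging the prior weight vector into the primal SVM optimisation" is to regularise towards the prior w0 instead of towards zero, i.e. minimise ||w − w0||²/2 + C · (sum of hinge losses). The sketch below implements that reading by plain subgradient descent; it is an illustration under our own assumptions, not the paper's algorithm, and the data, the prior and every constant are invented.

```python
# Primal SVM regularised towards a prior weight vector w0 (sketch).

def primal_svm_with_prior(X, y, w0, C=1.0, lr=0.1, epochs=200):
    w = list(w0)
    for _ in range(epochs):
        # Subgradient of the regulariser ||w - w0||^2 / 2.
        grad = [wi - w0i for wi, w0i in zip(w, w0)]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:                     # hinge loss is active
                grad = [g - C * yi * xj for g, xj in zip(grad, xi)]
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Linearly separable toy data; the "prior" w0 points roughly the right way,
# standing in for the classifier learned from eye-movement patterns.
X = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]
y = [1, 1, -1, -1]
w = primal_svm_with_prior(X, y, w0=(0.5, -0.2))
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
```

With C = 0 the learner simply returns the prior classifier; increasing C lets the textual training data pull w away from w0.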
PAC-Bayesian Analysis of Co-clustering with Extensions to Matrix Tri-factorization, Graph Clustering, Pairwise Clustering, and Graphical Models
 JOURNAL OF MACHINE LEARNING RESEARCH
Abstract
This paper promotes a novel point of view on unsupervised learning. We argue that the goal of unsupervised learning is to facilitate the solution of some higher-level task, and that it should be evaluated in terms of its contribution to the solution of this task. We present an example of such an analysis for the case of co-clustering, which is a widely used approach to the analysis of data matrices. The paper identifies two possible high-level tasks in matrix data analysis: discriminative prediction of the missing entries, and estimation of the joint probability distribution of row and column variables. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that have not been part of previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on the row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance ...
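The trade-off described in this abstract, empirical performance versus mutual information preserved on the row and column IDs, can be sketched as an objective of the following schematic form (symbols and weights are ours, not the paper's; the actual PAC-Bayesian bound fixes how the terms are combined):

```latex
\min_{Q}\;\; \widehat{L}(Q)
\;+\; \beta_1\, I(C_1; R)
\;+\; \beta_2\, I(C_2; C),
```

where \widehat{L}(Q) is the empirical loss of the co-clustering-based predictor, C₁ and C₂ are the row and column cluster variables, R and C are the row and column IDs, and the β's weight the mutual-information regularizers. An iterative projection algorithm of the kind mentioned would alternate between improving the empirical term and reducing the mutual-information terms until a local optimum of this trade-off is reached.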