Results 11 - 18 of 18
PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier
Cited by 5 (1 self)
We propose new PAC-Bayes bounds for the risk of the weighted majority vote that depend on the mean and variance of the error of its associated Gibbs classifier. We show that these bounds can be smaller than the risk of the Gibbs classifier and can be arbitrarily close to zero even if the risk of the Gibbs classifier is close to 1/2. Moreover, we show that these bounds can be uniformly estimated on the training data for all possible posteriors Q, and that they can be improved by using a large sample of unlabelled data.
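The gap between the Gibbs risk and the majority-vote risk mentioned in this abstract can be illustrated with a small sketch. It does not compute the paper's bounds; it only builds a toy error pattern, assuming a uniform posterior Q over voters, in which the Gibbs risk is close to 1/2 while the majority vote makes no errors. The helper names `rotating_errors` and `gibbs_and_majority_risk` are hypothetical.

```python
def rotating_errors(n_voters, n_wrong, n_examples):
    """Toy error matrix: on example i, exactly n_wrong voters err,
    and the set of erring voters rotates from one example to the next."""
    return [
        [((j - i) % n_voters) < n_wrong for j in range(n_voters)]
        for i in range(n_examples)
    ]


def gibbs_and_majority_risk(errors):
    """errors[i][j] is True when voter j misclassifies example i.
    Assumes a uniform posterior Q over the voters."""
    n_examples = len(errors)
    n_voters = len(errors[0])
    # Gibbs risk: error of a voter drawn at random from Q, averaged over examples.
    gibbs = sum(sum(row) for row in errors) / (n_examples * n_voters)
    # The majority vote errs when at least half of the voters err (ties count as errors).
    majority = sum(1 for row in errors if 2 * sum(row) >= n_voters) / n_examples
    return gibbs, majority


# 101 voters, each example misclassified by exactly 50 of them.
errors = rotating_errors(n_voters=101, n_wrong=50, n_examples=101)
gibbs, majority = gibbs_and_majority_risk(errors)
print(f"Gibbs risk = {gibbs:.4f}, majority-vote risk = {majority:.4f}")
```

Every voter here is wrong on almost half of the examples, yet the errors are spread so that fewer than half of the voters are ever wrong on the same example. This dependence on how errors are distributed across voters is what a bound using only the mean Gibbs error cannot capture.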
Competing with wild prediction rules
 Machine Learning
Cited by 3 (2 self)
We consider the problem of online prediction competitive with a benchmark class of continuous but highly irregular prediction rules. It is known that if the benchmark class is a reproducing kernel Hilbert space, there exists a prediction algorithm whose average loss over the first N examples does not exceed the average loss of any prediction rule in the class plus a "regret term" of O(N^{-1/2}). The elements of some natural benchmark classes, however, are so irregular that these classes are not Hilbert spaces. In this paper we develop Banach-space methods to construct a prediction algorithm with a regret term of O(N^{-1/p}), where p ∈ [2, ∞) and p − 2 reflects the degree to which the benchmark class fails to be a Hilbert space. Only the square loss function is considered.
Generalised Pinsker Inequalities
Cited by 3 (0 self)
We generalise the classical Pinsker inequality, which relates variational divergence to Kullback-Leibler divergence, in two ways: we consider arbitrary f-divergences in place of KL divergence, and we assume knowledge of a sequence of values of generalised variational divergences. We then develop a best possible inequality for this doubly generalised situation. Specialising our result to the classical case provides a new and tight explicit bound relating KL to variational divergence (solving a problem posed by Vajda some 40 years ago). The solution relies on exploiting a connection between divergences and the Bayes risk of a learning problem via an integral representation.
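For orientation, the classical Pinsker inequality that this abstract takes as its starting point can be checked numerically. The sketch below assumes discrete distributions, KL divergence measured in nats, and variational divergence defined as the L1 distance V = Σ|p_i − q_i|, under which the classical statement is KL(P‖Q) ≥ V²/2. It does not implement the paper's generalised or tight bounds, and the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def variational_divergence(p, q):
    """L1 distance between P and Q, a value in [0, 2]."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def pinsker_gap(p, q):
    """KL(P || Q) - V^2 / 2; classical Pinsker says this is never negative."""
    v = variational_divergence(p, q)
    return kl_divergence(p, q) - v * v / 2


pairs = [
    ([0.5, 0.5], [0.9, 0.1]),
    ([0.2, 0.3, 0.5], [0.4, 0.4, 0.2]),
    ([0.3, 0.7], [0.3, 0.7]),
]
for p, q in pairs:
    print(p, q, f"gap = {pinsker_gap(p, q):.4f}")
```

The gap is zero only when the two distributions coincide and grows as they separate, which is why sharper lower bounds on KL in terms of V are of interest.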
ε-samples for kernels
 Proceedings 24th Annual ACM-SIAM Symposium on Discrete Algorithms, 2013
Cited by 2 (2 self)
We study the worst case error of kernel density estimates via subset approximation. A kernel density estimate of a distribution is the convolution of that distribution with a fixed kernel (e.g. Gaussian kernel). Given a subset (i.e. a point set) of the input distribution, we can compare the kernel density estimates of the input distribution with that of the subset and bound the worst case error. If the maximum error is ε, then this subset can be thought of as an ε-sample (aka an ε-approximation) of the range space defined with the input distribution as the ground set and the fixed kernel representing the family of ranges. Interestingly, in this case the ranges are not binary, but have a continuous range (for simplicity we focus on kernels with range of [0, 1]); these allow for smoother notions of range spaces. It turns out that the use of this smoother family of range spaces has an added benefit of greatly decreasing the size required for ε-samples. For instance, in the plane the size is O((1/ε^{4/3}) log^{2/3}(1/ε)) for disks (based on VC-dimension arguments) but is only O((1/ε) √(log(1/ε))) for Gaussian kernels and for kernels with bounded slope that only affect a bounded domain. These bounds are accomplished by studying the discrepancy of these "kernel" range spaces, and here the improvement in bounds is even more pronounced. In the plane, we show the discrepancy is O(√(log n)) for these kernels, whereas for
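The quantity being bounded in this abstract, the maximum difference between the kernel density estimate of a point set and that of a subset, can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: the subset is a plain uniform random sample, the kernel is an unnormalised Gaussian with values in (0, 1], and the maximum is taken only over a finite grid of query points, so the result is a lower estimate of the true worst-case error. All names and parameter values are hypothetical.

```python
import math
import random


def gaussian_kernel(x, p, sigma=1.0):
    """Unnormalised Gaussian kernel in the plane, with values in (0, 1]."""
    d2 = (x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


def kde(points, x, sigma=1.0):
    """Kernel density estimate at x: the average kernel value over the point set."""
    return sum(gaussian_kernel(x, p, sigma) for p in points) / len(points)


def max_kde_error(points, subset, queries, sigma=1.0):
    """Largest |KDE_points(x) - KDE_subset(x)| over the given query points."""
    return max(abs(kde(points, x, sigma) - kde(subset, x, sigma)) for x in queries)


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
subset = rng.sample(points, 50)  # assumption: uniform random subset
queries = [(0.5 * i, 0.5 * j) for i in range(-6, 7) for j in range(-6, 7)]

err = max_kde_error(points, subset, queries)
print(f"max KDE error on the grid: {err:.4f}")
```

If the printed value is at most ε, the subset behaves like an ε-sample with respect to the queries tested; certifying it for all query points requires the kind of analysis the paper develops.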
Geometric Decision Rules for High Dimensions
Cited by 2 (0 self)
In this paper we report on a new approach to the instance-based learning problem.
PAC-Bayesian Analysis of Co-clustering with Extensions to Matrix Tri-factorization, Graph Clustering, Pairwise Clustering, and Graphical Models
 JOURNAL OF MACHINE LEARNING RESEARCH
This paper promotes a novel point of view on unsupervised learning. We argue that the goal of unsupervised learning is to facilitate a solution of some higher level task, and that it should be evaluated in terms of its contribution to the solution of this task. We present an example of such an analysis for the case of co-clustering, which is a widely used approach to the analysis of data matrices. This paper identifies two possible high-level tasks in matrix data analysis: discriminative prediction of the missing entries and estimation of the joint probability distribution of row and column variables. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that have not been part of previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance
IN FULFILMENT OF THE
© Dávid Pál 2009. I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. Acknowledgements: I am indebted to my supervisor, Prof. Shai Ben-David, for the patience with which he has guided me through my PhD studies. I very much enjoyed all the discussions we had together about our research, and about computer science and mathematics in general. Most of the results in this thesis were obtained through collaboration with my advisor, Dr. Ulrike von Luxburg from the Max Planck Institute in Tübingen, Prof. Hans Ulrich Simon from Ruhr-Universität Bochum, and Tyler Lu from the University of Waterloo. It has been a great pleasure to work with them and I thank them all. I also thank Shalev Ben-David for providing the proof of Lemma 7.5. Thanks to Michael Spriggs, Steve Bahun, and especially Mustaq Ahmed for being such great office mates. I also thank all the teachers for the beautiful lectures that I attended.