Results 1–10 of 21
On the Generalization Ability of Online Learning Algorithms
IEEE Transactions on Information Theory, 2001
Cited by 133 (8 self)
Abstract:
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
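As a schematic illustration (the notation here is mine, not necessarily the paper's exact statement), online-to-batch conversions of this kind typically say: if an online algorithm run on n i.i.d. examples suffers cumulative loss M_n, and the loss is bounded in [0, 1], then with probability at least 1 − δ the average risk of the hypotheses h_0, …, h_{n−1} it produces satisfies

```latex
\frac{1}{n}\sum_{t=1}^{n} R(h_{t-1})
  \;\le\; \frac{M_n}{n} \;+\; \sqrt{\frac{2}{n}\,\ln\frac{1}{\delta}},
```

typically via an Azuma-Hoeffding martingale argument; a single hypothesis with risk close to this average can then be selected or constructed from the sequence.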
PAC-Bayes & Margins
Advances in Neural Information Processing Systems 15, 2002
Cited by 58 (9 self)
Abstract:
We show two related things: (1) Given a classifier which consists of a weighted sum of features with a large margin, we can construct a stochastic classifier with negligibly larger training error rate. The stochastic classifier has a future error rate bound that depends on the margin distribution and is independent of the size of the base hypothesis class.
Simplified PAC-Bayesian margin bounds
In COLT, 2003
Cited by 52 (3 self)
Abstract:
The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Unit-norm margin bounds have been proved previously using fat-shattering arguments and Rademacher complexity. Recently Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form, making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.
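Schematically (this is the general shape of bounds in this family, not a quote of the paper's theorem), a dimension-independent PAC-Bayesian margin bound says that, with probability at least 1 − δ over n samples, every unit-norm weight vector w with empirical margin error rate ê_γ(w) at margin γ satisfies

```latex
e(w) \;\le\; \hat{e}_{\gamma}(w)
  \;+\; O\!\left(\sqrt{\frac{\gamma^{-2}\ln n + \ln(1/\delta)}{n}}\right),
```

with no dependence on the input dimension. The prior is typically a spherical Gaussian over weight vectors, which is where the "learning bias" mentioned in the abstract becomes explicit.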
Discriminative, Generative and Imitative Learning
2002
Cited by 34 (1 self)
Abstract:
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
Generalization Error Bounds for Bayesian Mixture Algorithms
Journal of Machine Learning Research, 2003
Cited by 27 (5 self)
Abstract:
Bayesian approaches to learning and estimation have played a significant role in the statistics literature over many years. While they are often provably optimal in a frequentist setting, and lead to excellent performance in practical applications, there have not been many precise characterizations of their performance for finite sample sizes under general conditions. In this paper we consider the class of Bayesian mixture algorithms, where an estimator is formed by constructing a data-dependent mixture over some hypothesis space. Similarly to what is observed in practice, our results demonstrate that mixture approaches are particularly robust, and allow for the construction of highly complex estimators, while avoiding undesirable overfitting effects. Our results, while being data-dependent in nature, are insensitive to the underlying model assumptions, and apply whether or not these hold. At a technical level, the approach applies to unbounded functions, constrained only by certain moment conditions. Finally, the bounds derived can be directly applied to non-Bayesian mixture approaches such as Boosting and Bagging.
A PAC-Bayesian approach to adaptive classification
Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7, 2003
Cited by 21 (5 self)
Abstract:
This is meant to be a self-contained presentation of adaptive classification seen from the PAC-Bayesian point of view. Although most of the results are original, some review material about the VC dimension and support vector machines is also included. This study falls in the field of statistical learning theory, where complex data have to be analyzed from a limited amount of information drawn from a finite sample. It relies on non-asymptotic deviation inequalities, where the complexity of models is captured through the use of prior measures. The main improvements brought here are more localized bounds and the use of exchangeable prior distributions. Interesting consequences are drawn for the generalization properties of support vector machines and the design of new classification algorithms.
Maximum Entropy Discrimination Markov Networks
2008
Cited by 11 (6 self)
Abstract:
Standard max-margin structured prediction methods concentrate directly on the input-output mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian max-margin structured learning, by using expected margin constraints to define a feasible distribution subspace and applying the maximum entropy principle to choose the best distribution from this subspace. We show that MaxEntNet subsumes the standard max-margin Markov networks (M^3N) as a special case where the predictive model is assumed to be linear and the parameter prior is a standard normal. Based on this understanding, we propose the Laplace max-margin Markov networks (LapM^3N), which use the Laplace prior instead of the standard normal. We show that the adoption of a Laplace prior for the parameters makes LapM^3N enjoy properties expected from a sparsified M^3N. Unlike L1-regularized maximum likelihood estimation, which sets small weights to zero to achieve sparsity, LapM^3N weights the parameters a posteriori, and features with smaller weights are shrunk more. This posterior weighting effect makes LapM^3N more stable with respect to the magnitudes of the regularization coefficients and more generalizable.
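Schematically (symbols hypothetical, following the generic maximum-entropy-discrimination recipe rather than the paper's exact formulation), the framework chooses a distribution p(w) over model parameters by solving

```latex
\min_{p(w),\;\xi \ge 0}\;
  \mathrm{KL}\big(p(w)\,\|\,p_0(w)\big) + C\sum_i \xi_i
\quad\text{s.t.}\quad
\mathbb{E}_{p}\big[\Delta F_i(y;w)\big] \;\ge\; \Delta\ell_i(y) - \xi_i
\quad \forall i,\;\forall y,
```

where p_0 is the parameter prior (a Gaussian would recover the M^3N-style solution, a Laplace prior the LapM^3N-style one), ΔF_i(y; w) is the margin of the true structured label over candidate y on example i, and Δℓ_i(y) is the structured loss.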
Mistake bounds for maximum entropy discrimination
In Advances in Neural Information Processing Systems, 2004
Cited by 6 (0 self)
Abstract:
We establish a mistake bound for an ensemble method for classification based on maximizing the entropy of voting weights subject to margin constraints. The bound is the same as a general bound proved for the Weighted Majority Algorithm, and similar to bounds for other variants of Winnow. We prove a more refined bound that leads to a nearly optimal algorithm for learning disjunctions, again based on the maximum entropy principle. We describe a simplification of the online maximum entropy method in which, after each iteration, the margin constraints are replaced with a single linear inequality. The simplified algorithm, which takes a similar form to Winnow, achieves the same mistake bounds.
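For scale (these are the classical benchmarks such results are measured against, not the paper's own statements): the Weighted Majority Algorithm with N experts, the best of which makes m* mistakes, makes at most on the order of m* + log N mistakes, and Winnow learns a k-literal disjunction over d Boolean variables with at most on the order of k log d mistakes:

```latex
M_{\mathrm{WM}} \;\le\; O\big(m^{*} + \log N\big),
\qquad
M_{\mathrm{Winnow}} \;\le\; O\big(k \log d\big).
```

Matching mistake bounds of this shape through a maximum-entropy derivation is what the abstract above refers to.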