Results 1  10
of
94
On the mathematical foundations of learning
 Bulletin of the American Mathematical Society
, 2002
"... The problem of learning is arguably at the very core of the problem of intelligence, both biological and arti cial. T. Poggio and C.R. Shelton ..."
Abstract

Cited by 284 (12 self)
 Add to MetaCart
(Show Context)
The problem of learning is arguably at the very core of the problem of intelligence, both biological and arti cial. T. Poggio and C.R. Shelton
Signal reconstruction from noisy random projections
 IEEE Trans. Inform. Theory
, 2006
"... Recent results show that a relatively small number of random projections of a signal can contain most of its salient information. It follows that if a signal is compressible in some orthonormal basis, then a very accurate reconstruction can be obtained from random projections. We extend this type of ..."
Abstract

Cited by 210 (26 self)
 Add to MetaCart
(Show Context)
Recent results show that a relatively small number of random projections of a signal can contain most of its salient information. It follows that if a signal is compressible in some orthonormal basis, then a very accurate reconstruction can be obtained from random projections. We extend this type of result to show that compressible signals can be accurately recovered from random projections contaminated with noise. We also propose a practical iterative algorithm for signal reconstruction, and briefly discuss potential applications to coding, A/D conversion, and remote wireless sensing. Index Terms sampling, signal reconstruction, random projections, denoising, wireless sensor networks
Optimal aggregation of classifiers in statistical learning
 Ann. Statist
, 2004
"... Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that the rates of convergence of classifiers depend on two parameters: the complexity of the class of cand ..."
Abstract

Cited by 190 (6 self)
 Add to MetaCart
Classification can be considered as nonparametric estimation of sets, where the risk is defined by means of a specific distance between sets associated with misclassification error. It is shown that the rates of convergence of classifiers depend on two parameters: the complexity of the class of candidate sets and the margin parameter. The dependence is explicitly given, indicating that optimal fast rates approaching O(n−1) can be attained, where n is the sample size, and that the proposed classifiers have the property of robustness to the margin. The main result of the paper concerns optimal aggregation of classifiers: we suggest a classifier that automatically adapts both to the complexity and to the margin, and attains the optimal fast rates, up to a logarithmic factor. 1. Introduction. Let (Xi,Yi)
Empirical margin distributions and bounding the generalization error of combined classifiers
 Ann. Statist
, 2002
"... Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such ..."
Abstract

Cited by 149 (8 self)
 Add to MetaCart
Dedicated to A.V. Skorohod on his seventieth birthday We prove new probabilistic upper bounds on generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, symmetrization method, concentration inequalities) and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
Some PACBayesian Theorems
 Machine Learning
, 1998
"... This paper gives PAC guarantees for "Bayesian" algorithms  algorithms that optimize risk minimization expressions involving a prior probability and a likelihood for the training data. PACBayesian algorithms are motivated by a desire to provide an informative prior encoding informat ..."
Abstract

Cited by 127 (4 self)
 Add to MetaCart
(Show Context)
This paper gives PAC guarantees for "Bayesian" algorithms  algorithms that optimize risk minimization expressions involving a prior probability and a likelihood for the training data. PACBayesian algorithms are motivated by a desire to provide an informative prior encoding information about the expected experimental setting but still having PAC performance guarantees over all IID settings. The PACBayesian theorems given here apply to an arbitrary prior measure on an arbitrary concept space. These theorems provide an alternative to the use of VC dimension in proving PAC bounds for parameterized concepts. 1 INTRODUCTION Much of modern learning theory can be divided into two seemingly separate areas  Bayesian inference and PAC learning. Both areas study learning algorithms which take as input training data and produce as output a concept or model which can then be tested on test data. In both areas learning algorithms are associated with correctness theorems. PAC correct...
A tutorial introduction to the minimum description length principle
 in Advances in Minimum Description Length: Theory and Applications. 2005
"... ..."
(Show Context)
PACBayesian Model Averaging
 In Proceedings of the Twelfth Annual Conference on Computational Learning Theory
, 1999
"... PACBayesian learning methods combine the informative priors of Bayesian methods with distributionfree PAC guarantees. Building on earlier methods for PACBayesian model selection, this paper presents a method for PACBayesian model averaging. The main result is a bound on generalization error of a ..."
Abstract

Cited by 90 (3 self)
 Add to MetaCart
(Show Context)
PACBayesian learning methods combine the informative priors of Bayesian methods with distributionfree PAC guarantees. Building on earlier methods for PACBayesian model selection, this paper presents a method for PACBayesian model averaging. The main result is a bound on generalization error of an arbitrary weighted mixture of concepts that depends on the empirical error of that mixture and the KLdivergence of the mixture from the prior. A simple characterization is also given for the error bound achieved by the optimal weighting. 1
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
"... ..."
Theory of classification: A survey of some recent advances
, 2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract

Cited by 80 (3 self)
 Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.