Results 1–8 of 8
Schapire R., Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
Active learning using arbitrary binary valued queries
Machine Learning, 1993
Cited by 33 (1 self)
Abstract:
The original and most widely studied PAC model for learning assumes a passive learner, in the sense that the learner plays no role in obtaining information about the unknown concept: the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement that can be expected from using oracles, we consider active learning in the sense that the learner has complete choice in the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that 'distribution-free' refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.
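The halving idea behind such query bounds can be made concrete. The following is a minimal sketch (an illustration, not code from the paper), assuming a finite concept class and an oracle that answers an arbitrary yes/no question of the form "is the target in this subset?":

```python
def active_learn(concepts, oracle):
    """Identify the target concept from a finite class using arbitrary
    yes/no queries. Each query asks whether the target lies in one half
    of the remaining candidates, halving the version space, so the
    target is found with about log2(|C|) queries."""
    candidates = list(concepts)
    queries = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        queries += 1
        if oracle(half):  # "does the target concept belong to `half`?"
            candidates = half
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates[0], queries

# Hypothetical demo: 16 candidate concepts indexed 0..15, target is 13.
answer, queries = active_learn(range(16), lambda subset: 13 in subset)
# → answer == 13 after 4 = log2(16) queries
```

Under a fixed distribution the same idea can be run on a finite ε-cover of an infinite class, which is the source of the sample-complexity improvement; distribution-free learning, as the abstract notes, offers no such finite set to halve.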
Part 1: Overview of the Probably Approximately Correct (PAC) Learning Framework
1995
Cited by 8 (0 self)
Abstract:
Here we survey some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.
Bounds on the generalization ability of Bayesian Inference and Gibbs algorithms
Proceedings of ICANN 2001
Cited by 1 (1 self)
Abstract:
Recent theoretical work applying the methods of statistical learning theory has highlighted the interest of old, well-known learning paradigms such as Bayesian inference and Gibbs algorithms. Sample complexity bounds have been given for such paradigms in the zero-error case. This paper studies the behavior of these algorithms without this assumption. Results include uniform convergence of the Gibbs algorithm towards Bayesian inference, the rate of convergence of the empirical loss towards the generalization loss, and convergence of the generalization error towards the optimal loss in the underlying class of functions.
Contribution of statistical learning to validation of association rules
2001
Cited by 1 (0 self)
Abstract:
Many measures aim at evaluating the interest of association rules. The subject of this article is a detailed study of the confidence intervals associated with the evaluation of these measures. The following difficulties arise: Since samples are finite, we restrict our attention to non-asymptotic bounds. The number of tested rules can be large, so it is not statistically sound to treat the rules separately: risks accumulate, and one could thus "validate" absurd rules. We do not work only on rules without exception; rules with confidence lower than 1 can be important. The solution we propose is based upon the VC dimension, a classical tool of learning theory.
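The accumulation of risk over many simultaneously tested rules can be illustrated with a standard Hoeffding-plus-union-bound calculation (a generic sketch, not the VC-dimension bound the paper develops; the parameter names are hypothetical):

```python
import math

def simultaneous_confidence_radius(n, num_rules, delta):
    """Hoeffding bound with a union (Bonferroni) correction: with
    probability at least 1 - delta, the empirical confidence of every
    one of `num_rules` rules, each estimated on n samples, lies within
    this radius of its true value. The radius grows only
    logarithmically with the number of rules; a VC-type argument is
    needed when the rule family is effectively infinite."""
    return math.sqrt(math.log(2 * num_rules / delta) / (2 * n))

# Testing 1,000 rules on 10,000 transactions at a 95% simultaneous level:
radius = simultaneous_confidence_radius(10_000, 1_000, 0.05)
```

Treating each rule separately (num_rules = 1) would give a much smaller radius, which is exactly the trap described above: per-rule intervals end up "validating" absurd rules once many rules are tested.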
Polynomial Uniform Convergence and Polynomial-Sample Learnability
1992
Abstract:
In the PAC model, polynomial-sample learnability in the distribution-dependent framework has been characterized in terms of the minimum cardinality of ε-covers. In this paper we propose another approach to the problem by investigating the relationship between polynomial-sample learnability and uniform convergence, in analogy to what was done for the distribution-free setting. First of all, we introduce the notion of polynomial uniform convergence, giving a characterization of it in terms of an entropic measure; then we study its relationship with polynomial-sample learnability. We show that, contrary to what happens in the distribution-independent setting, polynomial uniform convergence is a sufficient but not necessary condition for polynomial-sample learnability. This research was partly supported by CNR, grant 92.01568.PF.69, project Sistemi Informatici e Calcolo Parallelo. An extended abstract of this paper appeared in Proc. 5th Annual ACM Workshop on Computational Learning ...
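An ε-cover of the kind referred to here can be built greedily. The sketch below (an illustration under simplifying assumptions, not the paper's construction) covers points of a metric space; for concept classes the distance would be the probability of the symmetric difference under the fixed distribution:

```python
def greedy_epsilon_cover(points, dist, eps):
    """Greedy eps-cover: scan the points, opening a new center whenever
    a point is farther than eps from every existing center. Every point
    ends up within eps of some center, and the centers are pairwise
    more than eps apart (so the result is also an eps-packing)."""
    centers = []
    for p in points:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

# Hypothetical demo: cover the grid 0.0, 0.1, ..., 1.0 at radius 0.15.
grid = [i / 10 for i in range(11)]
centers = greedy_epsilon_cover(grid, lambda a, b: abs(a - b), 0.15)
```

In the distribution-dependent characterization mentioned in the abstract, polynomial-sample learnability corresponds to the minimum size of such covers growing only polynomially in 1/ε.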
Abstract
Abstract:
In this paper we study a Bayesian, or average-case, model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models.
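The two quantities compared in this abstract can be written down directly for a single binary prediction. In the sketch below (an illustration, not the paper's derivation), p is the posterior probability that the next label is 1; the Bayes-optimal learner predicts the more probable label, and the information gained by observing the label is the binary entropy:

```python
import math

def bayes_mistake_prob(p):
    """Probability that the Bayes-optimal prediction of a binary label
    is wrong, when the posterior probability of label 1 is p."""
    return min(p, 1.0 - p)

def information_gain_bits(p):
    """Expected Shannon information (binary entropy, in bits) revealed
    by observing the label."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Since 2·min(p, 1−p) ≤ H(p), the instantaneous mistake probability never exceeds half the information gained, an elementary instance of the kind of relation between prediction error and information gain that the paper investigates systematically.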
Laboratory for Information and Decision Systems, M.I.T.
1990
Abstract:
The abstract is identical to that of "Active learning using arbitrary binary valued queries" above. This work was supported by the U.S. Army Research Office under Contract DAAL0386K10171, by the Department.