Results 1-10 of 21
Efficient Distribution-free Learning of Probabilistic Concepts
 Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract

Cited by 197 (8 self)
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior; thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts.

1 Introduction

Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
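The formal setup can be sketched as follows (notation is ours, in the spirit of the Kearns-Schapire model, not quoted from the abstract): a p-concept assigns each instance a probability of a positive label, and a labeled example is drawn by sampling from the domain distribution and flipping a coin with that bias:

```latex
% A p-concept maps instances to label probabilities.
c : X \to [0,1], \qquad
\Pr[\,\text{label}(x) = 1 \mid x\,] = c(x), \qquad x \sim D .
```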
Some PAC-Bayesian Theorems
 Machine Learning
, 1998
"... This paper gives PAC guarantees for "Bayesian" algorithms  algorithms that optimize risk minimization expressions involving a prior probability and a likelihood for the training data. PACBayesian algorithms are motivated by a desire to provide an informative prior encoding information about ..."
Abstract

Cited by 103 (4 self)
This paper gives PAC guarantees for "Bayesian" algorithms: algorithms that optimize risk minimization expressions involving a prior probability and a likelihood for the training data. PAC-Bayesian algorithms are motivated by a desire to provide an informative prior encoding information about the expected experimental setting while still having PAC performance guarantees over all IID settings. The PAC-Bayesian theorems given here apply to an arbitrary prior measure on an arbitrary concept space. These theorems provide an alternative to the use of VC dimension in proving PAC bounds for parameterized concepts.

1 INTRODUCTION

Much of modern learning theory can be divided into two seemingly separate areas: Bayesian inference and PAC learning. Both areas study learning algorithms which take as input training data and produce as output a concept or model which can then be tested on test data. In both areas learning algorithms are associated with correctness theorems. PAC correct...
On the Complexity of Teaching
 Journal of Computer and System Sciences
, 1992
"... While most theoretical work in machine learning has focused on the complexity of learning, recently there has been increasing interest in formally studying the complexity of teaching . In this paper we study the complexity of teaching by considering a variant of the online learning model in which a ..."
Abstract

Cited by 101 (2 self)
While most theoretical work in machine learning has focused on the complexity of learning, recently there has been increasing interest in formally studying the complexity of teaching. In this paper we study the complexity of teaching by considering a variant of the online learning model in which a helpful teacher selects the instances. We measure the complexity of teaching a concept from a given concept class by a combinatorial measure we call the teaching dimension. Informally, the teaching dimension of a concept class is the minimum number of instances a teacher must reveal to uniquely identify any target concept chosen from the class. A preliminary version of this paper appeared in the Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 303-314, August 1991. Most of this research was carried out while both authors were at the MIT Laboratory for Computer Science, with support provided by ARO Grant DAAL03-86-K-0171, DARPA Contract N00014-89-J-1988, NSF Gr...
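For a small finite class, the combinatorial measure described above can be computed by exhaustive search. The sketch below is ours, purely illustrative of the definition (function and variable names are not the paper's):

```python
from itertools import combinations

def teaching_dim(concepts, instances):
    """Teaching dimension of a finite concept class, by brute force.

    Each concept is a dict mapping every instance to its 0/1 label.
    The teaching dimension is the largest, over target concepts, of
    the smallest set of labeled instances that leaves the target as
    the only consistent concept in the class.
    """
    def min_teaching_set(target):
        for k in range(len(instances) + 1):
            for subset in combinations(instances, k):
                # Concepts agreeing with the target on the shown instances.
                consistent = [c for c in concepts
                              if all(c[x] == target[x] for x in subset)]
                if consistent == [target]:
                    return k
        return len(instances)

    return max(min_teaching_set(c) for c in concepts)

# Singletons over a 3-point domain: each target is taught by its one
# positive example, so the teaching dimension is 1.
X = [0, 1, 2]
singletons = [{x: int(x == i) for x in X} for i in X]
print(teaching_dim(singletons, X))            # -> 1

# Adding the empty concept forces the teacher to rule out every
# singleton with a negative example, so the dimension jumps to |X|.
empty = {x: 0 for x in X}
print(teaching_dim(singletons + [empty], X))  # -> 3
```

The jump from 1 to 3 illustrates why the teaching dimension is a property of the whole class, not of individual concepts.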
PAC-Bayesian stochastic model selection
 Machine Learning
, 2003
"... Abstract PACBayesian learning methods combine the informative priors of Bayesian methods with distributionfree PAC guarantees. Stochastic model selection predicts a class label by stochastically sampling a classifier according to a "posterior distribution " on classifiers. This paper giv ..."
Abstract

Cited by 59 (2 self)
PAC-Bayesian learning methods combine the informative priors of Bayesian methods with distribution-free PAC guarantees. Stochastic model selection predicts a class label by stochastically sampling a classifier according to a "posterior distribution" on classifiers. This paper gives a PAC-Bayesian performance guarantee for stochastic model selection that is superior to analogous guarantees for deterministic model selection. The guarantee is stated in terms of the training error of the stochastic classifier and the KL-divergence of the posterior from the prior. It is shown that the posterior optimizing the performance guarantee is a Gibbs distribution. Simpler posterior distributions are also derived that have nearly optimal performance guarantees.
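The flavor of such a guarantee can be sketched as follows (this is the general shape of a PAC-Bayesian bound for the Gibbs classifier, written in our notation rather than quoted from the paper; $Q$ is the posterior, $P$ the prior, $m$ the sample size, $\delta$ the confidence parameter, and hatted quantities are empirical):

```latex
\Pr_{S \sim D^m}\!\left[
  \operatorname*{E}_{h \sim Q}\bigl[\operatorname{err}(h)\bigr]
  \;\le\;
  \operatorname*{E}_{h \sim Q}\bigl[\widehat{\operatorname{err}}_S(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{1}{\delta} + \ln m + 2}{2m - 1}}
\right] \;\ge\; 1 - \delta
```

Note how the complexity penalty depends on the divergence of the posterior from the prior rather than on a VC-style capacity of the concept class.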
FROM FINDING MAXIMUM FEASIBLE SUBSYSTEMS OF LINEAR SYSTEMS TO FEEDFORWARD NEURAL NETWORK DESIGN
, 1994
"... ..."
Using Approximate Models as Source of Contextual Information for . . .
 In Proc. of the ICCV'95 Workshop on Context-Based Vision
, 1995
"... Most computer vision algorithms are based on strong assumptions about the objects and the actions depicted in the image. To safely apply those algorithms in real world image sequences, it is necessary to verify that their assumptions are satisfied in the context of the visual process. We propose the ..."
Abstract

Cited by 17 (4 self)
Most computer vision algorithms are based on strong assumptions about the objects and the actions depicted in the image. To safely apply those algorithms to real-world image sequences, it is necessary to verify that their assumptions are satisfied in the context of the visual process. We propose the use of approximate world models, coarse descriptions of objects and actions in the world, as the appropriate representation for contextual information. The approximate world models are employed to verify the applicability of a vision routine in a given situation. Under these conditions, a task module can reliably use the outputs of the contextually-safe vision routines, without having to refer to an accurate reconstruction of the world. We are using approximate world models in a project to control cameras in a TV studio. In our Intelligent Studio, automatic cameras respond to verbal requests for shots from the TV director. Contextual information is obtained from the script of the TV sho...
The Complexity of Theory Revision
 In Proceedings of IJCAI-95
, 1998
"... A knowledgebased system uses its database (a.k.a. its "theory") to produce answers to the queries it receives. Unfortunately, these answers may be incorrect if the underlying theory is faulty. Standard "theory revision" systems use a given set of "labeled queries" (each a query paired with its corr ..."
Abstract

Cited by 17 (5 self)
A knowledge-based system uses its database (a.k.a. its "theory") to produce answers to the queries it receives. Unfortunately, these answers may be incorrect if the underlying theory is faulty. Standard "theory revision" systems use a given set of "labeled queries" (each a query paired with its correct answer) to transform the given theory, by adding and/or deleting rules and/or antecedents, into a related theory that is as accurate as possible. After formally defining the theory revision task, this paper provides both sample and computational complexity bounds for this process. It first specifies the number of labeled queries necessary to identify, with high probability, a revised theory whose error is close to minimal. It then considers the computational complexity of finding this best theory, and proves that, unless P = NP, no polynomial-time algorithm can identify this near-optimal revision, even given the exact distribution of queries, except in certain simple situations. It ...
A Framework for Structural Risk Minimisation
, 1996
"... The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy of classes is chosen in response to the data. This theoretically explains the impressive performance o ..."
Abstract

Cited by 16 (5 self)
The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy of classes is chosen in response to the data. This theoretically explains the impressive performance of the maximal margin hyperplane algorithm of Vapnik. It may also provide a general technique for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.
Sequential PAC Learning
 In Proceedings of COLT-95
, 1995
"... We consider the use of "online" stopping rules to reduce the number of training examples needed to paclearn. Rather than collect a large training sample that can be proved sufficient to eliminate all bad hypotheses a priori, the idea is instead to observe training examples oneatatime and decid ..."
Abstract

Cited by 14 (5 self)
We consider the use of "online" stopping rules to reduce the number of training examples needed to PAC-learn. Rather than collect a large training sample that can be proved sufficient to eliminate all bad hypotheses a priori, the idea is instead to observe training examples one at a time and decide "online" whether to stop and return a hypothesis, or continue training. The primary benefit of this approach is that we can detect when a hypothesizer has actually "converged," and halt training before reaching the standard fixed-sample-size bounds. This paper presents a series of such sequential learning procedures for distribution-free PAC-learning, "mistake-bounded to PAC" conversion, and distribution-specific PAC-learning, respectively. We analyze the worst-case expected training sample size of these procedures, and show that it is often smaller than existing fixed-sample-size bounds, while providing the exact same worst-case PAC guarantees. We also provide lower bounds that show these r...
Learning by Canonical Smooth Estimation, Part I: Simultaneous Estimation
 IEEE Transactions on Automatic Control
, 1996
"... This paper examines the problem of learning from examples in a framework that is based on, but more general than, Valiant's Probably Approximately Correct (PAC) model for learning. In our framework, the learner observes examples that consist of sample points drawn and labeled according to a fixed, u ..."
Abstract

Cited by 12 (2 self)
This paper examines the problem of learning from examples in a framework that is based on, but more general than, Valiant's Probably Approximately Correct (PAC) model for learning. In our framework, the learner observes examples that consist of sample points drawn and labeled according to a fixed, unknown probability distribution. Based on this empirical data, the learner must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called the hypothesis' "generalization error." Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative-frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the g...