Results 1  10
of
132
Efficient noisetolerant learning from statistical queries
 JOURNAL OF THE ACM
, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 288 (6 self)
 Add to MetaCart
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the informationtheoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noisetolerant algorithm is known.
Efficient Distributionfree Learning of Probabilistic Concepts
 Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract

Cited by 197 (8 self)
 Add to MetaCart
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behaviorthus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or pconcepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of pconcepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of pconcepts, we study and develop in detail an underlying theory of learning pconcepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract

Cited by 195 (7 self)
 Add to MetaCart
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
On the learnability of discrete distributions
 In The 25th Annual ACM Symposium on Theory of Computing
, 1994
"... We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled ..."
Abstract

Cited by 93 (11 self)
 Add to MetaCart
We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled
Learning Boolean Concepts in the Presence of Many Irrelevant Features
 Artificial Intelligence
, 1994
"... In many domains, an appropriate inductive bias is the MINFEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MINFEATURES bias requ ..."
Abstract

Cited by 93 (0 self)
 Add to MetaCart
In many domains, an appropriate inductive bias is the MINFEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias in Boolean domains. First, it is shown that any learning algorithm implementing the MINFEATURES bias requires \Theta( 1 ffl ln 1 ffi + 1 ffl [2 p + p ln n]) training examples to guarantee PAClearning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. For implementing the MINFEATURES bias, the paper presents five algorithms that identify a subset of features sufficient to construct a hypothesis consistent with the training examples. FOCUS1 is a straightforward algorithm that returns a minimal and sufficient subset of features in quasipolynomial time. FOCUS2 does the same task as FOCUS1 but is empirically shown to be substantially faster than FOCUS1. Finally, the SimpleGreedy, MutualInformationG...
Learning polynomials with queries: The highly noisy case
, 1995
"... Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withf ..."
Abstract

Cited by 87 (18 self)
 Add to MetaCart
Given a function f mapping nvariate inputs from a finite Kearns et. al. [21] (see also [27, 28, 22]). In the setting of agfieldFintoF, we consider the task of reconstructing a list nostic learning, the learner is to make no assumptions regarding of allnvariate degreedpolynomials which agree withfon a the natural phenomena underlying the input/output relationship tiny but nonnegligible fraction, , of the input space. We give a of the function, and the goal of the learner is to come up with a randomized algorithm for solving this task which accessesfas a simple explanation which best fits the examples. Therefore the black box and runs in time polynomial in1;nand exponential in best explanation may account for only part of the phenomena. d, provided is(pd=jFj). For the special case whend=1, In some situations, when the phenomena appears very irregular, we solve this problem for jFj>0. In this case the providing an explanation which fits only part of it is better than nothing. Interestingly, Kearns et. al. did not consider the use of running time of our algorithm is bounded by a polynomial queries (but rather examples drawn from an arbitrary distribuand exponential ind. Our algorithm generalizes a previously tion) as they were skeptical that queries could be of any help. known algorithm, due to Goldreich and Levin, that solves this We show that queries do seem to help (see below). task for the case whenF=GF(2)(andd=1).
Theory and Applications of Agnostic PACLearning with Small Decision Trees
, 1995
"... We exhibit a theoretically founded algorithm T2 for agnostic PAClearning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "realworld" datasets, and show that for mo ..."
Abstract

Cited by 75 (2 self)
 Add to MetaCart
We exhibit a theoretically founded algorithm T2 for agnostic PAClearning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "realworld" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAClearning algorithm is shown to be applicable to "realworld" classification problems. Since one can prove that T2 is an agnostic PAClearning algorithm, T2 is guaranteed to produce close to optimal 2level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for w...
Tracking drifting concepts by minimizing disagreements
 Machine Learning
, 1994
"... Abstract. In this paper we consider the problem of tracking a subset of a domain (called the target) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target ..."
Abstract

Cited by 68 (3 self)
 Add to MetaCart
Abstract. In this paper we consider the problem of tracking a subset of a domain (called the target) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy e. Furthermore, the complexity of the class 7/of possible targets, as measured by d, its VCdimension, also effects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a class 7 { can be approximated to within a factor k, then there is a simple tracking algorithm for 7t which can achieve a probability e of making a mistake if the target movement rate is at most a constant times e2/(k(d + k) In 1), where d is the VapnikChervonenkis dimension of 7t. Also, we show that if 7 / is properly PAClearnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7d + 1, yielding an efficient tracking algorithm for 7I which tolerates drift rates up to a constant times e2/(d 2 In ). In addition, we prove complementary results for the classes of halfspaces and axisaligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times e2/d.
Learning to reason
 Journal of the ACM
, 1994
"... Abstract. We introduce a new framework for the study of reasoning. The Learning (in order) to Reason approach developed here views learning as an integral part of the inference process, and suggests that learning and reasoning should be studied together. The Learning to Reason framework combines the ..."
Abstract

Cited by 57 (24 self)
 Add to MetaCart
Abstract. We introduce a new framework for the study of reasoning. The Learning (in order) to Reason approach developed here views learning as an integral part of the inference process, and suggests that learning and reasoning should be studied together. The Learning to Reason framework combines the interfaces to the world used by known learning models with the reasoning task and a performance criterion suitable for it. In this framework, the intelligent agent is given access to its favorite learning interface, and is also given a grace period in which it can interact with this interface and construct a representation KB of the world W. The reasoning performance is measured only after this period, when the agent is presented with queries � from some query language, relevant to the world, and has to answer whether W implies �. The approach is meant to overcome the main computational difficulties in the traditional treatment of reasoning which stem from its separation from the “world”. Since the agent interacts with the world when constructing its knowledge representation it can choose a representation that is useful for the task at hand. Moreover, we can now make explicit the dependence of the reasoning performance on the environment the agent interacts with. We show how previous results from learning theory and reasoning fit into this framework and
Learning Simple Concepts Under Simple Distributions
 SIAM JOURNAL OF COMPUTING
, 1991
"... We aim at developing a learning theory where `simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (like NP hard) to learn. Relatively few concept classes were shown to be learnable polynomially. In daily life, it seems that things we care to le ..."
Abstract

Cited by 56 (3 self)
 Add to MetaCart
We aim at developing a learning theory where `simple' concepts are easily learnable. In Valiant's learning model, many concepts turn out to be too hard (like NP hard) to learn. Relatively few concept classes were shown to be learnable polynomially. In daily life, it seems that things we care to learn are usually learnable. To model the intuitive notion of learning more closely, we do not require that the learning algorithm learns (polynomially) under all distributions, but only under all simple distributions. A distribution is simple if it is dominated by an enumerable distrib...