Results 1–10 of 15
Fat-shattering and the learnability of real-valued functions
 Journal of Computer and System Sciences
, 1996
Abstract

Cited by 82 (10 self)
We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a real-valued function class in terms of a generalization of the Vapnik-Chervonenkis dimension, the fat-shattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. With different (also quite mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class.
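For reference, the fat-shattering function that this abstract centers on is standardly defined as follows (this is the usual Kearns–Schapire formulation, sketched here for the reader's convenience; the paper's exact statement may differ in details):

```latex
% gamma-fat-shattering (standard Kearns--Schapire definition)
A set $\{x_1,\dots,x_d\} \subseteq X$ is \emph{$\gamma$-shattered} by a class
$F$ of real-valued functions if there exist witnesses
$r_1,\dots,r_d \in \mathbb{R}$ such that for every $b \in \{0,1\}^d$ there is
an $f \in F$ with
\[
  f(x_i) \ge r_i + \gamma \ \text{ if } b_i = 1,
  \qquad
  f(x_i) \le r_i - \gamma \ \text{ if } b_i = 0 .
\]
The fat-shattering function $\mathrm{fat}_F(\gamma)$ is the cardinality of the
largest $\gamma$-shattered set (and $\infty$ if arbitrarily large sets are
$\gamma$-shattered).
```

Setting $\gamma \to 0$ against binary-valued $F$ recovers the ordinary VC dimension, which is why the abstract describes it as a generalization of the Vapnik-Chervonenkis dimension.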
Statistical Queries and Faulty PAC Oracles
 In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory
, 1993
Abstract

Cited by 39 (6 self)
In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound that is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution-specific algorithms on distributions outside their prescr...
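The statistical query model the abstract refers to replaces labelled examples with estimates of expectations E[χ(x, f(x))] to within a tolerance. The toy sketch below illustrates that interface by answering a query with an empirical average over examples; the target concept, the uniform domain, and the sample-size choice are all illustrative assumptions, not the paper's construction:

```python
import random

# Toy sketch of the statistical-query interface: a query chi is answered
# by averaging chi(x, label) over examples drawn from an example oracle.
# target(), example_oracle(), and the m ~ 1/tolerance^2 sample size are
# hypothetical choices for illustration only.

def target(x):
    # toy target concept: a monomial over the first two bits
    return int(x[0] == 1 and x[1] == 1)

def example_oracle(n, noise_rate=0.0, rng=random):
    # draw x uniformly from {0,1}^n; flip the label with prob. noise_rate
    x = tuple(rng.randint(0, 1) for _ in range(n))
    label = target(x)
    if rng.random() < noise_rate:
        label = 1 - label
    return x, label

def answer_sq(chi, tolerance, n, noise_rate=0.0, rng=random):
    # average chi over roughly 1/tolerance^2 samples, so the empirical
    # mean is within `tolerance` of E[chi] with constant probability
    m = max(1, int(1.0 / tolerance ** 2))
    total = sum(chi(*example_oracle(n, noise_rate, rng)) for _ in range(m))
    return total / m

if __name__ == "__main__":
    rng = random.Random(0)
    # query: probability that the label is 1 (no noise);
    # the true value is 0.25 for this toy monomial
    est = answer_sq(lambda x, y: y, tolerance=0.05, n=4, rng=rng)
    print(round(est, 2))
```

Kearns's result cited above says that answers to such queries can still be recovered when the oracle's labels carry classification noise, which is what makes the model a useful intermediary for fault-tolerant PAC learning.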
Learning Changing Concepts by Exploiting the Structure of Change
, 1996
Abstract

Cited by 24 (1 self)
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must provide an accurate estimate of the target function sequence. We consider a variety of restrictions on how the target function is allowed to change, including infrequent but arbitrary changes, sequences that correspond to slow walks on a graph whose nodes are functions, and changes that are small on average, as measured by the probability of disagreements between consecutive functions. We first study estimation, in which the learner sees a batch of examples and is then required to give an accurate estimate of the function sequence. Our results provide bounds on the sample complexity and allowable drift rate for these problems. We also study prediction, in which the learner must produce a hypothesis online after each labelled example, and the average misclassification probability over this hypothes...
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
, 1997
Abstract

Cited by 20 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik-Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
Learning Under Persistent Drift
, 1997
Abstract

Cited by 12 (3 self)
In this paper we study learning algorithms for environments which are changing over time. Unlike most previous work, we are interested in the case where the changes might be rapid but their "direction" is relatively constant. We model this type of change by assuming that the target distribution is changing continuously at a constant rate from one extreme distribution to another. We show in this case how to use a simple weighting scheme to estimate the error of a hypothesis, and, using this estimate, to minimize the error of the prediction.

1 Introduction

One of the oversimplifying assumptions made in the PAC model [Val84] is that all the examples are drawn from the same distribution, and that the target function does not change with time. The drawbacks of this assumption have been widely recognized, and a considerable amount of work has been devoted to studying the cases where either the distribution [Bar92, BL96] or the target function [HL94, BBDK96] changes over time. Clearly, without con...
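The abstract's exact weighting scheme is not reproduced in this listing, but the general idea it describes can be sketched as a decayed-weight error estimate: examples drawn more recently come from a distribution closer to the current one, so they receive larger weights. The decay factor and the toy threshold hypothesis below are illustrative assumptions:

```python
# Minimal sketch of a decayed-weight error estimate under drift.
# The specific geometric weights are an assumption for illustration,
# not the paper's scheme.

def weighted_error(hypothesis, examples, decay=0.9):
    """Estimate the current error of `hypothesis` from a time-ordered
    list of (x, label) pairs, weighting example t (0 = oldest) by
    decay**(T - 1 - t), so the newest example has weight 1."""
    T = len(examples)
    num = 0.0
    den = 0.0
    for t, (x, y) in enumerate(examples):
        w = decay ** (T - 1 - t)
        num += w * (hypothesis(x) != y)   # weighted count of mistakes
        den += w                          # total weight
    return num / den if den else 0.0

# usage: a fixed threshold hypothesis scored on a stream whose labels
# drift away from it late in the sequence
h = lambda x: int(x >= 0.5)
stream = [(0.2, 0), (0.8, 1), (0.6, 1), (0.7, 0), (0.9, 0)]
print(weighted_error(h, stream, decay=0.5))
```

With a small decay the two late disagreements dominate the estimate, whereas an unweighted average would dilute them with stale, pre-drift examples; that is the trade-off such a scheme tunes.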
On the Complexity of Learning from Drifting Distributions
 In Proceedings of the Workshop on Computational Learning Theory
, 1996
Abstract

Cited by 12 (0 self)
We consider two models of online learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most O(ε³/(d log(1/ε))) between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of O(ε³/d). Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most O(ε²/(d log(1/ε))), then any consistent algorithm learns to within accuracy ε.

1 Introduction

In prediction models [7, 11] like that studied in this paper, learning proceeds in trials, where in the t-th trial, the algorithm (1) is given x_t chosen from some set X, (2) is required to output a prediction ŷ_t ∈ {0, 1}, and (3) discovers y_t ∈ {0,...
Exploiting Random Walks for Learning
, 1994
Abstract

Cited by 10 (0 self)
In this paper we consider an approach to passive learning. In contrast to the classical PAC model we do not assume that the examples are independently drawn according to an underlying distribution, but that they are generated by a time-driven process. We define deterministic and probabilistic learning models of this sort and investigate the relationships between them and with other models. The fact that successive examples are related can often be used to gain additional information similar to the information gained by membership queries. We show that this can be used to design online prediction algorithms. In particular, we present efficient algorithms for exactly identifying Boolean threshold functions, 2-term RSE, and 2-term DNF, when the examples are generated by a random walk on {0, 1}^n.

1 INTRODUCTION

In the classical PAC model as introduced by Valiant in [14], information about the unknown target concept is available through labeled examples which are independently drawn....
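The example-generation process described above can be sketched concretely: successive points differ in one uniformly chosen bit, so whenever the label changes between consecutive steps, the flipped bit is immediately known to be relevant — the kind of extra information the abstract likens to membership queries. The particular threshold target below is a hypothetical choice, not one of the paper's constructions:

```python
import random

# Sketch: examples generated by a random walk on {0,1}^n, labelled by a
# Boolean threshold function. The target and parameters are illustrative.

def threshold_fn(weights, theta):
    # Boolean threshold function: 1 iff the weighted sum of bits >= theta
    return lambda x: int(sum(w * b for w, b in zip(weights, x)) >= theta)

def random_walk_examples(n, target, steps, rng):
    # start at a uniform point; flip one uniformly chosen bit per step
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        yield tuple(x), target(x)
        x[rng.randint(0, n - 1)] ^= 1

if __name__ == "__main__":
    rng = random.Random(1)
    f = threshold_fn([1, 1, 0, 0], theta=2)  # x1 AND x2; bits 3-4 irrelevant
    relevant = set()
    prev = None
    for x, y in random_walk_examples(4, f, steps=50, rng=rng):
        if prev is not None and prev[1] != y:
            # exactly one bit differs, and it must have caused the flip
            i = next(j for j in range(4) if prev[0][j] != x[j])
            relevant.add(i)
        prev = (x, y)
    print(sorted(relevant))  # only relevant bits can ever flip the label
```

Under the i.i.d. PAC oracle this relevance information is unavailable, which is why the random-walk setting admits exact-identification algorithms for classes like threshold functions.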
Evolution with Drifting Targets
Abstract

Cited by 10 (4 self)
We consider the question of the stability of evolutionary algorithms to gradual changes, or drift, in the target concept. We define an algorithm to be resistant to drift if, for some inverse polynomial drift rate in the target function, it converges to accuracy 1 − ǫ with polynomial resources, and then stays within that accuracy indefinitely, except with probability ǫ at any one time. We show that every evolution algorithm, in the sense of Valiant [20], can be converted, using the Correlational Query technique of Feldman [9], into such a drift-resistant algorithm. For certain evolutionary algorithms, such as for Boolean conjunctions, we give bounds on the rates of drift that they can resist. We develop some new evolution algorithms that are resistant to significant drift. In particular, we give an algorithm for evolving linear separators over the spherically symmetric distribution that is resistant to a drift rate of O(ǫ/n), and another algorithm over the more general product normal distributions that resists a smaller drift rate. The above translation result can also be interpreted as one on the robustness of the notion of evolvability itself under changes of definition. As a second result in that direction, we show that every evolution algorithm can be converted to a quasi-monotonic one that can evolve from any starting point without the performance ever dipping significantly below that of the starting point. This permits the somewhat unnatural feature of arbitrary performance degradations to be removed from several known robustness translations.
New Analysis and Algorithm for Learning with Drifting Distributions
Abstract

Cited by 6 (4 self)
We present a new analysis of the problem of learning with drifting distributions in the batch setting using the notion of discrepancy. We prove learning bounds based on the Rademacher complexity of the hypothesis set and the discrepancy of distributions both for a drifting PAC scenario and a tracking scenario. Our bounds are always tighter and in some cases substantially improve upon previous ones based on the L1 distance. We also present a generalization of the standard online-to-batch conversion to the drifting scenario in terms of the discrepancy and arbitrary convex combinations of hypotheses. We introduce a new algorithm exploiting these learning guarantees, which we show can be formulated as a simple QP. Finally, we report the results of preliminary experiments demonstrating the benefits of this algorithm.
Gaining degrees of freedom in subsymbolic learning
 Journal of Theoretical Computer Science
, 2001
Abstract

Cited by 4 (4 self)
We provide some theoretical results on the sample complexity of PAC learning when the hypotheses are given by subsymbolic devices such as neural networks. In this framework we give new foundations to the notion of degrees of freedom of a statistic and relate it to the complexity of a concept class. Thus, for a given concept class and a given sample size, we discuss the efficiency of subsymbolic learning algorithms in terms of the degrees of freedom of the computed statistic. In this setting we appraise the sample complexity overhead that comes from relying on approximate hypotheses, and display an increase in the degrees of freedom yielded by embedding available formal knowledge into the algorithm. For a known sample distribution, these quantities are related to the learning approximation goal, and a special production prize is shown. Finally, we prove that testing the approximation capability of a neural network generally demands a smaller sample size than training it.