Results 1–10 of 17
Statistical Queries and Faulty PAC Oracles
 In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, 1993
Abstract

Cited by 44 (6 self)
In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution-specific algorithms on distributions outside their prescr...
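To make the statistical query mechanism concrete, here is a minimal sketch (not from the paper) of how a query STAT(χ, τ) can be simulated by sampling; the names `stat_oracle`, `draw_example`, and `chi` are hypothetical.

```python
import math
import random

def stat_oracle(draw_example, chi, tau, delta=0.05):
    """Answer the statistical query STAT(chi, tau): estimate
    E[chi(x, label)] to within additive tolerance tau (with
    probability >= 1 - delta) by averaging chi over
    m = O(log(1/delta) / tau^2) independent examples (Hoeffding bound)."""
    m = math.ceil(math.log(2.0 / delta) / (2.0 * tau ** 2))
    total = 0.0
    for _ in range(m):
        x, y = draw_example()
        total += chi(x, y)
    return total / m
```

An adversary that corrupts each example with probability β can shift this average by at most roughly β, which is one way to see why a malicious error rate proportional to the query tolerance τ is the natural limit.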
Active learning using arbitrary binary valued queries
 Machine Learning, 1993
Abstract

Cited by 33 (1 self)
The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement that can be expected from using oracles, we consider active learning in the sense that the learner has complete choice in the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that 'distribution-free' refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.
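As a hedged illustration of the "arbitrary yes/no questions" setting (not code from the paper): with a finite concept class, a learner allowed any yes/no question can identify the target in ⌈log₂ |C|⌉ queries by halving the candidate set. The names `active_learn` and `answer` are hypothetical.

```python
def active_learn(concepts, answer):
    """Identify the target concept in a finite class using arbitrary
    yes/no queries: each query asks 'is the target in this half of the
    current candidate set?', halving the candidates every round."""
    candidates = list(concepts)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if answer(half):  # oracle answers: is the target in `half`?
            candidates = half
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates[0]
```

This also hints at the negative side of the abstract: an infinite class cannot be pinned down by finitely many binary answers, so distribution-free active learnability forces finiteness.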
Generalization error bounds using unlabeled data
 In Learning Theory: 18th Annual Conference on Learning Theory (COLT 2005), 2005
Abstract

Cited by 24 (2 self)
We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that the method is of little use in practice due to its limitation to the realizable case. The idea in our second method is to use unlabeled data to transform bounds for randomized classifiers into bounds for simpler deterministic classifiers. As a concrete example of how the general method works in practice, we apply it to a bound based on cross-validation. The result is a semi-supervised bound for classifiers learned based on all the labeled data. The bound is easy to implement and apply and should be tight whenever cross-validation makes sense. Applying the bound to SVMs on the MNIST benchmark data set gives results that suggest that the bound may be tight enough to be useful in practice.
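The core estimate both methods build on is cheap to compute; the following sketch (hypothetical names, not the paper's code) shows the disagreement-probability estimate from unlabeled data.

```python
def disagreement(h1, h2, unlabeled):
    """Empirical probability that two classifiers disagree, estimated
    from unlabeled points alone -- labels are never consulted."""
    return sum(1 for x in unlabeled if h1(x) != h2(x)) / len(unlabeled)
```

By the triangle inequality, the error of h1 is at most the error of h2 plus their disagreement, which is how unlabeled estimates of this quantity turn into generalization bounds.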
A General Dimension for Query Learning
2002
Abstract

Cited by 8 (2 self)
We introduce a new combinatorial dimension that characterizes the number of queries needed to learn, no matter what set of queries is used. This new dimension generalizes previous dimensions, providing upper and lower bounds on the query complexity for all sorts of queries, and not just for example-based queries as in previous works. Moreover, the new characterization is not only valid for exact learning but also for approximate learning. We present several ...
Efficient Learning from Faulty Data
1995
Abstract

Cited by 6 (1 self)
Learning systems are often provided with imperfect or noisy data. Therefore, researchers have formalized various models of learning with noisy data, and have attempted to delineate the boundaries of learnability in these models. In this thesis, we describe a general framework for the construction of efficient learning algorithms in noise-tolerant variants of Valiant's PAC learning model. By applying this framework, we also obtain many new results for specific learning problems in various settings with faulty data. The central tool used in this thesis is the specification of learning algorithms in Kearns' Statistical Query (SQ) learning model, in which statistics, as opposed to labelled examples, are requested by the learner. These SQ learning algorithms are then converted into PAC algorithms which tolerate various types of faulty data. We develop this framework in three major parts: 1. We design automatic compilations of SQ algorithms into PAC algorithms which tolerate various types of data errors. These results include improvements to Kearns' classification noise compilation, and the first such compilations for malicious errors, attribute noise and new classes of "hybrid" noise composed of multiple noise types. 2. We prove nearly tight bounds on the required complexity of SQ algorithms. The upper bounds are based on a constructive technique which allows one to achieve this complexity even when it is not initially achieved by a given SQ algorithm. 3. We define and employ an improved model of SQ learning which yields noise-tolerant PAC algorithms that are more efficient than those derived from standard SQ algorithms. Together, these results provide a unified and intuitive framework for noise-tolerant learning that allows the algorithm designer to achieve efficient, and often optimal, fault-tolerant learning.
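A sketch of the kind of compilation described above, for the simplest case of classification noise at a known rate η < 1/2 (hypothetical names; a simplified variant of the standard correction, not the thesis's actual code):

```python
import random

def sq_from_noisy(draw_noisy, chi, eta, m=20000):
    """Answer a statistical query from examples whose labels are flipped
    independently with known rate eta < 1/2. With P = E[chi(x, f(x))]
    and Q = E[chi(x, 1 - f(x))], the noisy averages satisfy
    A = (1-eta)P + eta*Q and B = (1-eta)Q + eta*P, so
    P = ((1-eta)*A - eta*B) / (1 - 2*eta)."""
    A = B = 0.0
    for _ in range(m):
        x, y = draw_noisy()
        A += chi(x, y)
        B += chi(x, 1 - y)
    A, B = A / m, B / m
    return ((1 - eta) * A - eta * B) / (1 - 2 * eta)
```

The 1/(1 − 2η) factor is where the noise rate enters the tolerance, and hence the sample complexity, of the compiled PAC algorithm.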
Learning with Limited Visibility
 CDAM Research Reports Series, 1998
Abstract

Cited by 5 (0 self)
This paper surveys recent studies of learning problems in which the learner faces restrictions on the amount of information he can extract from each example he encounters. Our main framework for the analysis of such scenarios is the RFA (Restricted Focus of Attention) model. While being a natural refinement of the PAC learning model, some of the fundamental PAC-learning results and techniques fail in the RFA paradigm; learnability in the RFA model is no longer characterized by the VC dimension, and many PAC learning algorithms are not applicable in the RFA setting. Hence, the RFA formulation reflects the need for new techniques and tools to cope with some fundamental constraints of realistic learning problems. We also present some paradigms and algorithms that may serve as a first step towards answering this need. Two main types of restrictions can be considered in the general RFA setting: in the more stringent one, called k-RFA, only k of the n attributes of each example are revealed ...
A General Dimension for Approximately Learning Boolean Functions
 In Algorithmic Learning Theory, 13th International Conference, ALT 2002, 2002
Abstract

Cited by 3 (2 self)
We extend the notion of general dimension, a combinatorial characterization of learning complexity for arbitrary query protocols, to encompass approximate learning. This immediately yields a characterization of the learning complexity in the statistical query model. As a further application, we consider approximate learning of DNF formulas and we derive close upper and lower bounds on the number of statistical queries needed. In particular, we show that with respect to the uniform distribution, and for any constant error parameter ε < 1/2, the number of statistical queries needed to approximately learn DNF formulas (over n variables and s terms) with tolerance Θ(1/s) is n^Θ(log s).
Learning Fixed-dimension Linear Thresholds From Fragmented Data
 In Proceedings of the 1999 Conference on Computational Learning Theory, 1999
Abstract

Cited by 2 (2 self)
We investigate PAC-learning in a situation in which examples (consisting of an input vector and 0/1 label) have some of the components of the input vector concealed from the learner. This is a special case of Restricted Focus of Attention (RFA) learning. Our interest here is in 1-RFA learning, where only a single component of an input vector is given, for each example. We argue that 1-RFA learning merits special consideration within the wider field of RFA learning. It is the most restrictive form of RFA learning (so that positive results apply in general), and it models a typical "data fusion" scenario, where we have sets of observations from a number of separate sensors, but these sensors are uncorrelated sources. Within this setting we study the well-known class of linear threshold functions, the characteristic functions of Euclidean halfspaces. The sample complexity (i.e. sample-size requirement as a function of the parameters) of this learning problem is affected by the input distri...
Characterizing Statistical Query Learning: Simplified Notions and Proofs
Abstract
The Statistical Query model was introduced in [6] to handle noise in the well-known PAC model. In this model the learner gains information about the target concept by asking for various statistics about it. Characterizing the number of queries required to learn a given concept class under a fixed distribution was already considered in [3] for weak learning; then in [8] strong learnability was also characterized. However, the proofs for these results in [3, 10, 8] (and for strong learnability even the characterization itself) are rather complex; our main goal is to present a simple approach that works for both problems. Additionally, we strengthen the result on strong learnability by showing that a class is learnable with polynomially many queries iff all consistent algorithms use polynomially many queries, and by showing that proper and improper learning are basically equivalent. As an example, we apply our results to conjunctions under the uniform distribution.