Statistical Queries and Faulty PAC Oracles
In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, 1993
Cited by 37 (6 self)
In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution specific algorithms on distributions outside their prescr...
Active learning using arbitrary binary valued queries
Machine Learning, 1993
Cited by 22 (1 self)
The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement in sample complexity that can be expected from using oracles, we consider active learning in the sense that the learner has complete control over the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that "distribution-free" refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes. Keywords: PAC-learning, active learning, queries, oracles
Generalization error bounds using unlabeled data
In Learning Theory: 18th Annual Conference on Learning Theory, COLT 2005, 2005
Cited by 16 (2 self)
We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that the method is of little use in practice due to its limitation to the realizable case. The idea in our second method is to use unlabeled data to transform bounds for randomized classifiers into bounds for simpler deterministic classifiers. As a concrete example of how the general method works in practice, we apply it to a bound based on cross-validation. The result is a semi-supervised bound for classifiers learned based on all the labeled data. The bound is easy to implement and apply and should be tight whenever cross-validation makes sense. Applying the bound to SVMs on the MNIST benchmark data set gives results that suggest that the bound may be tight enough to be useful in practice.
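The quantity this abstract builds on, the disagreement probability of a pair of classifiers, needs no labels to estimate. The following is only a minimal sketch of that idea, not the paper's method: the threshold classifiers `h1` and `h2`, the uniform data on [0, 1], and the function name `disagreement_rate` are all assumptions made for illustration.

```python
import random

def disagreement_rate(h1, h2, unlabeled):
    """Fraction of unlabeled points on which the two classifiers differ."""
    return sum(h1(x) != h2(x) for x in unlabeled) / len(unlabeled)

# Hypothetical setup: points drawn uniformly from [0, 1], no labels needed.
rng = random.Random(1)
unlabeled = [rng.random() for _ in range(10000)]

# Two hypothetical threshold classifiers; they differ exactly on [0.3, 0.5).
h1 = lambda x: int(x >= 0.3)
h2 = lambda x: int(x >= 0.5)

rate = disagreement_rate(h1, h2, unlabeled)
print(f"estimated disagreement probability: {rate:.3f}")
```

Under these assumptions the true disagreement probability is 0.2, so the printed estimate should land close to that value; the estimate tightens as the unlabeled sample grows.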
Efficient Learning from Faulty Data
1995
Cited by 6 (1 self)
Learning systems are often provided with imperfect or noisy data. Therefore, researchers have formalized various models of learning with noisy data, and have attempted to delineate the boundaries of learnability in these models. In this thesis, we describe a general framework for the construction of efficient learning algorithms in noise tolerant variants of Valiant's PAC learning model. By applying this framework, we also obtain many new results for specific learning problems in various settings with faulty data. The central tool used in this thesis is the specification of learning algorithms in Kearns' Statistical Query (SQ) learning model, in which statistics, as opposed to labelled examples, are requested by the learner. These SQ learning algorithms are then converted into PAC algorithms which tolerate various types of faulty data. We develop this framework in three major parts:
1. We design automatic compilations of SQ algorithms into PAC algorithms which tolerate various types of data errors. These results include improvements to Kearns' classification noise compilation, and the first such compilations for malicious errors, attribute noise and new classes of "hybrid" noise composed of multiple noise types.
2. We prove nearly tight bounds on the required complexity of SQ algorithms. The upper bounds are based on a constructive technique which allows one to achieve this complexity even when it is not initially achieved by a given SQ algorithm.
3. We define and employ an improved model of SQ learning which yields noise tolerant PAC algorithms that are more efficient than those derived from standard SQ algorithms.
Together, these results provide a unified and intuitive framework for noise tolerant learning that allows the algorithm designer to achieve efficient, and often optimal, fault tolerant learning.
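To make the SQ idea concrete: the learner submits a 0/1-valued function of an example and its label, and receives an approximation of that function's expected value instead of seeing individual examples. The sketch below simulates such an answer from noise-free labelled samples. It is an illustration under stated assumptions, not one of the thesis's noise-tolerant compilations; the monomial target, the uniform distribution, and the names `simulate_stat_query` and `draw_examples` are hypothetical.

```python
import random

def simulate_stat_query(chi, examples):
    """Empirical estimate of E[chi(x, label)] over a labelled sample."""
    return sum(chi(x, y) for x, y in examples) / len(examples)

def draw_examples(target, n_attrs, m, rng):
    """Draw m uniform Boolean examples labelled by the target concept."""
    out = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n_attrs))
        out.append((x, target(x)))
    return out

# Hypothetical target: the monomial x0 AND x2 over 5 Boolean attributes.
rng = random.Random(0)
target = lambda x: x[0] & x[2]
sample = draw_examples(target, 5, 20000, rng)

# Query: probability that attribute 0 is set and the label is positive.
estimate = simulate_stat_query(lambda x, y: int(x[0] == 1 and y == 1), sample)
print(f"estimated statistic: {estimate:.3f}")
```

With this target and the uniform distribution, the queried statistic has true value 0.25, and the sample size controls how closely the estimate matches it. An algorithm written only in terms of such queries never inspects a single labelled example, which is what makes a conversion to noisy settings possible.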
Learning with Limited Visibility
CDAM Research Reports Series, 1998
Cited by 5 (0 self)
This paper surveys recent studies of learning problems in which the learner faces restrictions on the amount of information he can extract from each example he encounters. Our main framework for the analysis of such scenarios is the RFA (Restricted Focus of Attention) model. While being a natural refinement of the PAC learning model, some of the fundamental PAC-learning results and techniques fail in the RFA paradigm; learnability in the RFA model is no longer characterized by the VC dimension, and many PAC learning algorithms are not applicable in the RFA setting. Hence, the RFA formulation reflects the need for new techniques and tools to cope with some fundamental constraints of realistic learning problems. We also present some paradigms and algorithms that may serve as a first step towards answering this need. Two main types of restrictions can be considered in the general RFA setting: In the more stringent one, called k-RFA, only k of the n attributes of each example are revealed ...
A General Dimension for Query Learning
Cited by 4 (2 self)
We introduce a new combinatorial dimension that characterizes the number of queries needed to learn, no matter what set of queries is used. This new dimension generalizes previous dimensions providing upper and lower bounds on the query complexity for all sorts of queries, and not for just example-based queries as in previous works. Moreover, the new characterization is not only valid for exact learning but also for approximate learning. We present several ...
Results from sections 4 and 5 were presented at COLT/EUROCOLT 2001 [4]; results from sections 7 and 8 were presented at ALT 2002 [24].
A General Dimension for Approximately Learning Boolean Functions
In Algorithmic Learning Theory, 13th International Conference, ALT 2002, 2002
Cited by 3 (2 self)
We extend the notion of general dimension, a combinatorial characterization of learning complexity for arbitrary query protocols, to encompass approximate learning. This immediately yields a characterization of the learning complexity in the statistical query model. As a further application, we consider approximate learning of DNF formulas and we derive close upper and lower bounds on the number of statistical queries needed. In particular, we show that with respect to the uniform distribution, and for any constant error parameter ε < 1/2, the number of statistical queries needed to approximately learn DNF formulas (over n variables and s terms) with tolerance τ = Θ(1/s) is n^(Θ(log s)).
Learning Fixed-dimension Linear Thresholds From Fragmented Data
In Procs. of the 1999 Conference on Computational Learning Theory, 1999
Cited by 2 (2 self)
We investigate PAC-learning in a situation in which examples (consisting of an input vector and 0/1 label) have some of the components of the input vector concealed from the learner. This is a special case of Restricted Focus of Attention (RFA) learning. Our interest here is in 1-RFA learning, where only a single component of an input vector is given, for each example. We argue that 1-RFA learning merits special consideration within the wider field of RFA learning. It is the most restrictive form of RFA learning (so that positive results apply in general), and it models a typical "data fusion" scenario, where we have sets of observations from a number of separate sensors, but these sensors are uncorrelated sources. Within this setting we study the well-known class of linear threshold functions, the characteristic functions of Euclidean half-spaces. The sample complexity (i.e. sample-size requirement as a function of the parameters) of this learning problem is affected by the input distri...
A General Dimension for Query Learning
2002
Cited by 1 (0 self)
We introduce a new combinatorial dimension that characterizes the number of queries needed to learn, no matter what set of queries is used. This new dimension generalizes previous dimensions providing upper and lower bounds on the query complexity for all sorts of queries, and not for just example-based queries as in previous works. Moreover, the new characterization is not only valid for exact learning but also for approximate learning. We present several ...

