A bound on the label complexity of agnostic active learning
In Proc. of the 24th International Conference on Machine Learning, 2007
Cited by 63 (9 self)
We study the label complexity of pool-based active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A² algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first nontrivial general-purpose upper bound on label complexity in the agnostic PAC model.
Generalized binary search
In Proceedings of the 46th Allerton Conference on Communications, Control, and Computing, 2008
Cited by 30 (0 self)
This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions.
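The greedy selection rule described in this abstract can be sketched as follows. This is a minimal illustration of the noise-free case only, not the noise-tolerant algorithm the paper presents. The names `threshold_hypotheses` and `generalized_binary_search`, and the choice of one-dimensional thresholds as the hypothesis class, are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is a tuple of +1/-1 predictions, one per query index.
Hypothesis = Tuple[int, ...]


def threshold_hypotheses(n: int) -> List[Hypothesis]:
    """Hypothetical example class: h_t(q) = +1 if q >= t else -1, for t = 0..n."""
    return [tuple(1 if q >= t else -1 for q in range(n)) for t in range(n + 1)]


def generalized_binary_search(
    hypotheses: Sequence[Hypothesis],
    num_queries: int,
    oracle: Callable[[int], int],
) -> Tuple[List[Hypothesis], int]:
    """Noise-free GBS: return (surviving hypotheses, number of queries asked)."""
    alive = list(hypotheses)
    asked = 0
    while len(alive) > 1:
        def imbalance(q: int) -> int:
            # |sum of predictions| is 0 for a perfectly even split of `alive`.
            return abs(sum(h[q] for h in alive))

        best = min(range(num_queries), key=imbalance)
        if imbalance(best) == len(alive):
            break  # no query separates the remaining hypotheses
        answer = oracle(best)
        asked += 1
        # Keep only the hypotheses that agree with the (noise-free) response.
        alive = [h for h in alive if h[best] == answer]
    return alive, asked


hyps = threshold_hypotheses(16)
target = hyps[11]
survivors, asked = generalized_binary_search(hyps, 16, lambda q: target[q])
print(len(survivors), asked)
```

For thresholds the rule reduces to classic binary search, so the 17 candidate hypotheses are narrowed to one in about log2(17) queries. With noisy responses, discarding hypotheses outright as done here would no longer be safe.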
Teaching dimension and the complexity of active learning
In Proceedings of the 20th Conference on Learning Theory, 2007
Cited by 26 (7 self)
We study the label complexity of pool-based active learning in the PAC model with noise. Taking inspiration from extant literature on Exact learning with membership queries, we derive upper and lower bounds on the label complexity in terms of generalizations of extended teaching dimension. Among the contributions of this work is the first nontrivial general upper bound on label complexity in the presence of persistent classification noise.
1 Overview of Main Results
In supervised machine learning, it is becoming increasingly apparent that well-designed interactive learning algorithms can provide valuable improvements over passive algorithms in learning performance while reducing the amount of effort required of a human annotator. In particular, there is presently much interest in the pool-based active learning setting, in which a learner can request the label of any example in a large pool of unlabeled examples. In this case, one crucial quantity is the number of label requests required by a learning algorithm: the label complexity. This quantity is sometimes significantly smaller than the sample complexity of passive learning. A thorough theoretical understanding of these improvements seems essential to fully exploit the potential of active learning.
In particular, active learning is formalized in the PAC model as follows. The pool of m unlabeled examples are sampled i.i.d. according to some distribution D. A binary label is assigned to each example by a (possibly randomized) oracle, but is hidden from the learner unless it requests the label. The error rate of a classifier h is defined as the probability of h disagreeing with the oracle on a fresh example X ∼ D. A learning algorithm outputs a classifier ĥ from a concept space C, and we refer to the infimum error rate over classifiers in C as the noise rate, denoted ν.
For ε, δ, η ∈ (0,1), we define the label complexity, denoted #LQ(C, D, ε, δ, η), as the smallest number q such that there is an algorithm that outputs a classifier ĥ ∈ C, and for sufficiently large m, for any oracle with ν ≤ η, with probability at least 1 − δ over the sample and internal randomness, the algorithm makes at most q label requests and ĥ has error rate at most ν + ε.
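To make the gap between label requests and pool size concrete, here is a minimal sketch for the simplest case: one-dimensional thresholds with a noise-free oracle (ν = 0). It is not the algorithm analyzed in the paper. The function name `active_learn_threshold`, the pool size, and the target threshold 0.37 are assumptions chosen for illustration.

```python
import random
from typing import Callable, Sequence, Tuple


def active_learn_threshold(
    pool: Sequence[float],
    request_label: Callable[[float], int],
) -> Tuple[float, int]:
    """Binary-search the pool for the first positively labeled point.

    Assumes labels are noise-free and of the form 1[x >= t] for some t.
    Returns (learned threshold, number of label requests made).
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 0, xs[hi:] are labeled 1
    requests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        requests += 1
        if request_label(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    # lo is now the index of the first positive point (or len(xs) if none).
    threshold = xs[lo] if lo < len(xs) else float("inf")
    return threshold, requests


rng = random.Random(0)
pool = [rng.random() for _ in range(1024)]           # m = 1024 unlabeled examples
oracle = lambda x: int(x >= 0.37)                    # hypothetical target
threshold, requests = active_learn_threshold(pool, oracle)
print(f"label requests: {requests} out of a pool of {len(pool)}")
```

The learned classifier x ↦ 1[x ≥ threshold] agrees with the oracle on every pool point, yet the number of requests grows only logarithmically in m. A passive learner would need labels for a number of examples that grows with 1/ε to reach the same guarantee.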
Property Testing: A Learning Theory Perspective
Cited by 22 (4 self)
Property testing deals with tasks where the goal is to distinguish between the case that an object (e.g., function or graph) has a prespecified property (e.g., the function is linear or the graph is bipartite) and the case that it differs significantly from any such object. The task should be performed by observing only a very small part of the object, in particular by querying the object, and the algorithm is allowed a small failure probability. One view of property testing is as a relaxation of learning the object (obtaining an approximate representation of the object). Thus property testing algorithms can serve as a preliminary step to learning. That is, they can be applied in order to select, very efficiently, what hypothesis class to use for learning. This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community. In particular, we cover results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.
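As an illustration of testing linearity by querying only a small part of a function, the sketch below implements the standard Blum-Luby-Rubinfeld (BLR) check for Boolean functions over GF(2). It is offered as a well-known example of such a tester, not as a reproduction of the survey's presentation. The helper names `blr_linearity_test`, `parity`, and `and_of_two_bits`, and the trial count, are assumptions.

```python
import random
from typing import Callable, Optional


def blr_linearity_test(
    f: Callable[[int], int],
    n_bits: int,
    trials: int = 100,
    rng: Optional[random.Random] = None,
) -> bool:
    """Accept (True) if f(x) XOR f(y) == f(x XOR y) on every sampled pair.

    Inputs are n_bits-bit integers read as vectors over GF(2). A linear f is
    always accepted; an f far from linear is rejected with high probability.
    """
    rng = rng or random.Random()
    for _ in range(trials):
        x = rng.getrandbits(n_bits)
        y = rng.getrandbits(n_bits)
        if f(x) ^ f(y) != f(x ^ y):
            return False  # found a witness that f is not linear
    return True


def parity(mask: int) -> Callable[[int], int]:
    """Linear function: XOR of the input bits selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2


def and_of_two_bits(x: int) -> int:
    """Non-linear function: AND of the two lowest input bits."""
    return (x & 1) & ((x >> 1) & 1)


print(blr_linearity_test(parity(0b10110101), n_bits=8))
print(blr_linearity_test(and_of_two_bits, n_bits=8))
```

Each trial makes three queries to f, so the total query count depends on the number of trials and not on the 2^n size of the domain.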
Learning active classifiers
Proceedings of the Thirteenth International Conference on Machine Learning (ICML-96), 1996
Cited by 18 (5 self)
Most classification algorithms are "passive", in that they assign a class label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can, at some cost, obtain the values of missing attributes before deciding upon a class label. This can be useful when considering, for example, whether to extract some information from the web for a critical decision or whether to gather information for a medical test or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the additional attribute values and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, the task is often intractable.
Theoretical Foundations of Active Learning
2009
Cited by 16 (8 self)
I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.
Acknowledgments
There are so many people I am indebted to for helping to make this thesis, and indeed my entire career, possible. To begin, I am grateful to the faculty of Webster University, where my journey into science truly began. Support from the teachers I was privileged to have there, including Gary Coffman, Britt-Marie Schiller, Ed and Anna B. Sakurai, and John Aleshunas, to name a few, inspired in me a deep curiosity
The Informational Complexity of Learning from Examples
1996
Cited by 13 (4 self)
This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory and certain unifying perspectives emerge.
Copyright © Massachusetts Institute of Technology, 1996
This report describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences and at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research is sponsored by a grant from the National Science Foundation under contract ASC-9217041 (this award includes funds from ARPA provided under the HPCC program); and by a grant from ARPA/ONR under contract N00014-92-J-1879. Additional support has been provided by Siemens Co...
Active sampling for multiple output identification
In The 19th Annual Conf. on Learning Theory, 2006
Cited by 12 (0 self)
We study functions with multiple output values, and use active sampling to identify an example for each of the possible output values. Our results for this setting include: (1) Efficient active sampling algorithms for simple geometric concepts, such as intervals on a line and axis-parallel boxes. (2) A characterization for the case of binary output value in a transductive setting. (3) An analysis of active sampling with uniform distribution in the plane. (4) An efficient algorithm for the Boolean hypercube when each output value is a monomial.
Simulating Access to Hidden Information while Learning
Proceedings of the 26th Annual ACM Symposium on the Theory of Computation, 1994
Cited by 10 (3 self)
We introduce a new technique which enables a learner without access to hidden information to learn nearly as well as a learner with access to hidden information. We apply our technique to solve an open problem of Maass and Turán [18], showing that for any concept class F, the least number of queries sufficient for learning F by an algorithm which has access only to arbitrary equivalence queries is at most a factor of 1/log_2(4/3) more than the least number of queries sufficient for learning F by an algorithm which has access to both arbitrary equivalence queries and membership queries. Previously known results imply that the 1/log_2(4/3) in our bound is best possible. We describe analogous results for two generalizations of this model to function learning, and apply those results to bound the difficulty of learning in the harder of these models in terms of the difficulty of learning in the easier model. We bound the difficulty of learning unions of k concepts from a class F in t...
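For a sense of scale, the constant 1/log_2(4/3) in the abstract above can be evaluated numerically. This is plain arithmetic on the stated factor; the function name `equivalence_only_overhead` and the sample query count are assumptions for the example.

```python
import math


def equivalence_only_overhead() -> float:
    """Numerical value of the factor 1 / log2(4/3) quoted in the abstract."""
    return 1.0 / math.log2(4.0 / 3.0)


factor = equivalence_only_overhead()
# Hypothetical example: a class learnable with 20 equivalence + membership queries.
queries_with_membership = 20
print(f"factor = {factor:.4f}")
print(f"equivalence-only bound for {queries_with_membership} queries: "
      f"{factor * queries_with_membership:.1f}")
```

The factor is roughly 2.41, so under the stated result, removing membership queries costs at most a constant multiple in the number of queries.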