Results 1–10 of 34
A bound on the label complexity of agnostic active learning
In Proc. of the 24th International Conference on Machine Learning, 2007
Cited by 97 (11 self)
We study the label complexity of pool-based active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A² algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first non-trivial general-purpose upper bound on label complexity in the agnostic PAC model.
Generalized binary search
In Proceedings of the 46th Allerton Conference on Communications, Control, and Computing, 2008
Cited by 58 (0 self)
This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions.
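The greedy splitting rule described in this abstract can be sketched in a few lines. The following is a noiseless illustration over a hypothetical finite hypothesis class (one-dimensional thresholds in the example below), not the paper's noise-tolerant algorithm:

```python
def generalized_binary_search(hypotheses, queries, oracle):
    """Noiseless GBS sketch: repeatedly pose the query that most evenly
    splits the surviving hypotheses, then discard every hypothesis that
    disagrees with the observed response."""
    version_space = list(hypotheses)
    num_queries = 0
    while len(version_space) > 1:
        def imbalance(q):
            # How unevenly query q splits the current version space.
            ones = sum(1 for h in version_space if h(q) == 1)
            return abs(2 * ones - len(version_space))
        q = min(queries, key=imbalance)   # most even split
        response = oracle(q)
        num_queries += 1
        version_space = [h for h in version_space if h(q) == response]
    return version_space[0], num_queries
```

For example, with the eight threshold functions h_t(x) = 1[x ≥ t] on the points 0..7, the sketch recovers the true threshold with about log₂ 8 = 3 queries, mirroring classic binary search.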
Property Testing: A Learning Theory Perspective
Cited by 49 (9 self)
Property testing deals with tasks where the goal is to distinguish between the case that an object (e.g., function or graph) has a pre-specified property (e.g., the function is linear or the graph is bipartite) and the case that it differs significantly from any such object. The task should be performed by observing only a very small part of the object, in particular by querying the object, and the algorithm is allowed a small failure probability. One view of property testing is as a relaxation of learning the object (obtaining an approximate representation of the object). Thus property testing algorithms can serve as a preliminary step to learning. That is, they can be applied in order to select, very efficiently, what hypothesis class to use for learning. This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community. In particular, we cover results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.
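As a concrete instance of the kind of query-efficient test surveyed here, linearity of a Boolean function can be checked with the classic Blum-Luby-Rubinfeld (BLR) test. This sketch (with inputs packed into n-bit integers) is a standard textbook version, not code from the survey:

```python
import random

def blr_linearity_test(f, n, trials=200, rng=random):
    """BLR linearity test sketch over GF(2): accept f unless some random
    pair violates f(x) ^ f(y) == f(x ^ y).  Parities always pass, while
    a function far from every parity is rejected on each trial with
    probability related to its distance from the nearest parity."""
    for _ in range(trials):
        x = rng.getrandbits(n)   # uniform n-bit input
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```

A parity such as x₁ ⊕ x₃ always passes, while the majority of three bits, which is 1/4-far from every parity, is rejected after a handful of trials.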
Every linear threshold function has a low-weight approximator
2006
Cited by 36 (14 self)
Given any linear threshold function f on n Boolean variables, we construct a linear threshold function g which disagrees with f on at most an ǫ fraction of inputs and has integer weights each of magnitude at most √n · 2^Õ(1/ǫ²). We show that the construction is optimal in terms of its dependence on n by proving a lower bound of Ω(√n) on the weights required to approximate a particular linear threshold function. We give two applications. The first is a deterministic algorithm for approximately counting the fraction of satisfying assignments to an instance of the zero-one knapsack problem to within an additive ±ǫ. The algorithm runs in time polynomial in n (but exponential in 1/ǫ²). In our second application, we show that any linear threshold function f is specified to within error ǫ by estimates of its Chow parameters (degree-0 and degree-1 Fourier coefficients) which are accurate to within an additive ±1/(n · 2^Õ(1/ǫ²)). This is the first such accuracy bound which is inverse polynomial in n (previous work of Goldberg [12] gave a 1/quasipoly(n) bound), and gives the first polynomial bound (in terms of
Testing halfspaces
In Proc. 20th Annual Symposium on Discrete Algorithms (SODA), 2009
Cited by 34 (15 self)
This paper addresses the problem of testing whether a Boolean-valued function f is a halfspace, i.e. a function of the form f(x) = sgn(w·x − θ). We consider halfspaces over the continuous domain R^n (endowed with the standard multivariate Gaussian distribution) as well as halfspaces over the Boolean cube {−1, 1}^n (endowed with the uniform distribution). In both cases we give an algorithm that distinguishes halfspaces from functions that are ǫ-far from any halfspace using only poly(1/ǫ) queries, independent of the dimension n. Two simple structural results about halfspaces are at the heart of our approach for the Gaussian distribution: the first gives an exact relationship between the expected value of a halfspace f and the sum of the squares of f’s degree-1 Hermite coefficients, and the second shows that any function that approximately satisfies this relationship is close to a halfspace. We prove analogous results for the Boolean cube {−1, 1}^n (with Fourier coefficients in place of Hermite coefficients) for balanced halfspaces in which all degree-1 Fourier coefficients are small. Dealing with general halfspaces over {−1, 1}^n poses significant additional complications and requires other ingredients. These include “cross-consistency” versions of the results mentioned above for pairs of halfspaces with the same weights but different thresholds; new structural results relating the largest degree-1 Fourier coefficient and the largest weight in unbalanced halfspaces; and algorithmic techniques from recent work on testing juntas [FKR+02].
Teaching dimension and the complexity of active learning
In Proceedings of the 20th Conference on Learning Theory, 2007
Cited by 33 (8 self)
We study the label complexity of pool-based active learning in the PAC model with noise. Taking inspiration from extant literature on Exact learning with membership queries, we derive upper and lower bounds on the label complexity in terms of generalizations of extended teaching dimension. Among the contributions of this work is the first nontrivial general upper bound on label complexity in the presence of persistent classification noise.

1 Overview of Main Results

In supervised machine learning, it is becoming increasingly apparent that well-designed interactive learning algorithms can provide valuable improvements over passive algorithms in learning performance while reducing the amount of effort required of a human annotator. In particular, there is presently much interest in the pool-based active learning setting, in which a learner can request the label of any example in a large pool of unlabeled examples. In this case, one crucial quantity is the number of label requests required by a learning algorithm: the label complexity. This quantity is sometimes significantly smaller than the sample complexity of passive learning. A thorough theoretical understanding of these improvements seems essential to fully exploit the potential of active learning.

Active learning is formalized in the PAC model as follows. The pool of m unlabeled examples is sampled i.i.d. according to some distribution D. A binary label is assigned to each example by a (possibly randomized) oracle, but is hidden from the learner unless it requests the label. The error rate of a classifier h is defined as the probability of h disagreeing with the oracle on a fresh example X ∼ D. A learning algorithm outputs a classifier ĥ from a concept space C, and we refer to the infimum error rate over classifiers in C as the noise rate, denoted ν.

For ǫ, δ, η ∈ (0, 1), we define the label complexity, denoted #LQ(C, D, ǫ, δ, η), as the smallest number q such that there is an algorithm that outputs a classifier ĥ ∈ C and, for sufficiently large m, for any oracle with ν ≤ η, with probability at least 1 − δ over the sample and internal randomness, makes at most q label requests, and ĥ has error rate at most ν + ǫ.
Theoretical Foundations of Active Learning
2009
Cited by 26 (9 self)
I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.
Learning pattern classification: a survey
IEEE Trans. Inform. Theory, 1998
Cited by 20 (4 self)
Classical and recent results in statistical pattern recognition and learning theory are reviewed in a two-class pattern classification setting. This basic model best illustrates intuition and analysis techniques while still containing the essential features and serving as a prototype for many applications. Topics discussed include nearest neighbor, kernel, and histogram methods, Vapnik–Chervonenkis theory, and neural networks. The presentation and the large (though non-exhaustive) list of references are geared to provide a useful overview of this field for both specialists and nonspecialists.
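Among the methods the survey covers, the nearest neighbor rule is the simplest to state. A minimal 1-NN sketch (squared Euclidean distance; the toy data in the usage is hypothetical):

```python
def nearest_neighbor_classify(train, x):
    """1-NN sketch: predict the label of the training point closest to x,
    where train is a list of (point, label) pairs."""
    def dist2(pair):
        # Squared Euclidean distance from a training point to x.
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    return min(train, key=dist2)[1]
```

For example, with clusters near (0, 0) labeled 0 and near (5, 5) labeled 1, a query at (1, 1) is classified 0 and a query at (5, 4) is classified 1.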
Learning active classifiers
Proceedings of the Thirteenth International Conference on Machine Learning (ICML-96), 1996
Cited by 19 (5 self)
Most classification algorithms are “passive”, in that they assign a class label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can, at some cost, obtain the values of missing attributes before deciding upon a class label. This can be useful when considering, for example, whether to extract some information from the web for a critical decision or whether to gather information for a medical test or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the additional attribute values and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, the task is often intractable.
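The cost-versus-penalty trade-off described in the abstract can be made concrete with a toy expected-utility rule. The perfect-test assumption and the numbers in the usage are hypothetical illustrations, not the paper's PAC analysis:

```python
def should_run_test(p_positive, penalty, test_cost):
    """Illustrative expected-utility rule: without the test we predict
    the likelier class and pay `penalty` when wrong; a (hypothetically
    perfect) test removes that risk at `test_cost`.  Run the test iff
    doing so lowers expected cost."""
    expected_error = min(p_positive, 1 - p_positive)
    return test_cost < expected_error * penalty
```

With a 30% chance of the minority class and a penalty of 100, a test costing 10 is worthwhile (expected saving 30); at a 5% chance it is not (expected saving 5).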