Results 1 - 10 of 113
Active learning literature survey
, 2010
Cited by 132 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
A general agnostic active learning algorithm
, 2007
Cited by 72 (13 self)
We present a simple, agnostic active learning algorithm that works for any hypothesis class of bounded VC dimension, and any data distribution. Our algorithm extends a scheme of Cohn, Atlas, and Ladner to the agnostic setting, by (1) reformulating it using a reduction to supervised learning and (2) showing how to apply generalization bounds even for the non-i.i.d. samples that result from selective sampling. We provide a general characterization of the label complexity of our algorithm. This quantity is never more than the usual PAC sample complexity of supervised learning, and is exponentially smaller for some hypothesis classes and distributions. We also demonstrate improvements experimentally.
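The selective-sampling idea of Cohn, Atlas, and Ladner that this abstract builds on can be illustrated with a minimal sketch. It covers only the noise-free (realizable) case for one-dimensional threshold classifiers, not the agnostic algorithm of the paper; the function name `cal_threshold_learner`, the oracle, and the threshold value 0.37 are all assumptions made for illustration.

```python
import random

def cal_threshold_learner(stream, oracle):
    """Selective sampling for thresholds h_t(x) = 1 if x >= t else 0 (noise-free case).

    The version space is the set of thresholds t with lo < t <= hi.
    A label is requested only when consistent hypotheses disagree on x.
    """
    lo, hi = 0.0, 1.0
    queries = 0
    for x in stream:
        if lo < x < hi:          # x lies in the region of disagreement
            queries += 1
            if oracle(x) == 1:
                hi = x           # label 1 implies t <= x
            else:
                lo = x           # label 0 implies t > x
        # otherwise every consistent threshold predicts the same label: no query
    return lo, hi, queries

# Illustrative run with a hypothetical target threshold.
random.seed(0)
true_t = 0.37
stream = [random.random() for _ in range(10000)]
lo, hi, queries = cal_threshold_learner(stream, lambda x: 1 if x >= true_t else 0)
print(f"interval=({lo:.5f}, {hi:.5f}], labels requested={queries} of {len(stream)}")
```

In this toy setting the number of label requests grows roughly logarithmically with the number of points seen, which is the kind of saving the abstract describes as exponentially smaller for some hypothesis classes and distributions.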
Analysis of perceptron-based active learning
 In COLT
, 2005
Cited by 69 (12 self)
We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.
1 Introduction
In many machine learning applications, unlabeled data is abundant but labeling is expensive. This distinction is not captured in the standard PAC or online models of supervised learning, and has motivated the field of active learning, in which the labels of data points are initially hidden, and the learner must pay for each label it wishes revealed. If query points are chosen randomly, the number of labels needed to reach a target generalization error ε, at a target confidence level 1 − δ, is similar to the sample complexity of supervised learning. The hope is that there are alternative querying strategies which require significantly fewer
A bound on the label complexity of agnostic active learning
 In Proc. of the 24th International Conference on Machine Learning
, 2007
Cited by 63 (9 self)
We study the label complexity of pool-based active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A² algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first nontrivial general-purpose upper bound on label complexity in the agnostic PAC model.
Minimax bounds for active learning
 In COLT
, 2007
Cited by 58 (5 self)
This paper aims to shed light on achievable limits in active learning. Using minimax analysis techniques, we study the achievable rates of classification error convergence for broad classes of distributions characterized by decision boundary regularity and noise conditions. The results clearly indicate the conditions under which one can expect significant gains through active learning. Furthermore, we show that the learning rates derived are tight for “boundary fragment” classes in d-dimensional feature spaces when the feature marginal density is bounded from above and below.
Importance Weighted Active Learning
Cited by 53 (6 self)
We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
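The role of importance weighting in correcting sampling bias can be sketched as follows. This is only an illustration of the general principle, not the paper's algorithm: each example is queried with some probability p > 0 and, if queried, its loss is weighted by 1/p, so the weighted average remains an unbiased estimate of the average loss over all examples. The function name, the loss values, and the query probabilities are hypothetical.

```python
import random

def importance_weighted_loss(losses, probs, rng):
    """Estimate the mean of `losses` while observing only a random subset.

    Example t is queried with probability probs[t] (must be > 0); a queried
    loss is weighted by 1 / probs[t], which makes the estimate unbiased.
    Returns (estimate, number_of_queries).
    """
    total = 0.0
    queried = 0
    for loss, p in zip(losses, probs):
        if rng.random() < p:     # coin flip decides whether the label is requested
            total += loss / p    # importance weight corrects for the selection
            queried += 1
    return total / len(losses), queried

# Hypothetical 0/1 losses and query probabilities.
losses = [0, 1, 1, 0, 1, 0, 0, 1]
probs = [0.5] * len(losses)
rng = random.Random(0)
estimates = [importance_weighted_loss(losses, probs, rng)[0] for _ in range(4000)]
average_estimate = sum(estimates) / len(estimates)
print(f"true mean={sum(losses) / len(losses)}, average estimate={average_estimate:.3f}")
```

Small query probabilities make the weights 1/p large and the estimate noisy, which is why keeping the variance under control, as the abstract mentions, matters for any guarantee.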
Margin based active learning
 Proc. of the 20th Conference on Learning Theory
, 2007
Cited by 42 (9 self)
We present a framework for margin based active learning of linear separators. We instantiate it for a few important cases, some of which have been previously considered in the literature. We analyze the effectiveness of our framework both in the realizable case and in a specific noisy setting related to the Tsybakov small noise condition.
The True Sample Complexity of Active Learning
Cited by 42 (13 self)
We describe and explore a new perspective on the sample complexity of active learning. In many situations where it was generally believed that active learning does not help, we find that active learning does help in the limit, often with exponential improvements in sample complexity. This contrasts with the traditional analysis of active learning problems such as non-homogeneous linear separators or depth-limited decision trees, in which Ω(1/ɛ) lower bounds are common; we point out that such results must be interpreted carefully, and that finding an ɛ-good classifier can always be accomplished with a number of samples asymptotically smaller than any such bound. These new insights arise from a subtle variation on the traditional definition of sample complexity, not previously recognized in the active learning literature.
Active learning in the non-realizable case
 NIPS Workshop on Foundations of Active Learning
, 2006
Cited by 41 (0 self)
Most of the existing active learning algorithms are based on the realizability assumption: The learner’s hypothesis class is assumed to contain a target function that perfectly classifies all training and test examples. This assumption can hardly ever be justified in practice. In this paper, we study how relaxing the realizability assumption affects the sample complexity of active learning. First, we extend existing results on query learning to show that any active learning algorithm for the realizable case can be transformed to tolerate random bounded rate class noise. Thus, bounded rate class noise adds little extra complication to active learning, and in particular exponential label complexity savings over passive learning are still possible. However, it is questionable whether this noise model is any more realistic in practice than assuming no noise at all. Our second result shows that if we move to the truly non-realizable model of statistical learning theory, then the label complexity of active learning has the same dependence Ω(1/ε²) on the accuracy parameter ε as the passive learning label complexity. More specifically, we show that under the assumption that the best classifier in the learner’s hypothesis class has generalization error at most β > 0, the label complexity of active learning is Ω((β²/ε²) log(1/δ)), where the accuracy parameter ε measures how close to optimal within the hypothesis class the active learner has to get and δ is the confidence parameter. The implication of this lower bound is that exponential savings should not be expected in realistic models of active learning, and thus the label complexity goals in active learning should be refined.
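To get a feel for how the lower bound scales, the expression inside the Ω can be evaluated for sample values. The numbers below are illustrative assumptions (β = 0.1, δ = 0.05, natural logarithm), and the constant factor hidden by the Ω notation is ignored, so only the growth rate is meaningful.

```latex
% Illustrative values only: beta = 0.1, delta = 0.05, natural log assumed.
\[
\epsilon = 0.01:\quad
\frac{\beta^2}{\epsilon^2}\,\ln\frac{1}{\delta}
= \frac{(0.1)^2}{(0.01)^2}\,\ln 20
\approx 100 \times 2.996 \approx 300
\]
\[
\epsilon = 0.005:\quad
\frac{\beta^2}{\epsilon^2}\,\ln\frac{1}{\delta}
= \frac{(0.1)^2}{(0.005)^2}\,\ln 20
\approx 400 \times 2.996 \approx 1198
\]
```

Halving ε multiplies the quantity by four, the same quadratic dependence that passive learning exhibits, rather than adding a constant as a logarithmic dependence on 1/ε would.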
Hierarchical sampling for active learning
 Proceedings of the 25th International Conference on Machine Learning
, 2008
Cited by 37 (5 self)
We present an active learning scheme that exploits cluster structure in data.