Results 1–10 of 27
Agnostic active learning
In ICML, 2006
Abstract

Cited by 188 (16 self)
We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A2 (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A2 achieves an exponential improvement (i.e., requires only O(ln(1/ε)) samples to find an ε-optimal classifier) over the usual sample complexity of supervised learning, for several settings considered before in the realizable case. These include learning threshold classifiers and learning homogeneous linear separators with respect to an input distribution which is uniform over the unit sphere.
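The exponential improvement for threshold classifiers has a simple intuition in the noise-free (realizable) special case: binary search over a sorted unlabeled pool pins down the decision boundary with logarithmically many label queries. The sketch below illustrates only that intuition; it is not the A2 algorithm, which is designed for the agnostic (noisy) setting, and all names here are ours:

```python
def active_learn_threshold(xs, query_label):
    """Locate a 1-D threshold with O(log n) label queries via binary search.

    Assumes xs is sorted and labels are realizable and noise-free:
    0s strictly to the left of the threshold, 1s to the right.
    (Illustrative sketch only; not the A2 algorithm.)
    """
    lo, hi = -1, len(xs)  # invariant: xs[lo] is labeled 0, xs[hi] is labeled 1
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(xs[mid]) == 1:
            hi = mid  # boundary is at or left of mid
        else:
            lo = mid  # boundary is right of mid
    # Any threshold between xs[lo] and xs[hi] is consistent with every label seen.
    return hi, queries
```

With a pool of n points this issues about log2(n) queries, whereas passive learning of thresholds needs on the order of 1/ε labeled examples to reach error ε.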
Agnostic Active Learning Without Constraints
Abstract

Cited by 48 (6 self)
We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches, where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version-space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.
Activized Learning: Transforming Passive to Active with Improved Label Complexity
Abstract

Cited by 11 (4 self)
Active learning methods often achieve improved performance using fewer labels compared to passive learning methods. A variety of practically successful active learning algorithms use a passive learning algorithm as a subroutine, and the essential role of the active component is to construct data sets to feed into the passive subroutine. This general idea is appealing for a variety of reasons, as it may be able ...
Plug-in approach to active learning
 Journal of Machine Learning Research
Abstract

Cited by 10 (2 self)
We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by the proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight.
Active learning using smooth relative regret approximations with applications (full version)
In arXiv:1110.2136, 2012
Abstract

Cited by 10 (3 self)
The disagreement coefficient of Hanneke has become a central data-independent invariant in proving active learning rates. It has been shown in various ways that a concept class with low complexity, together with a bound on the disagreement coefficient at an optimal solution, allows active learning rates that are superior to passive learning ones. We present a different tool for pool-based active learning which follows from the existence of a certain uniform version of a low disagreement coefficient, but is not equivalent to it. In fact, we present two fundamental active learning problems of significant interest for which our approach allows nontrivial active learning bounds, whereas any general-purpose method relying only on disagreement coefficient bounds fails to guarantee any useful bounds for these problems. The applications of interest are: learning to rank from pairwise preferences, and clustering with side information (a.k.a. semi-supervised clustering). The tool we use is based on the learner’s ability to compute an estimator of the difference between the loss of any hypothesis and some fixed “pivotal” hypothesis to within an absolute error of at most ε times the disagreement measure (ℓ1 distance) between the two hypotheses. We prove that such an estimator implies the existence of a learning algorithm which, at each iteration, reduces its in-class excess risk to within a constant factor. Each iteration replaces the current pivotal hypothesis with the minimizer of the estimated loss difference function with respect to the previous pivotal hypothesis. The label complexity essentially becomes that of computing this estimator.
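The iteration described in the last sentences of the abstract can be sketched structurally: given some estimator of loss differences relative to a pivot (here a hypothetical callable `estimate_loss_diff`, our name, standing in for the paper's smooth relative regret approximation), each round re-centers on the estimated minimizer. This is a sketch of the iteration's shape only, not the paper's algorithm or analysis:

```python
def pivotal_iteration(hypotheses, estimate_loss_diff, rounds):
    """Sketch of a pivotal-hypothesis iteration over a finite hypothesis pool.

    estimate_loss_diff(h, pivot) is assumed to approximate
    err(h) - err(pivot) to within eps times the l1 disagreement between
    h and pivot, as the abstract posits.  Names are illustrative.
    """
    pivot = hypotheses[0]  # arbitrary initial pivot
    for _ in range(rounds):
        # Re-center: the new pivot is the minimizer of the estimated
        # loss difference relative to the previous pivot.
        pivot = min(hypotheses, key=lambda h: estimate_loss_diff(h, pivot))
    return pivot
```

The label cost of each round is whatever it costs to evaluate the estimator, which is exactly the abstract's point about label complexity.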
Lower Bounds for Passive and Active Learning
Abstract

Cited by 9 (0 self)
We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has recently been rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Giné and Koltchinskii up to constants and generalize analogous results of Massart and Nédélec. For active learning, we provide the first known lower bounds based on the capacity function rather than the disagreement coefficient.
The Power of Localization for Efficiently Learning Linear Separators with Malicious Noise
, 2013
Abstract

Cited by 7 (4 self)
In this paper we put forward new techniques for designing efficient algorithms for learning linear separators in the challenging malicious noise model, where an adversary may corrupt both the labels and the feature part of an η fraction of the examples. Our main result is a polynomial-time algorithm for learning linear separators in ℝ^d under the uniform distribution that can handle a noise rate of η = O ...
A Statistical Theory of Active Learning
, 2013
Abstract

Cited by 6 (1 self)
Active learning is a protocol for supervised machine learning, in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are taken at random. The objective in active learning is to produce a highly accurate classifier, ideally using fewer labels than the number of random labeled data points sufficient for passive learning to achieve the same. This article describes recent advances in our understanding of the theoretical benefits of active learning, and implications for the design of effective active learning algorithms. Much of the article focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing ...
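Disagreement-based active learning, the technique this article surveys, has a compact core idea: maintain a set of still-plausible hypotheses and spend a label only where they disagree. A toy finite-hypothesis-class sketch of that idea (ours, not the article's pseudocode, and assuming noise-free labels):

```python
def cal_active_learning(stream, hypotheses, query_label):
    """Toy disagreement-based (CAL-style) active learner.

    Maintains a version space V and queries a label only when hypotheses
    in V disagree on the point; otherwise the label is inferred for free.
    Assumes the realizable, noise-free setting.
    """
    V = list(hypotheses)
    queries = 0
    for x in stream:
        preds = {h(x) for h in V}
        if len(preds) > 1:                    # x is in the disagreement region
            y = query_label(x)                # spend a label
            queries += 1
            V = [h for h in V if h(x) == y]   # prune inconsistent hypotheses
        # else: all of V agrees on x, so its label costs nothing
    return V, queries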
Active Property Testing
, 2011
"... One of the motivations for property testing of boolean functions is the idea that testing can serve as a preprocessing step before learning. However, in most machine learning applications, the ability to query functions at arbitrary points in the input space is considered highly unrealistic. Instead ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
One of the motivations for property testing of boolean functions is the idea that testing can serve as a preprocessing step before learning. However, in most machine learning applications, the ability to query functions at arbitrary points in the input space is considered highly unrealistic. Instead, the dominant query paradigm in applied machine learning has been that of active learning, where the algorithm may ask for examples to be labeled, but only from among those that exist in nature. That is, the algorithm may make a polynomial number of draws from the underlying distribution D and then query for labels, but only of points in its sample. In this work, we bring this wellstudied model in learning to the domain of testing. We show that for a number of important properties for learning, testing can still yield substantial benefits in this setting. This includes testing whether data satisfies the “cluster assumption”, testing linear separators, testing the largemargin assumption in lowdimensional spaces, and testing unions of intervals. In most of these cases, we show active testing requires substantially fewer label requests than passive testing (where the algorithm must pay for labels on every example drawn from D), or active or passive learning. For example, testing the cluster assumption can be done with O(1) label requests using active testing, but requires Ω ( √ N) labeled examples for passive testing and Ω(N) for learning, where N is the number of clusters; a similar pattern holds for unions of
A Theory of Transfer Learning with Applications to Active Learning
"... Abstract. We explore a transfer learning setting, in which a finite sequence of target concepts are sampled independently with an unknown distribution from a known family. We study the total number of labeled examples required to learn all targets to an arbitrary specified expected accuracy, focusin ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
Abstract. We explore a transfer learning setting, in which a finite sequence of target concepts are sampled independently with an unknown distribution from a known family. We study the total number of labeled examples required to learn all targets to an arbitrary specified expected accuracy, focusing on the asymptotics in the number of tasks and the desired accuracy. Our primary interest is formally understanding the fundamental benefits of transfer learning, compared to learning each target independently from the others. Our approach to the transfer problem is general, in the sense that it can be used with a variety of learning protocols. As a particularly interesting application, we study in detail the benefits of transfer for selfverifying active learning; in this setting, we find that the number of labeled examples required for learning with transfer is often significantly smaller than that required for learning each target independently. 1