Results 1  10
of
92
Agnostic active learning
 In ICML
, 2006
"... We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A2 (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A2 achieves an exponential improvement ..."
Abstract

Cited by 188 (16 self)
 Add to MetaCart
We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A2 (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A2 achieves an exponential improvement (i.e., requires only O � ln 1 ɛ samples to find an ɛoptimal classifier) over the usual sample complexity of supervised learning, for several settings considered before in the realizable case. These include learning threshold classifiers and learning homogeneous linear separators with respect to an input distribution which is uniform over the unit sphere. 1.
Agnostic Active Learning Without Constraints
"... We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this vers ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
(Show Context)
We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
RATES OF CONVERGENCE IN ACTIVE LEARNING
 SUBMITTED TO THE ANNALS OF STATISTICS
"... We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate prova ..."
Abstract

Cited by 39 (4 self)
 Add to MetaCart
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
Efficient learning with partially observed attributes
 In ICML, 2010. Kun
"... We describe and analyze efficient algorithms for learning a linear predictor from examples when the learner can only view a few attributes of each training example. This is the case, for instance, in medical research, where each patient participating in the experiment is only willing to go through a ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
(Show Context)
We describe and analyze efficient algorithms for learning a linear predictor from examples when the learner can only view a few attributes of each training example. This is the case, for instance, in medical research, where each patient participating in the experiment is only willing to go through a small number of tests. Our analysis bounds the number of additional examples sufficient to compensate for the lack of full information on each training example. We demonstrate the efficiency of our algorithms by showing that when running on digit recognition data, they obtain a high prediction accuracy even when the learner gets to see only four pixels of each image. 1.
Rademacher complexities and bounding the excess risk in active learning
, 2009
"... Sequential algorithms of active learning based on the estimation of the level sets of the empirical risk are discussed in the paper. Localized Rademacher complexities are used in the algorithms to estimate the sample sizes needed to achieve the required accuracy of learning in an adaptive way. Proba ..."
Abstract

Cited by 27 (0 self)
 Add to MetaCart
Sequential algorithms of active learning based on the estimation of the level sets of the empirical risk are discussed in the paper. Localized Rademacher complexities are used in the algorithms to estimate the sample sizes needed to achieve the required accuracy of learning in an adaptive way. Probabilistic bounds on the number of active examples have been proved and several applications to binary classification problems are considered.
Learning Bounds for Importance Weighting
"... This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give both upper and lower bounds for generalization with bounded importance weights and, more significantly, give learning guarantees for the more common case of unbounded importance weights under the weak assumption that the second moment is bounded, aconditionrelatedtotheRényidivergenceofthetrainingand test distributions. These results are based on a series of novel and general boundswederiveforunbounded loss functions, which are of independent interest. We use these bounds to guide the definition of an alternative reweighting algorithm andreporttheresults of experiments demonstrating its benefits. Finally, we analyze the properties of normalized importance weights which are also commonly used. 1
Two faces of active learning
"... An active learner has a collection of data points, each with a label that is initially hidden but can be obtained at some cost. Without spending too much, it wishes to find a classifier that will accurately map points to labels. There are two common intuitions about how this learning process should ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
(Show Context)
An active learner has a collection of data points, each with a label that is initially hidden but can be obtained at some cost. Without spending too much, it wishes to find a classifier that will accurately map points to labels. There are two common intuitions about how this learning process should be organized: (i) bychoosing querypoints that shrinkthe space of candidate classifiers as rapidly as possible; and (ii) by exploitingnatural clusters inthe (unlabeled)data set. Recentresearch has yieldedlearning algorithms for both paradigms that are efficient, work with generic hypothesis classes, and have rigorously characterized labeling requirements. Here we survey these advances by focusing on two representative algorithms and discussing their mathematical properties and empirical performance. 1
Active Risk Estimation
, 2010
"... We address the problem of evaluating the risk of a given model accurately at minimal labeling costs. This problem occurs in situations in which risk estimates cannot be obtained from heldout training data, because the training data are unavailable or do not reflect the desired test distribution. We ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
We address the problem of evaluating the risk of a given model accurately at minimal labeling costs. This problem occurs in situations in which risk estimates cannot be obtained from heldout training data, because the training data are unavailable or do not reflect the desired test distribution. We study active risk estimation processes in which instances are actively selected by a sampling process from a pool of unlabeled test instances and their labels are queried. We derive the sampling distribution that minimizes the estimation error of the active risk estimator when used to select instances from the pool. An analysis of the distribution that governs the estimator leads to confidence intervals. We empirically study conditions under which the active risk estimate is more accurate than a standard risk estimate that draws equally many instances from the test distribution. 1.
Averagecase active learning with costs
 THEORY. UNIVERSITY OF PORTO
, 2009
"... We analyze the expected cost of a greedy active learning algorithm. Our analysis extends previous work to a more general setting in which different queries have different costs. Moreover, queries may have more than two possible responses and the distribution over hypotheses may be non uniform. Speci ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
(Show Context)
We analyze the expected cost of a greedy active learning algorithm. Our analysis extends previous work to a more general setting in which different queries have different costs. Moreover, queries may have more than two possible responses and the distribution over hypotheses may be non uniform. Specific applications include active learning with label costs, active learning for multiclass and partial label queries, and batch mode active learning. We also discuss an approximate version of interest when there are very many queries.
S.: Streambased joint explorationexploitation active learning
 In: CVPR (2012
"... 1 st table 2 nd table k th table New table n1 n + ↵ 5 ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
(Show Context)
1 st table 2 nd table k th table New table n1 n + ↵ 5