
## Active Learning with a Drifting Distribution

### Citations

965 | Estimation of dependences based on empirical data - Vapnik - 1982 |

772 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm - Littlestone - 1988 |
Citation Context: ...earning under distribution drift, with fixed target concept. There are several branches of the literature that are highly relevant to this, including domain adaptation [MMR09, MMR08], online learning [Lit88], learning with concept drift, and empirical processes for independent but not identically distributed data [vdG00]. Stream-based Active Learning with a Fixed Distribution: [DKM09] show that a certai... |

727 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |
Citation Context: ...neke (2011b), but reformulated for this stream-based model. Let Vt denote the set of classifiers h ∈ C with êr(h; Qt) = 0 (with V0 = C). Classic results from statistical learning theory (Vapnik, 1982; Blumer, Ehrenfeucht, Haussler, and Warmuth, 1989) imply that for t > d, with probability at least 1 − δ, diamt(Vt−1) ≤ c · (d log(2e(t−1)/d) + log(4/δ))/(t−1), (1) for some universal constant c ∈ (1, ∞). In particular, for d < t ≤ T, since the probability C... |

544 | Improving generalization with active learning - Cohn, Atlas, et al. - 1994 |
Citation Context: ...2, CMEDV10], the minimax number of mistakes (or excess number of mistakes, in the noisy case) can be sublinear in the number of samples. We specifically study the classic CAL active learning strategy [CAL94] in this context, and bound the number of mistakes and label requests the algorithm makes in the realizable case, under conditions on the concept space and the family of possible distributions. We als... |
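The context above names the classic CAL strategy: keep every hypothesis consistent with the labels seen so far, and request a label only when the surviving hypotheses disagree on the current point. A minimal sketch for a finite hypothesis class in the realizable case (the function names and the finite-class restriction are illustrative assumptions, not from the paper):

```python
# Minimal sketch of the CAL (Cohn-Atlas-Ladner) stream-based active
# learner in the realizable case, over a finite hypothesis class.
def cal_stream(hypotheses, stream, oracle):
    """Predict each point in the stream; query the oracle only when the
    current version space disagrees. Returns (predictions, num_queries)."""
    version_space = list(hypotheses)  # hypotheses consistent with all queried labels
    predictions, num_queries = [], 0
    for x in stream:
        votes = {h(x) for h in version_space}
        if len(votes) > 1:            # disagreement region: request the label
            y = oracle(x)
            num_queries += 1
            version_space = [h for h in version_space if h(x) == y]
        else:                         # unanimous vote: predict without querying
            (y,) = votes
        predictions.append(y)
    return predictions, num_queries
```

The analyses discussed here bound both the mistakes and the query count of this loop, via quantities such as the disagreement coefficient.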

416 | Neural Network Learning - Theoretical Foundations - Anthony, Bartlett - 1999 |
Citation Context: ...h∗(X), and let Vi = C[Li]. Then ∀t > Tǫ, E[diamk(t)(Vǫ)] ≤ E[diamk(t)(Vk(t))] + ∑s≤Ti: k(s)=k(t) ‖Ds − Pk(s)‖ ≤ E[diamk(t)(Vk(t))] + L(ǫ)ǫ. By classic results in the theory of PAC learning (Anthony and Bartlett, 1999; Vapnik, 1982), ∀t > Tǫ, E[diamk(t)(Vk(t))] ≤ √ǫ. Combining the above arguments, E[∑t=1..T diamt(C[Zt−1])] ≤ Tǫ + ∑t=Tǫ+1..T E[diamt(Vǫ)] ≤ Tǫ + ǫT + L(ǫ)ǫ... |

146 | Smooth discrimination analysis - Mammen, Tsybakov - 1999 |
Citation Context: ... in noisy scenarios where the noise distribution remains fixed over time but the marginal distribution on X may shift. In particular, we upper bound these quantities under Tsybakov’s noise conditions [MT99]. We also prove minimax lower bounds under these same conditions, though there is a gap between our upper and lower bounds. 2 Definition and Notations: As in the usual statistical learning problem, the... |

126 | Local Rademacher complexities and oracle inequalities in risk minimization - Koltchinskii - 2006 |

120 | Coarse sample complexity bounds for active learning - Dasgupta - 2005 |
Citation Context: ...s the tradeoff between the number of label requests and the number of unlabeled examples needed. In the realizable case, that trade-off is tightly characterized by Dasgupta’s splitting index analysis [Das05]. It would be interesting to determine whether the splitting index tightly characterizes the mistakes-vs-queries trade-off in this stream-based setting as well. In the batch setting, in which unlabeled... |

113 | A general agnostic active learning algorithm - Dasgupta, Hsu, et al. - 2007 |
Citation Context: .... We formalize this in the following assumption. Assumption 5: Assumption 4 is satisfied for all D ∈ D, with the same c and α values. 5.2 Agnostic CAL: The following algorithm is essentially taken from (Dasgupta, Hsu, and Monteleoni, 2007; Hanneke, 2011b), adapted here for this stream-based setting. It is based on a subroutine: Learn(L, Q) = argminh∈C: êr(h;L)=0 êr(h;Q) if minh∈C êr(h;L) = 0, and ∅ otherwise. ACAL: 1. t ← 0, Lt ← ∅, Qt ← ∅... |
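The Learn(L, Q) subroutine quoted in this context selects, among classifiers with zero empirical error on L, one minimizing empirical error on Q, and returns the empty symbol when nothing fits L. A small sketch for a finite class (the function and variable names are mine, not the paper's):

```python
def empirical_error(h, sample):
    """Fraction of (x, y) pairs in sample that h misclassifies."""
    if not sample:
        return 0.0
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def learn(hypotheses, L, Q):
    """Learn(L, Q): argmin over {h : er(h; L) = 0} of er(h; Q); None if no h fits L."""
    consistent = [h for h in hypotheses if empirical_error(h, L) == 0.0]
    if not consistent:
        return None  # the paper's "∅, otherwise" branch
    return min(consistent, key=lambda h: empirical_error(h, Q))
```

Here L plays the role of labels the algorithm is confident about, while Q holds queried labels that may be noisy, which is why the two samples are treated asymmetrically.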

96 | A bound on the label complexity of agnostic active learning - Hanneke - 2007 |
Citation Context: ...rk for detailed studies of the one-inclusion graph prediction strategy. 4.1 Learning with a Fixed Distribution: We begin the discussion with the simplest case: namely, when |D| = 1. Definition 1 (Hanneke, 2007, 2011b). Define the disagreement coefficient of h∗ under a distribution P as θP(ǫ) = supr>ǫ P(DIS(BP(h∗, r)))/r. Theorem 1. For any distribution P on X, if D = {P}, then running CAL with A = A1IG a... |
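Definition 1 can be sanity-checked numerically. For 1-D threshold classifiers h_a(x) = 1[x ≥ a] under the uniform distribution on [0, 1] with h∗ = h_0.5, the ball BP(h∗, r) is {h_a : |a − 0.5| ≤ r}, its disagreement region is the interval (0.5 − r, 0.5 + r) of mass roughly 2r, so θP(ǫ) = 2 for ǫ < 1/2. A Monte Carlo sketch of that calculation (all names here are illustrative, not from the paper):

```python
import random

def disagreement_mass(r, n=200_000, seed=0):
    """Estimate P(DIS(BP(h*, r))) = P(|x - 0.5| <= r) for uniform x on [0, 1]."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if abs(rng.random() - 0.5) <= r) / n

def theta_hat(eps, radii):
    """Plug-in estimate of sup over r > eps of P(DIS(BP(h*, r))) / r, on a grid."""
    return max(disagreement_mass(r) / r for r in radii if r > eps)
```

With a grid such as `theta_hat(0.01, [0.02, 0.05, 0.1, 0.2])`, the estimate comes out close to the analytic value 2.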

79 | Analysis of perceptron-based active learning - Dasgupta, Kalai, et al. - 2005 |
Citation Context: ...], online learning [Lit88], learning with concept drift, and empirical processes for independent but not identically distributed data [vdG00]. Stream-based Active Learning with a Fixed Distribution: [DKM09] show that a certain modified perceptron-like active learning algorithm can achieve a mistake bound O(d log(T)) and query bound Õ(d log(T)), when learning a linear separator under a uniform distrib... |

71 | Knows What It Knows: A Framework for Self-Aware Learning - Li, Littman, et al. - 2008 |
Citation Context: ...s alternative framework essentially separates out the mistakes due to over-confidence from the mistakes due to recognized uncertainty. In some sense, this is related to the KWIK model of learning of (Li, Littman, and Walsh, 2008). Analyzing the above procedures in this alternative model yields several interesting details. Specifically, consider the following natural modifications to the above procedures. We refer to the algo... |

63 | The true sample complexity of active learning - Balcan, Hanneke, et al. - 2008 |
Citation Context: ... trade-off in this stream-based setting as well. In the batch setting, in which unlabeled examples are considered free, and performance is only measured as a function of the number of label requests, [BHV10] have found that there is an important distinction between the verifiable label complexity and the unverifiable label complexity. In particular, while the former is sometimes no better than passive le... |

60 | Predicting {0, 1} functions on randomly drawn points - Haussler, Littlestone, et al. - 1994 |
Citation Context: ...djacent in the one-inclusion graph, and we choose the one toward which the edge is directed and use the label for xt+1 in the corresponding labeling of U as our prediction for the label of xt+1. See (Haussler, Littlestone, and Warmuth, 1994) and subsequent work for detailed studies of the one-inclusion graph prediction strategy. 4.1 Learning with a Fixed Distribution: We begin the discussion with the simplest case: namely, when |D| =... |

56 | Domain adaptation with multiple sources - Mansour - 2009 |

45 | Domain adaptation: Learning bounds and algorithms - Mansour, Mohri, et al. - 2009 |

40 | Rates of Convergence in Active Learning - Hanneke - 2009 |
Citation Context: ... δ), and (letting Zǫ = {j ∈ Z : 2^j ≥ ǫ}) Êt(L, Q) = inf{ǫ > 0 : ∀j ∈ Zǫ, minm∈N Ût(ǫ, δ⌊log(t)⌋; L, Q) ≤ 2^(j−4)}. 5.3 Learning with a Fixed Distribution: The following results essentially follow from [Han11], adapted to this stream-based setting. Theorem 5. For any strictly benign (P, η), if 2^(−2i) ≪ δi ≪ 2^(−i)/i, ACAL achieves an expected excess number of mistakes M̄T − M∗T = o(T), and if θP(ǫ) = o(1/ǫ), ... |

20 | Robust selective sampling from single and multiple teachers - Dekel, Gentile, et al. - 2010 |
Citation Context: ...ve learning algorithm can achieve a mistake bound O(d log(T)) and query bound Õ(d log(T)), when learning a linear separator under a uniform distribution on the unit sphere, in the realizable case. [DGS10] also analyze the problem of learning linear separators under a uniform distribution, but allowing Tsybakov noise. They find that with Q̄T = Õ(d^(2α/(α+2)) T^(2/(α+2))) queries, it is possible to achieve a... |

15 | Learning with a slowly changing distribution - Bartlett - 1992 |
Citation Context: ...t T examples in the stream. In particular, we study scenarios in which the distribution may drift within a fixed totally bounded family of distributions. Unlike previous models of distribution drift (Bartlett, 1992; Crammer, Mansour, Even-Dar, and Vaughan, 2010), the minimax number of mistakes (or excess number of mistakes, in the noisy case) can be sublinear in the number of samples. We specifically study the ... |

12 | On the complexity of learning from drifting distributions - Barve, Long - 1997 |

12 | Learning under persistent drift - Freund, Mansour - 1997 |
Citation Context: ...ss into smaller and smaller regions where the algorithm is uncertain of the target’s behavior, so that the number of mistakes grows linearly in the number of samples in the worst case. More recently, [FM97] have investigated learning when the distribution changes as a linear function of time. They present algorithms that estimate the error of functions, using knowledge of this linear drift. 4 Active Lea... |

11 | Activized learning: Transforming passive to active with improved label complexity. J Mach Learn Res. 2012;13:1469–587 - Hanneke |


7 | Regret minimization with concept drift - Crammer, Mansour, et al. - 2010 |
Citation Context: ...the stream. In particular, we study scenarios in which the distribution may drift within a fixed totally bounded family of distributions. Unlike previous models of distribution drift (Bartlett, 1992; Crammer, Mansour, Even-Dar, and Vaughan, 2010), the minimax number of mistakes (or excess number of mistakes, in the noisy case) can be sublinear in the number of samples. We specifically study the classic CAL active learning strategy in this co... |
