
## Practical Online Active Learning for Classification (2007)


### Download Links

- people.csail.mit.edu
- www.csail.mit.edu
- eprints.pascal-network.org
- DBLP

### Other Repositories/Bibliography

Venue: In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Online Learning for Classification Workshop

Citations: 12 (1 self)

### Citations

735 | Support vector machine active learning with applications to text classification.
- Tong, Koller
- 2002
Citation Context: ...wn to work in practice, for example Lewis and Gale’s sequential algorithm for text classification [14], which has batch access to the remaining unlabeled datapoints at each iteration. Tong and Koller [17] introduced several active learning algorithms that use a support vector machine (SVM) as the underlying classifier and work well in practice. At each step, the SVM algorithm is used to update the...
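The query rule described in this excerpt — pick the unlabeled point closest to the current SVM hyperplane — can be sketched as follows. This is a minimal illustration using scikit-learn's `LinearSVC`; the function names and toy data are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

def margin_query(svm: LinearSVC, pool: np.ndarray) -> int:
    """Return the index of the pool point closest to the separating hyperplane."""
    # decision_function gives the signed distance (up to scaling) to the separator
    return int(np.argmin(np.abs(svm.decision_function(pool))))

# toy usage: fit on a small labeled seed set, then pick the most uncertain pool point
rng = np.random.default_rng(0)
X_seed = rng.normal(size=(20, 2))
y_seed = (X_seed[:, 0] > 0).astype(int)
pool = rng.normal(size=(100, 2))

svm = LinearSVC().fit(X_seed, y_seed)
idx = margin_query(svm, pool)  # query the label of pool[idx] next, then refit
```

Note that this pool-based step is exactly what breaks the online constraint discussed later: it needs batch access to all remaining unlabeled data at every iteration.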

631 | A Sequential Algorithm for Training Text Classifiers
- Lewis, Gale
- 1994
Citation Context: ...bels for the entire input space. It has been shown in domains such as OCR and text that the synthetic examples on which the learner has the most uncertainty may be difficult even for a human to label [14]. In the selective sampling model (originally introduced by [7]) the learner receives unlabeled data and may request certain labels to be revealed, at a constant cost per label. We will operate in thi...
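The selective sampling protocol sketched in this excerpt — a stream of unlabeled points whose labels are revealed only on request, at a constant cost each — fits a small generic loop. All names below are illustrative, not from any of the cited papers:

```python
from typing import Callable, Iterable, Tuple

def selective_sampling(stream: Iterable,
                       should_query: Callable,
                       oracle: Callable,
                       update: Callable,
                       h0) -> Tuple[object, int]:
    """Generic selective-sampling loop: pay one unit of cost per requested label."""
    h, cost = h0, 0
    for x in stream:
        if should_query(h, x):
            y = oracle(x)        # constant cost per revealed label
            cost += 1
            h = update(h, x, y)  # learn only from queried examples
    return h, cost
```

The algorithms compared in the paper differ only in their choice of `should_query` and `update`; the label-complexity results cited below bound `cost`.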

628 | Making large-scale support vector machine learning practical
- Joachims
- 1998
Citation Context: ...e experimented on 7 binary classification problems, 5 from MNIST and two from USPS, each consisting of approximately 10,000 examples. All the problems but two were linearly separable. (Using svmLight [12] we were unable to find separating hyperplanes for the problem {1,4,7} vs. all other characters, both in the MNIST and USPS versions). Since the algorithms access the data in one pass, in a sequential...

544 | Improving generalization with active learning.
- Cohn, Atlas, et al.
- 1994

433 | Selective sampling using the query by committee algorithm.
- Freund, Seung, et al.
- 1997

432 | Query by committee. In:
- Seung, Opper, et al.
- 1992
Citation Context: ...hms that can actually be implemented. Under Bayesian assumptions, Freund et al. [10] gave an upper bound on label-complexity for learning linear separators under the uniform distribution, using Query By Committee [16], a computationally complex algorithm that has recently been simplified to yield encouraging empirical results [11]. Cesa-Bianchi, Conconi and Gentile provided regret bounds on an active learning algor...
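The Query By Committee rule of [16] requests a label exactly when hypotheses sampled from the version space disagree on the new point. A minimal disagreement test for linear separators might look like this; the random committee below is only a stand-in for proper version-space sampling, which is what makes the exact algorithm computationally complex:

```python
import numpy as np

def qbc_should_query(committee: np.ndarray, x: np.ndarray) -> bool:
    """Query iff the committee's predicted labels disagree on x.

    committee: (k, d) array, each row a linear separator through the origin.
    """
    votes = np.sign(committee @ x)
    return bool(votes.max() != votes.min())

# illustrative committee: 5 random unit vectors in R^3 (NOT version-space samples)
rng = np.random.default_rng(1)
committee = rng.normal(size=(5, 3))
committee /= np.linalg.norm(committee, axis=1, keepdims=True)

x = rng.normal(size=3)
if qbc_should_query(committee, x):
    pass  # ask the oracle for the label of x, then discard inconsistent members
```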

190 | Agnostic active learning.
- Balcan, Beygelzimer, et al.
- 2009
Citation Context: ...alcan, Beygelzimer and Langford provided a label-complexity upper bound for learning linear separators under the uniform input distribution that relies on an algorithm that is computationally complex [2]. Several formal guarantees have been shown for active learning algorithms that can actually be implemented. Under Bayesian assumptions, Freund et al. [10] gave an upper bound on label-complexity for ...

120 | Coarse sample complexity bounds for active learning,”
- Dasgupta
- 2005
Citation Context: ...esian, realizable setting (i.e. there exists a perfect separator for the data, in the concept class over which the learning is performed) for a scheme that requires exponential storage and computation [8]. In a non-Bayesian, agnostic setting, Balcan, Beygelzimer and Langford provided a label-complexity upper bound for learning linear separators under the uniform input distribution, that relies on an al...

92 | The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
- LeCun, Cortes, et al.
- 1998
Citation Context: ...ince OCR on small devices could stand to benefit from strongly online active learning solutions. Additionally, these datasets are known to be non-uniformly distributed over inputs. We used both MNIST [13] and USPS in order to experiment with multiple datasets and dimensionalities (d = 784 for MNIST, d = 256 for USPS). We experimented on 7 binary classification problems, 5 from MNIST and two from USPS,...

79 | Analysis of perceptron-based active learning.
- Dasgupta, Kalai, et al.
- 2005
Citation Context: ...iting time before halving the active learning threshold. ing that are strongly online, and whose performance has been analyzed formally under various assumptions. Dasgupta, Kalai and Monteleoni (DKM) [9] provided an online active learning algorithm with a label-complexity upper bound for learning linear separators under the uniform input distribution, in a non-Bayesian, realizable setting. We will ex...
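In the spirit of the DKM scheme referenced here — query only on small-margin points, and halve the query threshold after a streak of confidently-correct queried examples — a simplified sketch follows. This is not DKM's exact rule: their update is a modified (reflection-style) perceptron step and the constants differ; the defaults `s0`, `wait`, and the plain normalized perceptron update below are all illustrative:

```python
import numpy as np

def dkm_style_stream(stream, oracle, d, s0=None, wait=8):
    """Strongly online active learning sketch in the spirit of DKM [9].

    Query a label only when the current hypothesis w has small margin on x;
    halve the query threshold s after `wait` queried examples in a row were
    classified correctly. Memory is O(d): only w, s, and a counter are kept.
    """
    w = np.zeros(d)
    w[0] = 1.0                       # arbitrary unit-length starting hypothesis
    s = s0 if s0 is not None else 1.0 / np.sqrt(d)
    streak = 0
    for x in stream:
        margin = w @ x
        if abs(margin) > s:
            continue                 # confident: predict sign(margin), no query
        y = oracle(x)                # small margin: pay for a label
        if y * margin <= 0:          # mistake: perceptron-style update
            w = w + y * x
            w /= np.linalg.norm(w)
            streak = 0
        else:
            streak += 1
            if streak == wait:       # long correct streak: shrink the threshold
                s /= 2.0
                streak = 0
    return w
```

The "waiting time before halving" mentioned in the excerpt corresponds to the `wait` parameter here.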

65 | Queries revisited.
- Angluin
- 2004
Citation Context: ...beled examples required to learn a concept via active learning can be significantly lower than the PAC sample complexity. While the query learning model has been well studied theoretically (see e.g. [1]), it is often unrealistic in practice, as it requires access to labels for the entire input space. It has been shown in domains such as OCR and text that the synthetic examples on which the learner h...

56 | Margin based active learning.
- Balcan, Broder, et al.
- 2007
Citation Context: ...ext query is selected from the pool of remaining unlabeled data by optimizing a margin-based heuristic that varies between different versions of the algorithm. Although, as very recent work has shown [3], it is possible to define and analyze variants of these heuristics that operate in the sequential setting, the use of the SVM sub-algorithm breaks the online constraints on time and memory. Thus, nei...

52 | Worst-case analysis of selective sampling for linear classification.
- Cesa-Bianchi, Gentile, et al.
- 2006
Citation Context: ...ly appropriate for strongly online active learning. We compare DKM’s performance to another state-of-the-art strongly online active learning algorithm due to Cesa-Bianchi, Gentile and Zaniboni (CBGZ) [6], which has regret bounds in the individual sequence prediction context. 3. Algorithms: The algorithms we consider are both for learning linear separators through the origin, in the online active learn...
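The CBGZ algorithm compared here queries stochastically: the label of x is requested with probability b/(b + |p|), where p is the current predictor's margin on x and b > 0 governs the label budget. A sketch of one round, with illustrative parameter names:

```python
import numpy as np

def cbgz_style_step(w, x, oracle, b, rng):
    """One round of CBGZ-style selective sampling [6] (sketch).

    Predict sign(w . x); flip a coin with bias b/(b + |w . x|) to decide
    whether to request the label, and do a perceptron update on queried
    mistakes. Larger b means more queries (and a more accurate predictor).
    """
    p = w @ x                              # margin of the current predictor
    if rng.random() <= b / (b + abs(p)):   # query with probability b/(b+|p|)
        y = oracle(x)
        if y * p <= 0:                     # mistake on a queried example
            w = w + y * x
    return w
```

Small-margin (uncertain) points are thus queried with probability near 1, while confidently classified points are rarely queried, which keeps the scheme strongly online: only w is stored between rounds.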

32 | Learning probabilistic linearthreshold classifiers via selective sampling
- Cesa-Bianchi, Conconi, et al.
- 2003
Citation Context: ...m that has recently been simplified to yield encouraging empirical results [11]. Cesa-Bianchi, Conconi and Gentile provided regret bounds on an active learning algorithm for learning linear thresholds [5] from a stream of iid examples corrupted by random class noise whose rate scales with the examples’ margins. Both algorithms store all previously labeled points, and thus are not online. We focus on t...

29 |
The perceptron algorithm is fast for nonmalicious distributions. Neural Comput
- Baum
- 1990
Citation Context: ...The error rate decreases exponentially with the number of mistakes (the mistake bound is logarithmic in 1/ε), whereas the mistake bound for the Perceptron update in this setting is polynomial in 1/ε [4]. They also provide a polynomial lower bound on mistakes (and therefore labels for the active setting) on Perceptron with any active learning rule, under the uniform input distribution. The active lea...

29 | Query by committee made real.
- Gilad-Bachrach, Navot, et al.
- 2005
Citation Context: ...omplexity for learning linear separators under the uniform distribution, using Query By Committee [16], a computationally complex algorithm that has recently been simplified to yield encouraging empirical results [11]. Cesa-Bianchi, Conconi and Gentile provided regret bounds on an active learning algorithm for learning linear thresholds [5] from a stream of iid examples corrupted by random class noise whose rate sc...

6 | Learning with online constraints: shifting concepts and active learning
- Monteleoni
- 2006
Citation Context: ...input distributions that are λ-similar to uniform, i.e. for any subset of the input space, the ratio between the distribution’s measure and the uniform measure is upper and lower bounded by constants [15]. 3.2. Application to the non-uniform setting: In applying the algorithm to the non-uniform setting we changed the initial setting of the active learning threshold. Dasgupta et al. used s1 = 1/√d in the...