## Toward Optimal Active Learning through Sampling Estimation of Error Reduction (2001)

Venue: Proc. 18th International Conference on Machine Learning

Citations: 253 (2 self)

### BibTeX

@INPROCEEDINGS{Roy01towardoptimal,
  author = {Nicholas Roy and Andrew McCallum},
  title = {Toward Optimal Active Learning through Sampling Estimation of Error Reduction},
  booktitle = {Proc. 18th International Conference on Machine Learning},
  year = {2001},
  pages = {441--448},
  publisher = {Morgan Kaufmann}
}

### Abstract

This paper presents an active learning method that directly optimizes expected future error, in contrast to many other popular techniques that instead aim to reduce version-space size. Those other methods are popular because, for many learning models, closed-form calculation of the expected future error is intractable. Our approach is made feasible by taking a sampling approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on two real-world data sets we reach high accuracy very quickly, sometimes with four times fewer labeled examples than competing methods.
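The selection criterion the abstract describes — score each candidate query by the expected error over the unlabeled pool after hypothetically labeling it — can be sketched as below. This is a minimal illustration, not the authors' implementation: the toy one-dimensional Gaussian classifier, the `train`/`predict_proba` names, and the self-estimated 0/1 loss are all assumptions standing in for the paper's naive Bayes text-classification setup.

```python
import math

def train(labeled):
    # Toy classifier: per-class mean with unit variance plus class priors.
    # An illustrative stand-in for the paper's naive Bayes text model.
    classes = sorted({y for _, y in labeled})
    model = {}
    for c in classes:
        xs = [x for x, y in labeled if y == c]
        model[c] = (sum(xs) / len(xs), len(xs) / len(labeled))
    return model

def predict_proba(model, x):
    # Gaussian likelihood times class prior, normalized over classes.
    scores = {c: prior * math.exp(-0.5 * (x - mu) ** 2)
              for c, (mu, prior) in model.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def expected_error(model, pool):
    # Self-estimated expected 0/1 loss on the unlabeled pool: the model's
    # own posteriors stand in for the unknown true labels.
    return sum(1.0 - max(predict_proba(model, x).values()) for x in pool) / len(pool)

def select_query(labeled, pool):
    # Score each candidate x by the expected pool error after hypothetically
    # labeling it: for each possible label y, retrain on labeled + [(x, y)]
    # and weight the resulting error by the current posterior P(y | x).
    current = train(labeled)
    best_x, best_score = None, float("inf")
    for x in pool:
        posterior = predict_proba(current, x)
        score = sum(p * expected_error(train(labeled + [(x, y)]), pool)
                    for y, p in posterior.items())
        if score < best_score:
            best_x, best_score = x, score
    return best_x
```

Note the cost structure this makes explicit: one retraining per (candidate, label) pair, which is why the paper pairs the idea with classifiers that support fast incremental updates.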

### Citations

2509 | Bagging predictors
- Breiman
- 1996
Citation Context: ...ning class tends to be very close to 1, and the losing classes have probabilities close to 0. We address this problem with a sampling-based approach to variance reduction, otherwise known as bagging (Breiman, 1996). From our original labeled training set of size s, a different training set is created by sampling s times with replacement from the original. The learner then creates a new classifier from this sam...
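The resampling step the excerpt describes — draw s examples with replacement from a labeled set of size s, retrain, and average the resulting posteriors — can be sketched as follows. The `train` and `predict_proba` callables and the `n_bags` parameter are hypothetical placeholders for whatever underlying classifier is being bagged.

```python
import random

def bootstrap_sample(labeled, rng):
    # Draw s examples with replacement from the original labeled set of size s.
    s = len(labeled)
    return [labeled[rng.randrange(s)] for _ in range(s)]

def bagged_posterior(labeled, train, predict_proba, x, n_bags=10, seed=0):
    # Average the class posteriors of classifiers trained on bootstrap
    # replicates; this smooths the overconfident single-model posteriors
    # the excerpt describes.
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_bags):
        model = train(bootstrap_sample(labeled, rng))
        for c, p in predict_proba(model, x).items():
            totals[c] = totals.get(c, 0.0) + p
    return {c: t / n_bags for c, t in totals.items()}
```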

1706 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context: ...ne such classification method that performs surprisingly well given its simplicity is naive Bayes. Naive Bayes is not always the best-performing classification algorithm for text (Nigam et al., 1999; Joachims, 1998), but it continues to be widely used for the purpose because it is efficient and simple to implement, and even against significantly more complex methods it rarely trails far behind in accuracy. Thi...

762 | A comparison of event models for naive bayes text classification - McCallum, Nigam - 1998 |

552 | Generalization as Search
- Mitchell
- 1982
Citation Context: ..., 1994) selects the example on which the current learner has lowest certainty; Query-by-Committee (Seung et al., 1992; Freund et al., 1997) selects examples that reduce the size of the version space (Mitchell, 1982) (the size of the subset of parameter space that correctly classifies the labeled examples). Tong and Koller's Support Vector Machine method (2000a) is also based on reducing version space size. None...

548 | Distributional clustering of english words - Pereira, Tishby, et al. - 1993 |

530 | Active learning with statistical models - Cohn, Ghahramani, et al. - 1995 |

509 | Support vector machine active learning with applications to text classification - Tong, Koller - 2000 |

475 | A sequential algorithm for training text classifiers - Lewis, Gale - 1994 |

414 | Divergence measures based on the shannon entropy
- Lin
- 1991
Citation Context: ...on-Engelson and Dagan (1999) suggest using a probabilistic measure based on vote entropy of the committee, whereas McCallum & Nigam explicitly measure disagreement using the Jensen-Shannon divergence (Lin, 1991; Pereira et al., 1993). However, they recognize that this error metric does not measure the impact that a labeled document had on classifier uncertainty on other unlabeled documents. They therefore f...
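The Jensen-Shannon divergence mentioned in the excerpt measures disagreement between class posteriors; McCallum & Nigam apply it across a committee, but the two-distribution case conveys the idea. A minimal version over discrete distributions represented as dicts (an illustrative sketch, not their code):

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence between discrete distributions given as dicts.
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0.0)

def jensen_shannon(p, q):
    # JS divergence: the average KL divergence of p and q to their midpoint M.
    # Unlike KL it is symmetric, always finite, and bounded above by ln 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Because every term of p and q contributes to the midpoint M, the divergence stays finite even when one distribution assigns zero probability where the other does not — the property that makes it convenient for comparing committee members' posteriors.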

336 | Selective sampling using the query by committee algorithm
- Freund, Seung, et al.
- 1997
Citation Context: ...different, non-optimal criterion. For example, uncertainty sampling (Lewis & Gale, 1994) selects the example on which the current learner has lowest certainty; Query-by-Committee (Seung et al., 1992; Freund et al., 1997) selects examples that reduce the size of the version space (Mitchell, 1982) (the size of the subset of parameter space that correctly classifies the labeled examples). Tong and Koller's Support Vect...

318 | Query by committee - Seung, Opper, et al. - 1992 |

262 | Using maximum entropy for text classification - Nigam, Lafferty, et al. - 1999 |

258 | Employing EM and poolbased active learning for text classification - McCallum, Nigam - 1998 |

171 | Incremental and decremental support vector machine learning - Cauwenberghs, Poggio - 2001 |

99 | Boosting in the limit: Maximizing the margin of learned ensembles - Grove, Schuurmans - 1998 |

92 | Query learning strategies using boosting and bagging - Abe, Mamitsuka - 1998 |

81 | Active learning with committees for text categorization - Liere, Tadepalli - 1997 |

61 | Selective sampling for nearest neighbor classifiers - Lindenbaum, Markovitch, et al. |

51 | Committee-based sample selection for probabilistic classifiers - Argamon-Engelson, Dagan - 1999 |

41 | Bayesian averaging of classifiers and the overfitting problem
- Domingos
- 2000
Citation Context: ... from any individual classifier are completely extreme, the bagged posterior is smoother and more reflective of the true uncertainty. This approach has been shown not necessarily to reduce overfitting (Domingos, 2000), but it certainly does give better posterior probabilities. One interesting aspect of this approach is that it can be applied to any classifier---even ones that don't give class posterior probabilit...