Employing EM in Pool-Based Active Learning for Text Classification (1998)

by Andrew McCallum, Kamal Nigam
Citations: 320 (10 self-citations)

BibTeX

@MISC{Mccallum98employingem,
    author = {Andrew McCallum and Kamal Nigam},
    title = {Employing EM in Pool-Based Active Learning for Text Classification},
    year = {1998}
}

Abstract

This paper shows how a text classifier's need for labeled training data can be reduced by a combination of active learning and Expectation-Maximization (EM) on a pool of unlabeled data. Query-by-Committee is used to actively select documents for labeling, and then EM with a naive Bayes model further improves classification accuracy by concurrently estimating probabilistic labels for the remaining unlabeled documents and using them to improve the model. We also present a metric for better measuring disagreement among committee members; it accounts for the strength of their disagreement and for the distribution of the documents. Experimental results show that our method of combining EM and active learning requires only half as many labeled training examples to achieve the same accuracy as either EM or active learning alone.

Keywords: text classification, active learning, unsupervised learning, information retrieval

1 Introduction

In many settings for learning text classifiers, obtaining lab...
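The EM step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are word-count vectors over a fixed vocabulary, the model is a multinomial naive Bayes classifier with add-one (Laplace) smoothing, labeled documents keep their hard labels, and unlabeled documents receive probabilistic labels in each E-step. The function names (`train_nb`, `posterior`, `em_naive_bayes`) and the fixed iteration count are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Params = Tuple[List[float], List[List[float]]]


def train_nb(docs: List[Vector], resp: List[Vector], n_classes: int,
             vocab_size: int, alpha: float = 1.0) -> Params:
    """M-step: estimate log class priors and log word probabilities from
    (possibly soft) class memberships, with add-alpha smoothing."""
    prior = [alpha] * n_classes
    word = [[alpha] * vocab_size for _ in range(n_classes)]
    for counts, r in zip(docs, resp):
        for c in range(n_classes):
            prior[c] += r[c]
            for w, n in enumerate(counts):
                if n:
                    word[c][w] += r[c] * n
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = []
    for row in word:
        row_total = sum(row)
        log_word.append([math.log(x / row_total) for x in row])
    return log_prior, log_word


def posterior(counts: Vector, log_prior: List[float],
              log_word: List[List[float]]) -> Vector:
    """E-step for one document: P(class | document) under multinomial naive Bayes."""
    scores = [lp + sum(n * lw[w] for w, n in enumerate(counts) if n)
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_naive_bayes(labeled: List[Vector], labels: List[int],
                   unlabeled: List[Vector], n_classes: int,
                   n_iter: int = 10) -> Params:
    vocab_size = len(labeled[0])
    # Labeled documents keep fixed one-hot (hard) labels throughout.
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Initialize the model from the labeled documents alone.
    params = train_nb(labeled, hard, n_classes, vocab_size)
    for _ in range(n_iter):
        # E-step: probabilistic labels for the unlabeled documents.
        soft = [posterior(d, *params) for d in unlabeled]
        # M-step: re-estimate from labeled plus softly labeled documents.
        params = train_nb(labeled + unlabeled, hard + soft, n_classes, vocab_size)
    return params


# Toy usage: a 4-word vocabulary and two classes.
labeled = [[3, 1, 0, 0], [0, 0, 2, 3]]
labels = [0, 1]
unlabeled = [[2, 2, 0, 0], [0, 1, 3, 2], [4, 0, 0, 1]]
params = em_naive_bayes(labeled, labels, unlabeled, n_classes=2)
print(posterior([5, 0, 0, 0], *params))
```

Here the two labeled documents seed the classes, and EM uses the three unlabeled documents to refine the word estimates. In the setting the abstract describes, EM would run over the documents left in the pool after Query-by-Committee has chosen which ones to label.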

Keyphrases

pool-based active learning, text classification, active learning, text classifier, unlabeled document, unlabeled data, many setting, classification accuracy, measuring disagreement, labeled training data, committee member, expectation maximization, information retrieval, text classification active learning, probabilistic label, select document, experimental result, training example, naive bayes model
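The committee-disagreement metric mentioned in the abstract, which accounts both for how strongly members disagree and for how documents are distributed, can be illustrated with a small sketch. Two assumptions should be flagged: disagreement is measured here as the average KL divergence of each member's class distribution from the committee mean ("KL divergence to the mean"), and the document-distribution term is a hypothetical density estimate, namely a document's average cosine similarity to the rest of the pool. The paper's exact formulas may differ; `kl_to_mean`, `density`, and `select_query` are illustrative names, and the committee's predictions are taken as given rather than generated.

```python
import math
from typing import List

Dist = List[float]


def kl(p: Dist, q: Dist) -> float:
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_to_mean(member_dists: List[Dist]) -> float:
    """Average KL divergence of each committee member's class distribution
    from the committee's mean distribution (0 when all members agree)."""
    k = len(member_dists)
    n_classes = len(member_dists[0])
    mean = [sum(d[c] for d in member_dists) / k for c in range(n_classes)]
    return sum(kl(d, mean) for d in member_dists) / k


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density(i: int, pool: List[List[float]]) -> float:
    """Hypothetical density estimate: mean cosine similarity of document i
    to every other document in the pool."""
    sims = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
    return sum(sims) / len(sims) if sims else 1.0


def select_query(pool: List[List[float]], committee: List[List[Dist]]) -> int:
    """Index of the pool document with the highest density-weighted disagreement.
    committee[i] holds one class distribution per member for document i."""
    scores = [density(i, pool) * kl_to_mean(committee[i]) for i in range(len(pool))]
    return max(range(len(pool)), key=lambda i: scores[i])


# Toy pool of word-count vectors and committee predictions (two members).
pool = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
committee = [
    [[0.9, 0.1], [0.1, 0.9]],  # members disagree
    [[0.9, 0.1], [0.9, 0.1]],  # members agree
    [[1.0, 0.0], [0.0, 1.0]],  # members disagree completely, but an outlier
]
print(select_query(pool, committee))
```

In this toy pool, document 2 has the strongest disagreement (two fully opposed members) but shares no words with the other documents, so its density is zero and document 0 is selected instead. Weighting by density in this way steers queries away from outliers whose labels say little about the rest of the pool.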
