Combining labeled and unlabeled data with co-training (1998)

by Avrim Blum, Tom Mitchell
Citations: 946 (27 self)
Version History

Metadata Version 2

User correction supplied by mph

Datum | Value | Source
TITLE | Combining labeled and unlabeled data with co-training | INFERENCE
AUTHOR NAME | Avrim Blum | user correction
AUTHOR AFFIL | School of Computer Science; Carnegie Mellon University | user correction
AUTHOR ADDR | Pittsburgh, PA 15213-3891 | user correction
AUTHOR NAME | Tom Mitchell | user correction
AUTHOR AFFIL | School of Computer Science; Carnegie Mellon University | user correction
AUTHOR ADDR | Pittsburgh, PA 15213-3891 | user correction
ABSTRACT | We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice. As part of our analysis, we provide new re- | user correction
YEAR | 1998 | INFERENCE
VENUE TYPE | CONFERENCE | INFERENCE
PAGES | 92--100 | INFERENCE
CITATIONS | 17 found | ParsCit 1.0
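
The strategy described in the abstract (two learners, one per view, each labeling unlabeled examples that then enlarge the training set of the other) can be illustrated with a minimal sketch. This is not the authors' exact procedure: the naive Bayes learner, the names `NaiveBayes`, `train_pair`, and `co_train`, the rule of adding one example per class per round, and the toy "course"/"other" pages are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over word lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for words, y in zip(docs, labels):
            self.counts[y].update(words)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, words):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / total)
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def train_pair(labeled):
    """Train one classifier per view on the same labeled pool."""
    ys = [y for _, y in labeled]
    h1 = NaiveBayes().fit([x[0] for x, _ in labeled], ys)
    h2 = NaiveBayes().fit([x[1] for x, _ in labeled], ys)
    return h1, h2


def co_train(labeled, unlabeled, rounds=5):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        h1, h2 = train_pair(labeled)
        for view, h in ((0, h1), (1, h2)):
            # Assumed rule: each classifier labels its most confident
            # example for every class, using only its own view.
            for label in h.classes:
                if not pool:
                    break
                best = max(pool, key=lambda x: h.predict_proba(x[view])[label])
                labeled.append((best, label))  # both views join the shared pool
                pool.remove(best)
    return train_pair(labeled)


# Hypothetical toy data: view 1 = words on the page, view 2 = words in
# hyperlinks pointing to the page.
labeled = [
    ((["homework", "lecture", "exam"], ["cs101", "course"]), "course"),
    ((["research", "publications", "hobbies"], ["my", "advisor"]), "other"),
]
unlabeled = [
    (["lecture", "syllabus", "exam"], ["course", "page"]),
    (["publications", "cv", "hobbies"], ["advisor", "homepage"]),
    (["syllabus", "grading"], ["course", "cs101"]),
    (["cv", "students"], ["my", "homepage"]),
]

h1, h2 = co_train(labeled, unlabeled)
baseline = NaiveBayes().fit([x[0] for x, _ in labeled], [y for _, y in labeled])
query = ["syllabus", "grading"]
print("page-word classifier, labeled data only:", baseline.predict_proba(query))
print("page-word classifier, after co-training:", h1.predict_proba(query))
```

In this toy run the page-word classifier trained on the two labeled pages alone has seen neither query word, while after co-training it has, because the hyperlink-word classifier labeled a page containing them. A sketch like this picks labels greedily and never revisits them, so a wrong early label would propagate.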