Character-level Analysis of Semi-Structured Documents for Set Expansion

by Richard C. Wang, William W. Cohen
Citations: 9 (5 self-citations)

BibTeX

@MISC{Wang_character-levelanalysis,
    author = {Richard C. Wang and William W. Cohen},
    title = {Character-level Analysis of Semi-Structured Documents for Set Expansion},
    year = {}
}

Abstract

Set expansion refers to expanding a partial set of “seed” objects into a more complete set. One system that performs set expansion is SEAL (Set Expander for Any Language), which automatically expands entities using resources from the Web in a language-independent fashion. In this paper, we describe in detail how SEAL constructs the character-level wrappers it uses for set expansion. We also evaluate several kinds of wrappers for set expansion and show that character-based wrappers perform better than HTML-based wrappers. In addition, we demonstrate a technique that extends SEAL to learn binary relational concepts (e.g., “x is the mayor of the city y”) from only two seeds. Finally, we show that the extended SEAL performs well on our evaluation datasets, which include English and Chinese, thus demonstrating its language independence.
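The abstract describes character-level wrappers only at a high level. As a rough illustration, here is a minimal sketch, assuming a unary wrapper is a pair of literal character strings (a left and a right context) and that a binary wrapper adds a middle context between the two slots. The function names (`learn_wrapper`, `extract`, `learn_pair_wrapper`, `extract_pairs`) and the toy pages are hypothetical; this is not SEAL's implementation, only one simple way to realize the idea. Contexts are learned as the longest common suffix of the text before each seed and the longest common prefix of the text after it.

```python
import os
from typing import List, Tuple


def common_prefix(texts: List[str]) -> str:
    """Longest string that every text starts with."""
    return os.path.commonprefix(texts)


def common_suffix(texts: List[str]) -> str:
    """Longest string that every text ends with (common prefix of the reversals)."""
    return os.path.commonprefix([t[::-1] for t in texts])[::-1]


def learn_wrapper(doc: str, seeds: List[str]) -> Tuple[str, str]:
    """Hypothetical unary wrapper learner: (left, right) character contexts.

    Simplification (assumption): only the first occurrence of each seed is used.
    """
    lefts, rights = [], []
    for seed in seeds:
        i = doc.find(seed)
        if i < 0:
            raise ValueError(f"seed not found: {seed!r}")
        lefts.append(doc[:i])
        rights.append(doc[i + len(seed):])
    left, right = common_suffix(lefts), common_prefix(rights)
    if not left or not right:
        raise ValueError("seeds share no non-empty context")
    return left, right


def extract(doc: str, left: str, right: str) -> List[str]:
    """Return every non-empty string found between `left` and `right`."""
    found, start = [], 0
    while (i := doc.find(left, start)) >= 0:
        j = i + len(left)
        k = doc.find(right, j)
        if k < 0:
            break
        if k > j:
            found.append(doc[j:k])
        start = k  # the next left context may overlap this right context
    return found


def learn_pair_wrapper(doc: str, pairs: List[Tuple[str, str]]) -> Tuple[str, str, str]:
    """Hypothetical binary wrapper learner: (left, middle, right) contexts.

    Simplification (assumption): every seed pair must have the identical middle text.
    """
    lefts, middles, rights = [], set(), []
    for x, y in pairs:
        i = doc.find(x)
        m = doc.find(y, i + len(x)) if i >= 0 else -1
        if m < 0:
            raise ValueError(f"seed pair not found: {(x, y)!r}")
        lefts.append(doc[:i])
        middles.add(doc[i + len(x):m])
        rights.append(doc[m + len(y):])
    if len(middles) != 1:
        raise ValueError("seed pairs do not share one middle context")
    left, middle, right = common_suffix(lefts), middles.pop(), common_prefix(rights)
    if not (left and middle and right):
        raise ValueError("seed pairs share no non-empty context")
    return left, middle, right


def extract_pairs(doc: str, left: str, middle: str, right: str) -> List[Tuple[str, str]]:
    """Return every (x, y) laid out as left + x + middle + y + right."""
    found, start = [], 0
    while (i := doc.find(left, start)) >= 0:
        j = i + len(left)
        m = doc.find(middle, j)
        if m < 0:
            break
        n = m + len(middle)
        k = doc.find(right, n)
        if k < 0:
            break
        if m > j and k > n:
            found.append((doc[j:m], doc[n:k]))
        start = k
    return found


page = "<ul>\n<li>Ford</li>\n<li>Toyota</li>\n<li>Honda</li>\n<li>Nissan</li>\n</ul>"
print(extract(page, *learn_wrapper(page, ["Ford", "Nissan"])))
# ['Ford', 'Toyota', 'Honda', 'Nissan']

table = ("<tr><td>France</td><td>Paris</td></tr>\n"
         "<tr><td>Japan</td><td>Tokyo</td></tr>\n"
         "<tr><td>Italy</td><td>Rome</td></tr>\n")
print(extract_pairs(table, *learn_pair_wrapper(table, [("France", "Paris"), ("Italy", "Rome")])))
# [('France', 'Paris'), ('Japan', 'Tokyo'), ('Italy', 'Rome')]
```

Because the contexts are raw character strings rather than parsed HTML tags, the same code applies to any markup and any language. The sketch deliberately omits things a practical system would need: considering every occurrence of each seed, tolerating differing middle contexts, and ranking the extracted candidates.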
