Unsupervised named-entity extraction from the Web: An experimental study (2005)
by Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates
Venue: Artificial Intelligence
Citations: 372 (39 self-citations)

BibTeX

@ARTICLE{Etzioni05unsupervisednamedentity,
    author = {Oren Etzioni and Michael Cafarella and Doug Downey and Ana-Maria Popescu and Tal Shaked and Stephen Soderland and Daniel S. Weld and Alexander Yates},
    title = {Unsupervised named-entity extraction from the {Web}: An experimental study},
    journal = {Artificial Intelligence},
    year = {2005}
}

Abstract

The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL's novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 class instances, but it also raised a challenge: how can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies subclasses in order to boost recall (e.g., "chemist" and "biologist" are identified as subclasses of "scientist"). List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts the elements of each list. Because each method bootstraps from KNOWITALL's domain-independent methods, none of them requires hand-labeled training examples either. The paper reports on experiments, focused on building lists of named entities, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall at a precision of 0.90, and discovered over 10,000 cities missing from the Tipster Gazetteer.
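The List Extraction idea (find a list that contains known instances, learn a "wrapper" for it, then read off the other elements) can be illustrated with a minimal Python sketch. This is a deliberately simplified stand-in, not KNOWITALL's actual list extractor. It assumes that each list item is plain text enclosed in a single HTML tag (such as `<li>`), and the names `learn_wrapper`, `apply_wrapper`, `list_extract`, and `sample_page` are hypothetical.

```python
import re
from typing import Iterable, Optional


def learn_wrapper(page: str, seeds: Iterable[str]) -> Optional[str]:
    """Learn a (very simple) wrapper: the tag name that directly encloses every seed.

    Simplifying assumption: a wrapper is just a tag name; only the first
    tag-enclosed occurrence of each seed is considered.
    """
    tags = set()
    for seed in seeds:
        # Look for <tag ...> seed </tag>, allowing surrounding whitespace.
        m = re.search(r"<(\w+)[^>]*>\s*" + re.escape(seed) + r"\s*</\1>", page)
        if m is None:
            return None  # a seed is missing or not tag-enclosed: reject this page
        tags.add(m.group(1).lower())
    # Accept the wrapper only if all seeds agree on the same enclosing tag.
    return tags.pop() if len(tags) == 1 else None


def apply_wrapper(page: str, tag: str) -> list[str]:
    """Return the text of every element on the page that uses the wrapper's tag."""
    t = re.escape(tag)
    # \b prevents <li> from also matching tags like <link>.
    pattern = rf"<{t}\b[^>]*>\s*([^<]+?)\s*</{t}>"
    return [m.group(1) for m in re.finditer(pattern, page, flags=re.IGNORECASE)]


def list_extract(page: str, seeds: list[str]) -> list[str]:
    """Learn a wrapper from the seeds and return only the new (non-seed) candidates."""
    tag = learn_wrapper(page, seeds)
    if tag is None:
        return []
    known = set(seeds)
    return [item for item in apply_wrapper(page, tag) if item not in known]


# Hypothetical page containing a list of cities plus unrelated text.
sample_page = """
<h2>Cities</h2>
<ul>
  <li>Seattle</li>
  <li>Boston</li>
  <li>Denver</li>
  <li class="last">Tucson</li>
</ul>
<p>Contact our Seattle office.</p>
"""

print(list_extract(sample_page, ["Seattle", "Boston"]))  # ['Denver', 'Tucson']
```

Requiring every seed to sit in the same enclosing tag is a crude guard against learning a wrapper from an unrelated part of the page. A practical extractor would need richer wrappers (for example, surrounding markup and page position) and would still have to assess each new candidate before accepting it, since the abstract's goal is higher recall without sacrificing precision.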