Learning field compatibilities to extract database records from unstructured text (2006)

by Michael Wick, Aron Culotta, Andrew McCallum
Venue: EMNLP
Citations: 14 (2 self-citations)

BibTeX

@INPROCEEDINGS{Wick06learningfield,
    author = {Michael Wick and Aron Culotta and Andrew McCallum},
    title = {Learning field compatibilities to extract database records from unstructured text},
    booktitle = {EMNLP},
    year = {2006},
    pages = {603--611}
}

Abstract

Named-entity recognition systems extract entities such as people, organizations, and locations from unstructured text. Rather than extract these mentions in isolation, this paper presents a record extraction system that assembles mentions into records (i.e., database tuples). We construct a probabilistic model of the compatibility between field values, then employ graph partitioning algorithms to cluster fields into cohesive records. We also investigate compatibility functions over sets of fields, rather than simply pairs of fields, to examine how greater representational power affects performance. We apply our techniques to the task of extracting contact records from faculty and student homepages, demonstrating a 53% error reduction over baseline approaches.
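The abstract describes two stages: scoring how compatible two field values are with a probabilistic model, then partitioning the fields into records. The sketch below illustrates that pipeline under several assumptions that do not come from the paper: a logistic pairwise model with hypothetical hand-set weights (the paper learns its model), two hypothetical features (`same_label`, `email_matches_name`), and a simple greedy agglomerative merge standing in for whatever graph partitioning algorithm the authors actually use.

```python
import math
from itertools import combinations

# A field is a (label, value) pair, e.g. ("Email", "jane.doe@example.edu"),
# assumed to come from an upstream named-entity recognizer.
Field = tuple[str, str]

# Hypothetical hand-set weights for illustration only; a real system
# would learn these from labeled records.
WEIGHTS = {
    "same_label": -4.0,         # one record rarely holds two FirstName fields
    "email_matches_name": 3.0,  # "jane" occurs in "jane.doe@example.edu"
    "bias": 0.5,
}


def email_matches_name(a: Field, b: Field) -> float:
    """1.0 if one field is an Email whose local part contains the other field's name."""
    for (label1, value1), (label2, value2) in ((a, b), (b, a)):
        if label1 == "Email" and label2 in ("FirstName", "LastName"):
            local_part = value1.split("@")[0].lower()
            if value2.lower() in local_part:
                return 1.0
    return 0.0


def features(a: Field, b: Field) -> dict[str, float]:
    """Hypothetical pairwise feature vector for two fields."""
    return {
        "same_label": 1.0 if a[0] == b[0] else 0.0,
        "email_matches_name": email_matches_name(a, b),
        "bias": 1.0,
    }


def compatibility(a: Field, b: Field) -> float:
    """Logistic model: probability that fields a and b belong to the same record."""
    z = sum(WEIGHTS[name] * value for name, value in features(a, b).items())
    return 1.0 / (1.0 + math.exp(-z))


def merge_score(c1: list[Field], c2: list[Field]) -> float:
    """Mean log-odds of compatibility over all cross-cluster field pairs."""
    log_odds = []
    for a in c1:
        for b in c2:
            p = compatibility(a, b)
            log_odds.append(math.log(p / (1.0 - p)))
    return sum(log_odds) / len(log_odds)


def greedy_partition(fields: list[Field]) -> list[list[Field]]:
    """Greedy agglomerative partitioning: repeatedly merge the best-scoring
    pair of clusters while its mean log-odds of compatibility is positive."""
    clusters = [[f] for f in fields]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = merge_score(clusters[i], clusters[j])
            if score > 0 and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected


if __name__ == "__main__":
    example_fields = [
        ("FirstName", "Jane"),
        ("LastName", "Doe"),
        ("Email", "jane.doe@example.edu"),
        ("FirstName", "Raj"),
        ("Email", "raj@example.edu"),
    ]
    for record in greedy_partition(example_fields):
        print(record)
```

With these toy weights, the five example fields end up in two records: Jane Doe with her email address, and Raj with his. Because `merge_score` only averages pairwise scores, it cannot express properties of a whole record; replacing it with a function over the merged set (for instance, one that counts how many distinct field types the record would cover) is one way to experiment with the set-level compatibility functions the abstract mentions.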

Keyphrases

unstructured text, field compatibility, database record, baseline approach, database tuples, contact record, probabilistic model, student homepage, compatibility function, field value, representational power, cohesive record, error reduction, named-entity recognition system extract entity, record extraction system
