Refining Information Extraction Rules using Data Provenance

by Bin Liu

BibTeX

@MISC{Liu_refininginformation,
    author = {Bin Liu},
    title = {Refining Information Extraction Rules using Data Provenance},
    year = {}
}

Abstract

Developing high-quality information extraction (IE) rules, or extractors, is an iterative, primarily manual process that is extremely time-consuming and error-prone. In each iteration, the outputs of the extractor are examined, and the erroneous ones drive the refinement of the extractor in the next iteration. Data provenance explains the origin of output data and how it has been transformed by a query. As such, one can expect data provenance to be valuable in understanding and debugging complex IE rules. In this paper we discuss how data provenance can be used beyond understanding and debugging, to automatically refine IE rules. In particular, we give an overview of the main ideas behind a recent provenance-based solution that suggests a ranked list of refinements to an extractor aimed at increasing its precision, and we outline several related directions for future research.
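The following is a minimal sketch of how provenance can rank refinements. It is not the paper's algorithm. It makes three assumptions:

- Each extractor output carries its provenance as a set of rule components (for example, dictionary entries) that were all needed to produce it.
- The developer has labeled each output correct or incorrect.
- The only refinement considered is "remove component c".

Under these assumptions, the effect of a candidate refinement can be read directly off the provenance without re-running the extractor. Candidates are then ranked by the precision gain they yield. The names `Output`, `precision`, `rank_removals` and `RULE`, the component labels, and the toy data are all hypothetical. Real IE rules with unions or alternative derivations would need richer provenance (such as provenance polynomials) than a single set per output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """One extractor result, labeled during manual inspection (hypothetical schema)."""
    span: str                # the extracted text
    provenance: frozenset    # rule components that were ALL needed to produce it (assumption)
    correct: bool            # developer's label: true positive or false positive


def precision(outputs):
    """Fraction of outputs labeled correct; 0.0 for an empty result by convention."""
    return sum(o.correct for o in outputs) / len(outputs) if outputs else 0.0


def rank_removals(outputs):
    """Rank hypothetical refinements of the form "remove component c".

    Because each output's provenance lists the components it depends on,
    removing c deletes exactly the outputs whose provenance contains c,
    so the effect is computed without re-running the extractor.
    Returns (component, precision_gain, correct_outputs_lost) tuples,
    best first: highest precision gain, then fewest correct outputs lost.
    """
    baseline = precision(outputs)
    components = set().union(*(o.provenance for o in outputs))
    scored = []
    for c in components:
        kept = [o for o in outputs if c not in o.provenance]
        lost = sum(o.correct for o in outputs if c in o.provenance)
        scored.append((precision(kept) - baseline, -lost, c))
    scored.sort(reverse=True)
    return [(c, gain, -neg_lost) for gain, neg_lost, c in scored]


# Toy person-name extractor (made-up data): a first-name dictionary
# entry followed by a capitalized word.
RULE = "rule:FirstName-Cap"
outputs = [
    Output("John Smith", frozenset({"dict:John", RULE}), True),
    Output("Mary Jones", frozenset({"dict:Mary", RULE}), True),
    Output("Mary Street", frozenset({"dict:Mary", RULE}), False),
    Output("Georgia Tech", frozenset({"dict:Georgia", RULE}), False),
    Output("Georgia Avenue", frozenset({"dict:Georgia", RULE}), False),
]

# dict:Georgia produced only false positives, so removing it ranks first.
for component, gain, lost in rank_removals(outputs):
    print(f"remove {component}: precision {gain:+.2f}, correct outputs lost {lost}")
```

The ranking sorts by precision gain first and by correct outputs lost second. This prefers refinements that eliminate false positives without sacrificing true ones. A fuller treatment would also weigh recall and consider other kinds of refinement, such as adding a filter to a rule.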
