Abstract:
The objective of this work is to learn information extraction rules by applying Inductive Logic Programming (ILP) techniques to natural language data. The approach is ontology-based: the extraction rules conclude with specific ontology relations that characterise the meaning of sentences in the text. An existing ILP system, FOIL, is used to learn attribute-value relations, which enables instances of these relations to be identified in the text. Specifically, we explore the linguistic preprocessing of the data, the use of background knowledge in the learning process, and the practical considerations of applying a supervised learning approach to rule induction, namely the human effort required to create the data set and the biases inherent in small data sets.