Abstract:
In this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments demonstrating that novice users can generate effective dictionaries using AutoSlog.

1 Introduction

Portability is a crucial concern for researchers in knowledge-based natural language processing (NLP). Knowledge-based NLP systems typically rely on a conceptual dictionary that has been manually encoded for a specific domain. Although knowledge-based systems have performed well on certain tasks (e.g., [2,4,5,11,16,23]), these systems will not be practical for real-world applications until the knowledge they need can be acquired automatically.

Preprint submitted to Elsevier Preprint 21 March

We have developed a system called AutoSlog that automatically generates conceptual dictionaries for information extraction. Information extraction (IE) is essentially a form of text skimming in which specific types of information are extracted from text. Much recent work on information extraction has been carried out in conjunction with the Message Understanding Conferences (MUCs) [26-28]. Most information extraction systems rely on a manually encoded dictionary of extraction patterns (e.g., see [12,15,1]). Using AutoSlog, the UMass/MUC-4 system was the first system that could acquire domain-specific extraction patterns automatically [17,18]. In previous work, we showed that AutoSlog could create effective extraction patterns for the domain of terrorism [30]. A dictionary generated by AutoSlog for the terrorism domain achieved 98% of the performance of a handcrafted dictionary that required a...
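The introduction treats information extraction as pattern-driven skimming and presents AutoSlog as a way to build the pattern dictionary automatically instead of by hand. The Python sketch below is a toy illustration of both ideas, not AutoSlog itself. The `ExtractionPattern` type, the `propose_pattern` and `extract` functions, and the single passive-voice rule they use are all hypothetical. They were chosen only to show what "learn a pattern from an annotated phrase, then apply it to new text" can look like. AutoSlog's actual pattern representation and heuristics are not described in this excerpt.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExtractionPattern:
    """Hypothetical pattern format: the slot to fill and the phrase that triggers it."""
    slot: str     # e.g. "victim"
    trigger: str  # e.g. "was murdered"


def propose_pattern(sentence: str, target_np: str, slot: str) -> Optional[ExtractionPattern]:
    """Toy acquisition step (an assumption, not AutoSlog's code).

    If the annotated noun phrase is immediately followed by a passive verb
    ("was/were <verb>ed"), propose a pattern triggered by that verb phrase.
    """
    match = re.search(re.escape(target_np) + r"\s+((?:was|were)\s+\w+ed)\b", sentence)
    return ExtractionPattern(slot, match.group(1)) if match else None


def extract(sentence: str, patterns: List[ExtractionPattern]) -> Dict[str, str]:
    """Apply each pattern: an optional article plus the word just before the
    trigger is taken as the slot filler (a crude stand-in for a parser)."""
    found: Dict[str, str] = {}
    for pattern in patterns:
        match = re.search(
            r"((?:(?:the|a|an)\s+)?\w+)\s+" + re.escape(pattern.trigger) + r"\b",
            sentence,
            flags=re.IGNORECASE,
        )
        if match:
            found[pattern.slot] = match.group(1)
    return found


# Learn one pattern from an annotated sentence, then apply it to a new one.
victim_pattern = propose_pattern("The mayor was murdered by armed men.", "The mayor", "victim")
assert victim_pattern is not None
print(victim_pattern)  # ExtractionPattern(slot='victim', trigger='was murdered')
print(extract("Yesterday a judge was murdered in the capital.", [victim_pattern]))  # {'victim': 'a judge'}
```

A regular expression stands in for syntactic analysis here only to keep the example self-contained. A real system would identify subjects and verbs from a parse, which is why the filler window in `extract` is deliberately crude.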
Citations
2526 | Induction of decision trees | Quinlan | 1986
1196 | Building a large annotated corpus of English: the Penn Treebank | Marcus, Marcinkiewicz, et al. | 1993
523 | Knowledge Acquisition via Incremental Concept Formation | Fisher | 1987
453 | Explanation-based generalization: A unified view | Mitchell, Keller, et al. | 1986
322 | Explanation-based learning: An alternative view | DeJong, Mooney | 1986
151 | Automatically Constructing a Dictionary for Information Extraction Tasks | Riloff | 1993
144 | Frequency Analysis of English Usage | Francis | 1982
111 | Coping with ambiguity and unknown words through probabilistic models | Weischedel, Meteer, et al. | 1993
100 | Information extraction as a basis for high-precision text classification | Riloff, Lehnert | 1994
65 | ID5: An incremental ID3 | Utgoff | 1988
63 | An overview of the FRUMP system | DeJong | 1982
60 | FOULUP: a program that figures out meanings of words from context | Granger | 1977
57 | CONSTRUE/TIS: a system for content-based indexing of a database of news stories | Hayes, Weinstein | 1990
56 | Script Application: Computer Understanding of Newspaper Stories (Research Report #116) | Cullingford | 1978
47 | Symbolic/Subsymbolic Sentence Analysis: Exploiting the Best of Two Worlds | Lehnert | 1990
37 | Retrieval performance in FERRET: a conceptual information retrieval system | Mauldin | 1991
35 | Acquiring Lexical Knowledge from Text: A Case Study | Jacobs, Zernik | 1988
34 | Subjective understanding: Computer models of belief systems | Carbonell | 1979
34 | University of Massachusetts: Description of the CIRCUS system as used for MUC-4 | Lehnert, Cardie, et al. | 1992
31 | Automatically deriving structured knowledge bases from on-line dictionaries | Dolan, Vanderwende, et al. | 1993
26 | Structural patterns vs. string patterns for extracting semantic information from dictionaries | Montemagni, Vanderwende | 1992
24 | Acquisition of Semantic Patterns for Information Extraction from Corpora | Kim, Moldovan | 1993
23 | UMass/Hughes: Description of the CIRCUS system used for MUC-5 | Lehnert, McCarthy, et al. | 1993
23 | Automatically constructing a dictionary for information extraction tasks | Riloff | 1993
20 | Towards a Self-Extending Parser | Carbonell | 1979
16 | University of Massachusetts: MUC-4 Test Results and Analysis | Lehnert, Cardie, et al. | 1992
8 | SRI International: Description of the FASTUS System Used for MUC-4 | Hobbs, Appelt, et al. | 1992
8 | Information extraction as a basis for high-precision text classification | Riloff, Lehnert | 1994
6 | Information Extraction as a Basis for Portable Text Classification Systems | Riloff | 1994
6 | ID5: An Incremental ID3 | Utgoff | 1988
5 | University of Massachusetts: MUC-3 Test Results and Analysis | Lehnert, Williams, et al. | 1991
5 | Automatically Acquiring Conceptual Patterns Without an Annotated Corpus | Riloff, Shoen | 1995
4 | GE NLTOOLSET: Description of the System as Used for MUC-4 | Krupka, Jacobs, et al. | 1992
4 | UMass/Hughes: Description of the CIRCUS System as Used for MUC-5 | Cardie, Peterson, et al. | 1993
3 | A Dictionary Construction Experiment with Domain Experts | Riloff, Lehnert | 1993
1 | BBN PLUM: Description of the PLUM System as Used for MUC-4 | Ayuso, Boisen, et al. | 1992
1 | University of Massachusetts: MUC-4 Test Results and Analysis | Lehnert, Cardie, et al. | 1992
1 | Information Extraction as a Basis for Portable Text Classification Systems | Riloff | 1994