Automatically Constructing a Dictionary for Information Extraction Tasks
, 1993
Cited by 183 (17 self)
Knowledge-based natural language processing systems have achieved good success with certain tasks but they are often criticized because they depend on a domain-specific dictionary that requires a great deal of manual knowledge engineering. This knowledge engineering bottleneck makes knowledge-based NLP systems impractical for real-world applications because they cannot be easily scaled up or ported to new domains. In response to this problem, we developed a system called AutoSlog that automatically builds a domain-specific dictionary of concepts for extracting information from text. Using AutoSlog, we constructed a dictionary for the domain of terrorist event descriptions in only 5 person-hours. We then compared the AutoSlog dictionary with a hand-crafted dictionary that was built by two highly skilled graduate students and required approximately 1500 person-hours of effort. We evaluated the two dictionaries using two blind test sets of 100 texts each. Overall, the AutoSlog dictionary achieved 98% of the performance of the hand-crafted dictionary. On the first test set, the AutoSlog dictionary obtained 96.3% of the performance of the hand-crafted dictionary. On the second test set, the overall scores were virtually indistinguishable with the AutoSlog dictionary achieving 99.7% of the performance of the handcrafted dictionary.
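The effort and performance figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up here, and treating the overall 98% figure as the unweighted mean of the two test-set ratios is an assumption, since the abstract does not say how the overall score was aggregated.

```python
# Figures taken from the abstract above.
autoslog_hours = 5
handcrafted_hours = 1500
relative_performance = [96.3, 99.7]  # percent, test sets 1 and 2

# How many times less effort the AutoSlog dictionary required.
effort_ratio = handcrafted_hours / autoslog_hours

# ASSUMPTION: the overall figure is the unweighted mean of the two test sets.
mean_relative = sum(relative_performance) / len(relative_performance)

print(f"effort ratio: {effort_ratio:.0f}x")
print(f"mean relative performance: {mean_relative:.1f}%")
```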
Information extraction: techniques and challenges
- In Information Extraction (International Summer School SCIE-97)
, 1997
Cited by 119 (4 self)
This volume takes a broad view of information extraction as any method for filtering information from large volumes of text. This includes the retrieval of documents from collections and the tagging of particular terms in text. In this paper we shall use a narrower definition: the identification of instances of a particular class of events or relationships in a natural language text, and the extraction of the relevant arguments of the event or relationship. Information extraction therefore involves the creation of a structured representation (such as a data base) of selected information drawn from the text. The idea of reducing the information in a document to a tabular structure is not new. Its feasibility for sublanguage texts was suggested by Zellig Harris in the 1950's, and an early implementation for medical texts was done at New York University by Naomi Sager [20]. However, the specific notion of information extraction described here has received wide currency over the last decade through the series of Message Understanding Conferences [1, 2, 3, 4, 14]. We shall discuss these Conferences in more detail a bit later, and shall use simplified versions of
An Empirical Study of Automated Dictionary Construction for Information Extraction in Three Domains
- Artificial Intelligence
, 1996
Cited by 73 (14 self)
this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog. 1 Introduction Portability is a crucial concern for researchers in knowledge-based natural language processing (NLP). Knowledge-based NLP systems typically rely on a conceptual dictionary that has been manually encoded for a specific domain. Although knowledge-based systems have performed well on certain tasks (e.g., [2,4,5,11,16,23]), these systems will not be practical for real world applications until the knowledge that they need can be acquired automatically. [Preprint submitted to Elsevier Preprint, 21 March] We have developed a system called AutoSlog that generates conceptual dictionaries for information extraction automatically. Information extraction (IE) is essentially a form of text skimming, in which specific types of information are extracted from text. There has been a lot of work recently on information extraction in conjunction with the recent message understanding conferences [26-28]. Most information extraction systems rely on a manually encoded dictionary of extraction patterns (e.g., see [12,15,1]). Using AutoSlog, the UMass/MUC-4 system was the first system that could acquire domain-specific extraction patterns automatically [17,18]. In previous work, we showed that AutoSlog could create effective extraction patterns for the domain of terrorism [30]. A dictionary generated by AutoSlog for the terrorism domain achieved 98% of the performance of a handcrafted dictionary that required a...
Learning Semantic Grammars with Constructive Inductive Logic Programming
- In Proceedings of the Eleventh National Conference on Artificial Intelligence
, 1993
Cited by 63 (13 self)
Automating the construction of semantic grammars is a difficult and interesting problem for machine learning. This paper shows how the semantic-grammar acquisition problem can be viewed as the learning of search-control heuristics in a logic program. Appropriate control rules are learned using a new first-order induction algorithm that automatically invents useful syntactic and semantic categories. Empirical results show that the learned parsers generalize well to novel sentences and out-perform previous approaches based on connectionist techniques. Introduction Designing computer systems to "understand" natural language input is a difficult task. The laboriously hand-crafted computational grammars supporting natural language applications are often inefficient, incomplete and ambiguous. The difficulty in constructing adequate grammars is an example of the "knowledge acquisition bottleneck" which has motivated much research in machine learning. While numerous researchers have studied ...
Description of the UMass system as used for MUC-6
- In Proceedings of the 6th Message Understanding Conference
, 1995
An Empirical Approach to Conceptual Case Frame Acquisition
- In Proceedings of the Sixth Workshop on Very Large Corpora
, 1998
Cited by 42 (1 self)
Conceptual natural language processing systems usually rely on case frame instantiation to recognize events and role objects in text. But generating a good set of case frames for a domain is time-consuming, tedious, and prone to errors of omission. We have developed a corpus-based algorithm for acquiring conceptual case frames empirically from unannotated text. Our algorithm builds on previous research on corpus-based methods for acquiring extraction patterns and semantic lexicons. Given extraction patterns and a semantic lexicon for a domain, our algorithm learns semantic preferences for each extraction pattern and merges the syntactically compatible patterns to produce multi-slot case frames with selectional restrictions. The case frames generate more cohesive output and produce fewer false hits than the original extraction patterns. Our system requires only preclassified training texts and a few hours of manual review to filter the dictionaries, demonstrating that conceptual case frames can be acquired from unannotated text without special training resources.
Automatic Acquisition of Domain Knowledge for Information Extraction
- In Proceedings of the 18th International Conference on Computational Linguistics
, 2000
Cited by 38 (4 self)
In developing an Information Extraction (IE) system for a new class of events or relations, one of the major tasks is identifying the many ways in which these events or relations may be expressed in text. This has generally involved the manual analysis and, in some cases, the annotation of large quantities of text involving these events. This paper presents an alternative approach, based on an automatic discovery procedure, ExDISCO, which identifies a set of relevant documents and a set of event patterns from un-annotated text, starting from a small set of "seed patterns." We evaluate ExDISCO by comparing the performance of discovered patterns against that of manually constructed systems on actual extraction tasks.
Unsupervised Discovery of Scenario-Level Patterns for Information Extraction
- In Proceedings of Conference on Applied Natural Language Processing ANLP-NAACL
, 2000
Cited by 16 (0 self)
Information Extraction (IE) systems are commonly based on pattern matching. Adapting an IE system to a new scenario entails the construction of a new pattern base, a time-consuming and expensive process. We have implemented a system for finding patterns automatically from un-annotated text. Starting with a small initial set of seed patterns proposed by the user, the system applies an incremental discovery procedure to identify new patterns. We present experiments with evaluations which show that the resulting patterns exhibit high precision and recall.
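The idea of growing a pattern set from a few seeds can be illustrated with a toy loop. This is a minimal sketch and not the procedure from the paper: the function name `discover_patterns`, the representation of documents as sets of pattern strings, the scoring formula, and the demo data are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def discover_patterns(docs: Dict[str, Set[str]], seeds: Set[str],
                      rounds: int = 3) -> List[str]:
    """Toy incremental pattern discovery (illustrative, not the paper's algorithm)."""
    accepted = set(seeds)
    learned: List[str] = []
    for _ in range(rounds):
        # A document counts as relevant if it contains any accepted pattern.
        relevant = {d for d, pats in docs.items() if pats & accepted}
        candidates = set().union(*docs.values()) - accepted
        best, best_score = None, 0.0
        for pat in sorted(candidates):
            containing = {d for d, pats in docs.items() if pat in pats}
            support = len(containing & relevant)
            # ASSUMED score: precision against the relevant set, weighted
            # by the number of relevant documents that contain the pattern.
            score = (support / len(containing)) * support
            if score > best_score:
                best, best_score = pat, score
        if best is None:
            break  # no candidate co-occurs with the relevant documents
        accepted.add(best)
        learned.append(best)
    return learned


# Hypothetical demo corpus: each document is the set of patterns found in it.
demo_docs = {
    "d1": {"company appoints person", "person resigns"},
    "d2": {"person resigns", "person succeeds person"},
    "d3": {"team wins game"},
    "d4": {"person succeeds person", "team wins game"},
}
demo_learned = discover_patterns(demo_docs, {"company appoints person"}, rounds=2)
print(demo_learned)
```

Each round adds only the single best-scoring candidate, so the set of relevant documents grows gradually. Running more rounds on a small corpus like this one can pull in unrelated patterns, which is why a real system would need a stopping criterion or a score threshold.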

