CRYSTAL: Inducing a Conceptual Dictionary
, 1995
Cited by 136 (11 self)
One of the central knowledge sources of an information extraction (IE) system is a dictionary of linguistic patterns that can be used to identify references to relevant information in a text. Automatic creation of conceptual dictionaries is important for portability and scalability of an IE system. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.
1 Information Extraction
An information extraction (IE) system analyzes unrestricted natural language text and produces a representation of the information from the text which is cons...
Information Extraction as a Basis for High-Precision Text Classification
- ACM Transactions on Information Systems
, 1994
Cited by 102 (5 self)
... this article. For the purpose of text classification, the answer keys serve only as a set of correct classifications for each text. If a text has instantiated key templates associated with it in the corpus, then it should be classified as a relevant text. If a text has no instantiated key templates associated with it (i.e., only a dummy template), then it should be classified as an irrelevant text. This is a binary classification problem: a text is either relevant to the terrorism domain or irrelevant. The texts were selected by keyword search from a database of newswire articles because they contained words associated with terrorism. However, many of them did not mention any relevant terrorist incidents. Of the 1700 texts in the MUC-4 corpus, only 53% described a relevant terrorist event. Because many of the texts in the corpus were irrelevant, the MUC-4 systems had to distinguish the relevant from the irrelevant texts. Although the MUC-4 task was information extraction, information detection (i.e., text classification) was an implicit subtask. To be successful in MUC-4, the information extraction systems also had to be good at detection. Our MUC-4 system did not use a separate text classification module. Instead, we extracted information from every text and relied on a discourse analysis module to discard irrelevant templates. This strategy was very effective, but it was expensive. A reliable text classification module could have filtered out irrele-
Footnote 1: MUC-3 was the Third Message Understanding Conference, held in 1991 [MUC-3 Proceedings 1991].
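The labeling rule in this abstract (a text is relevant if it has at least one instantiated key template, and irrelevant if it has only a dummy template) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the `Template` class, its `is_dummy` field, and the `label_text` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Template:
    # Hypothetical stand-in for an answer-key template.
    # is_dummy=True marks a placeholder that describes no relevant event.
    is_dummy: bool


def label_text(key_templates: List[Template]) -> str:
    """Label a text from its answer-key templates.

    A text counts as relevant when at least one of its key templates
    is instantiated (not a dummy); otherwise it is irrelevant.
    """
    has_instantiated = any(not t.is_dummy for t in key_templates)
    return "relevant" if has_instantiated else "irrelevant"
```

Under this reading, a text whose answer key holds only a dummy template, or no template at all, is labeled irrelevant.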
Using Decision Trees for Coreference Resolution
- In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence
, 1995
Cited by 100 (1 self)
This paper describes RESOLVE, a system that uses decision trees to learn how to classify coreferent phrases in the domain of business joint ventures. An experiment is presented in which the performance of RESOLVE is compared to the performance of a manually engineered set of rules for the same task. The results show that decision trees achieve higher performance than the rules in two of three evaluation metrics developed for the coreference task. In addition to achieving better performance than the rules, RESOLVE provides a framework that facilitates the exploration of the types of knowledge that are useful for solving the coreference problem.
Issues in Inductive Learning of Domain-Specific Text Extraction Rules
, 1995
Cited by 13 (1 self)
Domain-specific text analysis requires a dictionary of linguistic patterns that identify references to relevant information in a text. This paper describes CRYSTAL, a fully automated tool that induces such a dictionary of text extraction rules. We discuss some key issues in developing an automatic dictionary induction system, using CRYSTAL as a concrete example. CRYSTAL derives text extraction rules from training instances and generalizes each rule as far as possible, testing the accuracy of each proposed rule on the training corpus. An error tolerance parameter allows CRYSTAL to manipulate a trade-off between recall and precision. We discuss issues involved with creating training data, defining a domain ontology, and allowing a flexible and expressive representation while designing a search control mechanism that avoids intractability.
1 Domain-specific Text Analysis
Considerable domain knowledge is needed by a system that analyzes unrestricted text and identifies inform...
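The idea of testing a proposed rule against the training corpus under an error tolerance can be pictured with a small sketch. The abstract does not state CRYSTAL's exact acceptance criterion, so the formula below (the fraction of covered training instances that are negative) and the name `within_tolerance` are assumptions made for illustration only.

```python
def within_tolerance(covered_positive: int, covered_negative: int,
                     tolerance: float) -> bool:
    """Decide whether a proposed rule is acceptable (assumed criterion).

    covered_positive / covered_negative: counts of positive and negative
    training instances the rule matches. The rule is accepted when its
    error rate on the covered instances does not exceed the tolerance.
    """
    covered = covered_positive + covered_negative
    if covered == 0:
        # A rule that matches nothing makes no errors.
        return True
    error_rate = covered_negative / covered
    return error_rate <= tolerance
```

Under this assumed criterion, a tolerance of 0 accepts only rules that match no negative instances, while a larger tolerance admits more general rules, which would tend to raise recall at some cost in precision.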
Integrating Case-Based Learning and Cognitive Biases for Machine Learning of Natural Language
, 1999
This paper shows that psychological constraints on human information processing can be used effectively to guide feature set selection for case-based learning of linguistic knowledge. Given as input a baseline case representation for a natural language learning task, our algorithm selects the relevant cognitive biases for the task and then automatically modifies the representation in response to those biases by changing, deleting, and weighting features appropriately. We apply the cognitive bias approach to feature set selection to four natural language learning problems and show that performance of the case-based learning algorithm improves significantly when relevant cognitive biases are incorporated into the baseline instance representation. We argue that the cognitive bias approach offers new possibilities for case-based learning of natural language: it simplifies the process of instance representation design and, in theory, obviates the need for separate instance represent...

