SELECTION AND INFORMATION: A CLASS-BASED APPROACH TO LEXICAL RELATIONSHIPS
, 1993
Cited by 209 (8 self)
Selectional constraints are limitations on the applicability of predicates to arguments. For example, the statement “The number two is blue” may be syntactically well formed, but at some level it is anomalous: BLUE is not a predicate that can be applied to numbers. According to the influential theory of Katz and Fodor (1964), a predicate associates a set of defining features with each argument, expressed within a restricted semantic vocabulary. Despite the persistence of this theory, however, there is widespread agreement about its empirical shortcomings (McCawley, 1968; Fodor, 1977). As an alternative, some critics of the Katz-Fodor theory (e.g. Johnson-Laird, 1983) have abandoned the treatment of selectional constraints as semantic, instead treating them as indistinguishable from inferences made on the basis of factual knowledge. This provides a better match for the empirical phenomena, but it opens up a different problem: if selectional constraints are the same as inferences in general, then accounting for them will require a much more complete understanding of knowledge representation and inference than we have at present. The problem, then, is this: how can a theory of selectional constraints be elaborated without first having either an empirically adequate theory of defining features or a comprehensive theory of inference? In this dissertation, I suggest that an answer to this question lies in the representation of conceptual
Word Sense Disambiguation Using Conceptual Density
- IN PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS
, 1996
Cited by 138 (13 self)
This paper presents a method for the resolution of lexical ambiguity of nouns and its automatic evaluation over the Brown Corpus. The method relies on the use of the wide-coverage noun taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a Conceptual Density formula developed for this purpose. This fully automatic method requires no hand coding of lexical entries, no hand tagging of text, and no training process of any kind. The results of the experiments have been automatically evaluated against SemCor, the sense-tagged version of the Brown Corpus.
Automated Text Summarization in SUMMARIST
, 1999
Cited by 112 (10 self)
SUMMARIST is an attempt to create a robust automated text summarization system, based on the equation: summarization = topic identification + interpretation + generation. Each of these stages contains several independent modules, many of them trained on large corpora of text. We describe the system's architecture and provide details of some of its modules.
A Data-Driven Methodology for Motivating a Set of Coherence Relations
, 1996
Cited by 110 (16 self)
The notion that a text is coherent in virtue of the 'relations' that hold between its component spans currently forms the basis for an active research programme in discourse linguistics. Coherence relations feature prominently in many theories of discourse structure, and have recently been used with considerable success in text generation systems. However, while the concept of coherence relations is now common currency for discourse theorists, there remains much confusion about them, and no standard set of relations has yet emerged. The aim of this thesis is to contribute towards the development of a standard set of relations. We begin from an explicitly empirical conception of relations: they are taken to model a collection of psychological mechanisms operative during the tasks of reading and writing. This conception is fleshed out with reference to psychological theories of skilled task performance, and to Rosch's notion of the basic level of categorisation. A methodology for investi...
Adding Semantic Annotation to the Penn TreeBank
- In Proceedings of the Human Language Technology Conference
, 2002
Cited by 88 (1 self)
This paper presents our basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank. Without attempting to confirm or disconfirm any particular semantic theory, our goal is to provide consistent argument labeling that will facilitate the automatic extraction of relational data. An argument such as "the window" in "John broke the window" and in "The window broke" would receive the same label in both sentences. In order to ensure reliable human annotation, we provide our annotators with explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb. We give several examples of these guidelines and discuss the inter-annotator agreement figures. We also discuss our current experiments on the automatic expansion of our verb guidelines based on verb class membership. Our current rate of progress and our consistency of annotation demonstrate the feasibility of the task.
MURAX: A Robust Linguistic Approach For Question Answering Using An On-Line Encyclopedia
, 1993
Cited by 76 (1 self)
Robust linguistic methods are applied to the task of answering closed-class questions using a corpus of natural language. The methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia.
Latent Semantic Kernels
Cited by 74 (7 self)
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representation of two documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
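As a rough illustration of the idea in this abstract, LSI can be approximated without explicit term vectors by keeping only the leading eigen-directions of the document Gram (kernel) matrix. The sketch below is an assumption-laden toy, not the paper's algorithm: the function name `latent_semantic_kernel`, the toy matrix `X`, and the use of NumPy's `eigh` are all choices made here for illustration, and it covers training documents only.

```python
import numpy as np

def latent_semantic_kernel(K, k):
    """Rank-k approximation of a symmetric kernel matrix K.

    Keeps the k largest eigenvalues and their eigenvectors, which
    corresponds to projecting documents onto k latent dimensions.
    """
    eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    V_k = eigvecs[:, top]
    return (V_k * eigvals[top]) @ V_k.T       # V_k diag(lambda_k) V_k^T

# Hypothetical documents-by-terms counts: two topics, two documents each.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])
K_linear = X @ X.T                            # standard inner-product kernel
K_lsk = latent_semantic_kernel(K_linear, k=2) # kernel in a 2-dimensional latent space
```

In this toy case the two documents in each topic become indistinguishable under the reduced kernel, while documents from different topics stay orthogonal, which is the smoothing effect LSI is meant to provide.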
Investigating regular sense extensions based on intersective Levin classes
- In Proceedings of COLING-ACL98
, 1998
Using a semantic concordance for sense identification
- In Proc. of ARPA Human Language Technology Workshop
, 1994
Cited by 66 (1 self)
This paper proposes benchmarks for systems of automatic sense identification. A textual corpus in which open-class words had been tagged both syntactically and semantically was used to explore three statistical strategies for sense identification: a guessing heuristic, a most-frequent heuristic, and a co-occurrence heuristic. When no information about sense frequencies was available, the guessing heuristic using the numbers of alternative senses in WordNet was correct 45% of the time. When statistics for sense frequencies were derived from a semantic concordance, the assumption that each word is used in its most frequently occurring sense was correct 69% of the time; when that figure was calculated for polysemous words alone, it dropped to 58%. And when a co-occurrence heuristic took advantage of prior occurrences of words together in the same sentences, little improvement was observed. The semantic concordance is still too small to estimate the potential limits of a co-occurrence heuristic.
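The first two baselines named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the tiny tagged corpus, the sense labels, and the function names are invented here, the sense inventory is taken from the toy corpus rather than from WordNet, and accuracy is measured on the same data the frequencies come from.

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged tokens: (word, sense) pairs.
tagged = [
    ("bank", "bank%finance"), ("bank", "bank%finance"), ("bank", "bank%river"),
    ("plant", "plant%factory"), ("plant", "plant%flora"), ("plant", "plant%flora"),
    ("dog", "dog%animal"),
]

def sense_counts(pairs):
    """Count how often each sense occurs for each word."""
    counts = defaultdict(Counter)
    for word, sense in pairs:
        counts[word][sense] += 1
    return counts

def guessing_accuracy(pairs, inventory):
    """Expected accuracy of choosing uniformly among a word's listed senses."""
    return sum(1 / len(inventory[word]) for word, _ in pairs) / len(pairs)

def most_frequent_accuracy(pairs, counts):
    """Accuracy of always choosing each word's most frequent sense."""
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return sum(1 for word, sense in pairs if best[word] == sense) / len(pairs)

counts = sense_counts(tagged)
inventory = {word: set(c) for word, c in counts.items()}  # stand-in for a lexicon
guess_acc = guessing_accuracy(tagged, inventory)
mfs_acc = most_frequent_accuracy(tagged, counts)
```

On this toy data the guessing baseline scores 4/7 and the most-frequent baseline 5/7; the gap between the two mirrors, in miniature, the gap the abstract reports.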
Learning Information Extraction Patterns From Examples
- Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing
, 1995
Cited by 63 (2 self)
A growing population of users want to extract a growing variety of information from on-line texts. Unfortunately, current information extraction systems typically require experts to hand-build dictionaries of extraction patterns for each new type of information to be extracted. This paper presents a system that can learn dictionaries of extraction patterns directly from user-provided examples of texts and events to be extracted from them. The system, called LIEP, learns patterns that recognize relationships between key constituents based on local syntax. Sets of patterns learned by LIEP for a sample extraction task perform nearly at the level of a hand-built dictionary of patterns. 1 Introduction Although significant progress has been made on information extraction systems in recent years (for instance through the MUC conferences [MUC, 1992; MUC, 1993]), coding the knowledge these systems need to extract new kinds of information and events is an arduous and time-consuming process [Ril...

