The Generative Lexicon
- Computational Linguistics, 1991
Cited by 727 (23 self)
this paper, I will discuss four major topics relating to current research in lexical semantics: methodology, descriptive coverage, adequacy of the representation, and the computational usefulness of representations. In addressing these issues, I will discuss what I think are some of the central problems facing the lexical semantics community, and suggest ways of best approaching these issues. Then, I will provide a method for the decomposition of lexical categories and outline a theory of lexical semantics embodying a notion of cocompositionality and type coercion, as well as several levels of semantic description, where the semantic load is spread more evenly throughout the lexicon. I argue that lexical decomposition is possible if it is performed generatively. Rather than assuming a fixed set of primitives, I will assume a fixed number of generative devices that can be seen as constructing semantic expressions. I develop a theory of Qualia Structure, a representation language for lexical items, which renders much lexical ambiguity in the lexicon unnecessary, while still explaining the systematic polysemy that words carry. Finally, I discuss how individual lexical structures can be integrated into the larger lexical knowledge base through a theory of lexical inheritance. This provides us with the necessary principles of global organization for the lexicon, enabling us to fully integrate our natural language lexicon into a conceptual whole
Lexical Ambiguity and Information Retrieval
- ACM Transactions on Information Systems, 2000
Cited by 113 (3 self)
Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem, or about the impact that it has on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections, and on experiments to determine the utility of word meanings for separating relevant from non-relevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses
Large-scale dictionary construction for foreign language tutoring and interlingual machine translation
- Machine Translation, 1997
Cited by 71 (9 self)
This paper describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). A primary goal of the LCS research is to demonstrate that synonymous verb senses share distributional patterns. In this paper, we show how the syntax-semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. We start by describing the structure of the LCS and showing how this representation is used in FLT and MT. We then focus on the problem of building LCS dictionaries for large-scale FLT and MT. First, we describe authoring tools for manual and semi-automatic construction of LCS dictionaries; we then present a more sophisticated approach that uses linguistic techniques for building word definitions automatically. These techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project (Dorr et al., 1995; Sams, 1995; Weinberg et al., 1995) and in the PRINCITRAN MT project (Dorr et al., 1995b).
Similarity between words computed by spreading activation on an English dictionary
- Proceedings of the European Association for Computational Linguistics, 1993
Cited by 42 (5 self)
This paper proposes a method for measuring semantic similarity between words as a new tool for text analysis. The similarity is measured on a semantic network constructed systematically from a subset of the English dictionary, LDOCE (Longman Dictionary of Contemporary English). Spreading activation on the network can directly compute the similarity between any two words in the Longman Defining Vocabulary, and indirectly the similarity of all the other words in LDOCE. The similarity represents the strength of lexical cohesion or semantic relation, and also provides valuable information about similarity and coherence of texts.
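As a rough illustration of the spreading-activation idea mentioned in this abstract, the sketch below activates one word in a toy network and reads off how much activation reaches another word. It is a generic textbook-style version, not the paper's actual network or update rule: the function names, the `steps` and `decay` parameters, and the toy graph are all assumptions made for the example.

```python
def spread_activation(graph, source, steps=3, decay=0.5):
    """Spread activation from `source` over an adjacency-list graph.

    `graph` maps every node to a list of neighbouring nodes.
    `steps` and `decay` are illustrative parameters, not values from the paper.
    """
    activation = {node: 0.0 for node in graph}
    activation[source] = 1.0
    for _ in range(steps):
        updated = dict(activation)
        for node, level in activation.items():
            neighbours = graph[node]
            if not neighbours or level == 0.0:
                continue
            # Each active node passes a decayed, evenly split share onward.
            share = decay * level / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        activation = updated
    return activation


def similarity(graph, word_a, word_b, steps=3, decay=0.5):
    """Similarity of word_b to word_a: activation reaching word_b from word_a."""
    return spread_activation(graph, word_a, steps, decay)[word_b]


# Hypothetical toy network: words linked when one appears in the other's definition.
toy_graph = {
    "red": ["colour"],
    "blue": ["colour"],
    "colour": ["red", "blue"],
    "bottle": ["glass"],
    "glass": ["bottle"],
}

print(similarity(toy_graph, "red", "blue"))
print(similarity(toy_graph, "red", "bottle"))
```

In this toy network, "red" and "blue" share the neighbour "colour", so some activation reaches "blue", while "bottle" sits in a disconnected component and receives none.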
Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches
- 1993
Cited by 42 (0 self)
As large on-line corpora become more prevalent, a number of attempts have been made to automatically extract thesaurus-like relations directly from text using knowledge-poor methods. In the absence of any specific application, comparing the results of these attempts is difficult. Here we propose an evaluation method using gold standards, i.e., pre-existing hand-compiled resources, as a means of comparing extraction techniques. Using this evaluation method, we compare two semantic extraction techniques which produce similar word lists, one using syntactic context of words, and the other using windows of heuristically tagged words. The two techniques are very similar except that in one case selective natural language processing, a partial syntactic analysis, is performed. On a 4 megabyte corpus, syntactic contexts produce significantly better results against the gold standards for the most characteristic words in the corpus, while windows produce better results for rare words.
Subject-Dependent Co-Occurrence And Word Sense Disambiguation
- In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 1991
Cited by 41 (1 self)
this paper, we describe a method for obtaining subject-dependent associated word sets, or "neighborhoods" of a given word, relative to a particular (subject) domain. Using the subject classifications of Longman's Dictionary of Contemporary English (LDOCE), we have established subject-dependent co-occurrence links between words of the defining vocabulary to construct these neighborhoods. We will describe the application of these neighborhoods to information retrieval, and present a method of word sense disambiguation based on these co-occurrences, an extension of previous work
Acquisition of Semantic Lexicons: Using Word Sense Disambiguation to Improve Precision
- 2000
Cited by 21 (7 self)
lexicons from machine-readable resources. We describe semantic filters designed to reduce the number of incorrect assignments (i.e., improve precision) made by a purely syntactic technique. We demonstrate that it is possible to use these filters to build broad-coverage lexicons with minimal effort, at a depth of knowledge that lies at the syntax-semantics interface. We report on our results of disambiguating the verbs in the semantic filters by adding WordNet sense annotations. We then show the results of our classification on unknown words and we evaluate these results.
Software Architecture for Language Engineering
- 2000
Cited by 21 (7 self)
This thesis defines the boundaries of Software Architecture for Language Engineering (SALE), an area formed by the intersection of human language computation and software engineering. SALE covers all areas of the provision of infrastructural systems to support research and development of language processing software. In order to demonstrate the theory developed in relation to SALE, we present the design, implementation and evaluation of GATE, a General Architecture for Text Engineering, which illustrates in practice many of the theoretical points made.
Acquisition of lexical translation relations from MRDs
- Machine Translation, 1995
Cited by 12 (3 self)
In this paper we present a methodology for extracting information about lexical translation equivalences from the machine readable versions of conventional dictionaries (MRDs), and describe a series of experiments on semi-automatic construction of a linked multilingual lexical knowledge base for English, Dutch, and Spanish. We discuss the advantages and limitations of using MRDs that this has revealed, and some strategies we have developed to cover gaps where no direct translation can be found.
Context-sensitive measurement of word distance by adaptive scaling of a semantic space
- In Proceedings of Recent Advances in Natural Language Processing (pp. 161-168). Tzigov Chark, 1995
Cited by 10 (1 self)
The paper proposes a computationally feasible method for measuring context-sensitive semantic distance between words. The distance is computed by adaptive scaling of a semantic space. In the semantic space, each word in the vocabulary V is represented by a multidimensional vector which is obtained from an English dictionary through a principal component analysis. Given a word set C which specifies a context for measuring word distance, each dimension of the semantic space is scaled up or down according to the distribution of C in the semantic space. In the space thus transformed, distance between words in V becomes dependent on the context C. An evaluation through a word prediction task shows that the proposed measurement successfully extracts the context of a text.
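The adaptive scaling described in this abstract can be sketched roughly as follows. The abstract does not give the scaling formula, so the rule used here is an assumption: a dimension is weighted by one minus the ratio of the context set's spread to the whole vocabulary's spread on that dimension, floored at zero. The function names and the two-dimensional toy vectors are likewise hypothetical.

```python
import math
from statistics import pstdev


def scale_factors(vectors, context):
    """Per-dimension weights derived from how tightly the context words cluster.

    Assumed rule (not taken from the abstract): weight = max(0, 1 - sd_C / sd_V),
    so dimensions on which the context words agree are emphasised.
    """
    dims = len(next(iter(vectors.values())))
    factors = []
    for i in range(dims):
        sd_all = pstdev([vec[i] for vec in vectors.values()])
        sd_ctx = pstdev([vectors[word][i] for word in context])
        ratio = sd_ctx / sd_all if sd_all > 0 else 1.0
        factors.append(max(0.0, 1.0 - ratio))
    return factors


def context_distance(vectors, context, word_a, word_b):
    """Euclidean distance between two words after scaling each dimension."""
    factors = scale_factors(vectors, context)
    return math.sqrt(
        sum(
            (f * (x - y)) ** 2
            for f, x, y in zip(factors, vectors[word_a], vectors[word_b])
        )
    )


# Hypothetical toy space: dimension 0 ~ "vehicle vs. animal", dimension 1 ~ "size".
toy_vectors = {
    "car": (1.0, 0.0),
    "bus": (1.0, 1.0),
    "cat": (0.0, 0.0),
    "dog": (0.0, 1.0),
}

print(context_distance(toy_vectors, ["car", "bus"], "car", "cat"))
print(context_distance(toy_vectors, ["car", "cat"], "car", "cat"))
```

With the context {car, bus}, the first dimension is emphasised and "car" and "cat" end up far apart. With the context {car, cat}, the second dimension dominates and the same pair becomes close, which is the sense in which the distance depends on C.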

