Framework and Results for English SENSEVAL
Special Issue on SENSEVAL. Computers and the Humanities, 2000
Cited by 75 (1 self)
Abstract. SENSEVAL was the first open, community-based evaluation exercise for Word Sense Disambiguation programs. It adopted the quantitative approach to evaluation developed in MUC and other ARPA evaluation exercises. It took place in 1998. In this paper we describe the structure, organisation and results of the SENSEVAL exercise for English. We present and defend various design choices for the exercise, describe the data and gold-standard preparation, consider issues of scoring strategies and baselines, and present the results for the 18 participating systems. The exercise identifies the state of the art for fine-grained word sense disambiguation, where training data is available, as 74–78% correct, with a number of algorithms approaching this level of performance. For systems that did not assume the availability of training data, performance was markedly lower and also more variable. Human inter-tagger agreement was high, with the gold-standard taggings being around 95% replicable.
Key words: evaluation, SENSEVAL, word sense disambiguation

Subcategorization Acquisition
2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, between domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations. Probabilistic parsers would also benefit greatly from accurate information about the relative likelihood of the different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, the resulting lexicons show a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs. This thesis builds on earlier work in verbal subcategorization acquisition and takes the problem with statistical filtering as its starting point. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of...

Lexical Semantic Techniques for Corpus Analysis
1993
Cited by 58 (6 self)
In this paper we outline a research program for computational linguistics that makes extensive use of text corpora. We demonstrate how a semantic framework for lexical knowledge can suggest relationships among words in text that are richer than simple co-occurrence. The work suggests how linguistic phenomena such as metonymy and polysemy might be exploited for the semantic tagging of lexical items. Unlike purely statistical collocational analyses, the framework of a semantic theory allows predictions about deeper semantic relationships among words appearing in collocational systems to be constructed automatically. We illustrate the approach by acquiring lexical information for several classes of nominals. We also show how such techniques can fine-tune the lexical structures acquired from an initial seeding of a machine-readable dictionary. In addition to conventional lexical semantic relations, we show how information about lexical presuppositions and preference relations can also be acquired from corpora when they are analyzed with the appropriate semantic tools. Finally, we discuss the potential of corpus studies to enrich the data set for theoretical linguistic research and to help confirm or disconfirm linguistic hypotheses.

Semantic Lexicon Acquisition for Learning Natural Language Interfaces
Department of Computer Sciences, University of Texas, 1989
Cited by 13 (1 self)
This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The learned lexicon consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results demonstrate WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a comparable system developed by Siskind (1996).

Combining Corpus and Machine-Readable Dictionary Data for Building Bilingual Lexicons
1996
Cited by 13 (0 self)
This paper describes and discusses some theoretical and practical problems that arise when developing a system to create a bilingual lexical database. The system combines the structured but incomplete information in machine-readable dictionaries (MRDs) with the unstructured but more complete information available in corpora. The paper also presents a methodology for integrating information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) links entries in the Collins English-French and French-English bilingual dictionary with a large English-French and French-English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on lexical correspondences between languages that are specific to this verb class (Klavans and Tzoukermann, 1989; 1990a; 1990b). We first examine the way prototypical verbs of movement are translated in the Collin...

From a Children's First Dictionary to a Lexical Knowledge Base of Conceptual Graphs
St. Leonards (NSW): Macquarie Library, 1997
Cited by 9 (1 self)
This thesis aims to build a Lexical Knowledge Base (LKB) that will be useful to a Natural Language Processing (NLP) system by extracting information from a Machine Readable Dictionary (MRD). Our source of knowledge is the American Heritage First Dictionary (AHFD). It contains 1800 entries and is designed for children aged six to eight who are learning the structure and basic vocabulary of their language. A children's dictionary lets us restrict our vocabulary while still working with general knowledge about day-to-day concepts and actions. Our Lexical Knowledge Base contains information extracted from the AHFD and represented in the Conceptual Graph (CG) formalism. The graph definitions make explicit the information contained in all the noun and verb definitions of the AHFD. Each sentence of each definition is tagged, parsed and automatically transformed into a conceptual graph. The type hierarchy, extracted automatically from the definitions, groups all the nouns a...

The Automated Building and Updating of a Knowledge Base through the Analysis of Natural Language Text
1991
Cited by 1 (1 self)
This report is concerned with developing the tools that would let a system, such as an expert system, automatically build and update its knowledge base by analyzing technical material written in natural language and available in machine-readable form. These tools fall into two groups. The first handles the required natural language processing tasks (the natural language component). The second extracts the relevant information from the text and stores it appropriately in the knowledge base (the knowledge representation and acquisition component). The text used as a testbed for this project is the Merck Veterinary Manual.

The Semantics and Pragmatics of Polysemy: A Relevance-Theoretic Account
Cited by 1 (1 self)
January 2011
I, Ingrid Lossius Falkum, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Business Models for Dictionaries and NLP
NLP needs dictionaries, and dictionary-makers can use NLP to make better dictionaries, so there is great potential for synergy between the two activities. To date, there has been only very limited collaboration. There are two reasons for this: (a) dictionary publishers' concerns regarding intellectual property, and (b) the different languages that lexicographers and NLP researchers speak. In this paper I present a model for overcoming the first and suggest some strategies for the second.
1 Introduction
NLP needs dictionaries, and dictionary-makers can use NLP to make better dictionaries, so there is great potential for synergy between the two activities. There is ample motivation for NLP to court dictionary publishers, and vice versa. To date, NLP research has used dictionaries and dictionaries have used NLP, but the two processes have not been brought together. The NLP that has gone into making dictionaries has not been the NLP that was based on an earlier version of the sam...

Accumulation of Lexical Sets: Acquisition of Dictionary Resources
This paper presents our work on the accumulation of lexical sets, which includes acquiring dictionary resources and producing new lexical sets from them. The acquisition method uses a context-free syntax-directed translator and text modification techniques, and proves easy to use, flexible and efficient. We analyze categories of production and propose basic operations that make up a formalism for specifying and carrying out production. About 1.7 million lexical units were acquired and produced from dictionaries of various types and complexities. The paper also proposes a combinatorial and dynamic organization for lexical systems, based on the notion of virtual accumulation and the abstraction levels of lexical sets.

