Building a large-scale knowledge base for machine translation
- In Proceedings of AAAI
, 1994
- Cited by 171 (8 self)
Knowledge-based machine translation (KBMT) systems have achieved excellent results in constrained domains, but have not yet scaled up to newspaper text. The reason is that knowledge resources (lexicons, grammar rules, world models) must be painstakingly handcrafted from scratch. One of the hypotheses being tested in the PANGLOSS machine translation project is whether these resources can be semi-automatically acquired on a very large scale. This paper focuses on the construction of a large ontology (or knowledge base, or world model) for supporting KBMT. It contains representations for some 70,000 commonly encountered objects, processes, qualities, and relations. The ontology was constructed by merging various online dictionaries, semantic networks, and bilingual resources through semi-automatic methods. Some of these methods (e.g., conceptual matching of semantic taxonomies) are broadly applicable to problems of importing/exporting knowledge from one KB to another. Other methods (e.g., bilingual matching) allow a knowledge engineer to build up an index to a KB in a second language, such as Spanish or Japanese.
Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
- Cited by 65 (0 self)
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
The Interaction of Knowledge Sources for Word Sense Disambiguation
- Computational Linguistics
, 2001
- Cited by 58 (2 self)
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
WordNet 2 - A Morphologically and Semantically Enhanced Resource
- University of Maryland
, 1999
- Cited by 55 (3 self)
This paper presents an on-going project intended to enhance WordNet morphologically and semantically. The motivation for this work stems from the current limitations of WordNet when used as a linguistic knowledge base. We envision a software tool that automatically parses the conceptual defining glosses, attributing part-of-speech tags and phrasal brackets. The nouns, verbs, adjectives and adverbs from every definition are then disambiguated and linked to the corresponding synsets. This increases the connectivity between synsets, allowing the retrieval of topically related concepts. Furthermore, the tool transforms the glosses, first into logical forms, and then into semantic forms. Using derivational morphology, new links are added between the synsets. 1 Motivation WordNet has already been recognized as a valuable resource in the human language technology and knowledge processing communities. Its applicability has been cited in more than 200 papers and systems have been...
Automatically Deriving Structured Knowledge Bases From on-Line Dictionaries
- Simon Fraser University
, 1993
- Cited by 34 (1 self)
Keywords: computational lexicography; lexical knowledge bases. We describe an automated strategy which exploits on-line dictionaries to construct a richly-structured lexical knowledge base. In particular, we show how the Longman Dictionary of Contemporary English (LDOCE) can be used to build a directed graph which captures semantic associations between words. The result is a huge and highly interconnected network of words linked by arcs labeled with semantic relations such as Hypernym, Part_of, Location, and Purpose. We argue that this knowledge base provides much more detailed information about word meanings than can be obtained using standard lexical lookup procedures or by relying on statistical measures of semantic associations among words. 1 We would like to thank the other members of the Microsoft Natural Language group: Joseph Pentheroudakis, Karen Jensen, George Heidorn, and Diana Peterson. 1. Introduction This paper describes an automated strategy which exploits on-line dictio...
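As a rough illustration of the kind of structure this abstract describes (words linked by arcs labeled with semantic relations), the sketch below stores a labeled directed graph in Python. The `SemanticGraph` class, its method names, and the three example arcs are hypothetical toy data invented for illustration; they are not taken from LDOCE or from the authors' system.

```python
from collections import defaultdict


class SemanticGraph:
    """Toy labeled directed graph: head word -> {(relation, target word)}."""

    def __init__(self):
        # Hypothetical storage layout, chosen only for this sketch.
        self._arcs = defaultdict(set)

    def add(self, head, relation, target):
        """Record one labeled arc from head to target."""
        self._arcs[head].add((relation, target))

    def related(self, head, relation):
        """Return, in sorted order, all targets reachable from head via relation."""
        arcs = self._arcs.get(head, set())
        return sorted(target for rel, target in arcs if rel == relation)


# Invented example arcs using relation labels named in the abstract.
graph = SemanticGraph()
graph.add("wheel", "Part_of", "car")
graph.add("car", "Hypernym", "vehicle")
graph.add("car", "Purpose", "transport")

print(graph.related("car", "Hypernym"))
```

Keeping the relation label on each arc, rather than one graph per relation, lets a single lookup retrieve every kind of association recorded for a word.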
Word Sense Ambiguation: Clustering Related Senses
, 1994
- Cited by 32 (0 self)
This paper describes a heuristic approach to automatically identifying which senses of a machine-readable dictionary (MRD) headword are semantically related versus those which correspond to fundamentally different senses of the word. The inclusion of this information in a lexical database profoundly alters the nature of sense disambiguation: the appropriate "sense" of a polysemous word may now correspond to some set of related senses. Our technique offers benefits both for on-line semantic processing and for the challenging task of mapping word senses across multiple MRDs in creating a merged lexical database.
Word Sense Disambiguation using Optimised Combinations of Knowledge Sources
- In: Proceedings of COLING-ACL'98
, 1998
- Cited by 21 (1 self)
Word sense disambiguation algorithms, with few exceptions, have made use of only one lexical knowledge source. We describe a system which performs word sense disambiguation on all content words in free text by combining different knowledge sources: semantic preferences, dictionary definitions and subject/domain codes along with part-of-speech tags, optimised by means of a learning algorithm. We also describe the creation of a new sense tagged corpus by combining existing resources. Tested accuracy of our approach on this corpus exceeds 92%, demonstrating the viability of all-word disambiguation rather than restricting oneself to a small sample. 1 Introduction This paper describes a system that integrates a number of partial sources of information to perform word sense disambiguation (WSD) of content words in general text at a high level of accuracy. The methodology and evaluation of WSD are somewhat different from those of other NLP modules, and one can distinguish three aspects of ...
Sense Tagging: Semantic Tagging with a Lexicon
- In Proceedings of the SIGLEX Workshop
, 1997
- Cited by 20 (6 self)
Sense tagging, the automatic assignment of the appropriate sense from some lexicon to each of the words in a text, is a specialised instance of the general problem of semantic tagging by category or type. We discuss which recent word sense disambiguation algorithms are appropriate for sense tagging. It is our belief that sense tagging can be carried out effectively by combining several simple, independent methods, and we include the design of such a tagger. A prototype of this system has been implemented, correctly tagging 86% of polysemous word tokens in a small test set, providing evidence that our hypothesis is correct.
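The abstract does not say how the independent methods are combined, so the following Python sketch is only one assumed possibility: simple majority voting over the sense each method proposes. The function name `combine_votes`, the tie-breaking rule, and the sense labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_votes(votes):
    """Pick the sense proposed most often by a list of independent methods.

    `votes` holds one proposed sense label per method, or None when a
    method abstains. Ties go to the earliest proposal in the list.
    This voting rule is an assumption made for illustration only.
    """
    counts = Counter(vote for vote in votes if vote is not None)
    if not counts:
        return None  # every method abstained
    best = max(counts.values())
    for vote in votes:
        if vote is not None and counts[vote] == best:
            return vote


# Invented example: three methods vote on a sense of "bank".
print(combine_votes(["bank/river", "bank/money", "bank/money"]))
```

Allowing a method to abstain with `None` reflects the idea that a simple method may have nothing to say about a given word, in which case the remaining methods decide.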
Combining Weak Knowledge Sources for Sense Disambiguation
- In Proceedings of the International Joint Conference on Artificial Intelligence
, 1999
- Cited by 17 (4 self)
There has been a tradition of combining different knowledge sources in Artificial Intelligence research. We apply this methodology to word sense disambiguation (WSD), a long-standing problem in Computational Linguistics. We report on an implemented sense tagger which uses a machine-readable dictionary to provide both a set of senses and associated forms of information on which to base disambiguation decisions. The system is based on an architecture which makes use of different sources of lexical knowledge in two ways and optimises their combination using a learning algorithm. Tested accuracy of our approach on a general corpus exceeds 94%, demonstrating the viability of all-word disambiguation as opposed to restricting oneself to a small sample. 1 Introduction The methodology and evaluation of word sense disambiguation (WSD) as a distinct task are somewhat different from those of others in NLP, and one can distinguish three aspects of this difference, all of which come down to evaluat...

