Results 1 - 3 of 3
Using Bilingual Materials to Develop Word Sense Disambiguation Methods, 1992
Cited by 69 (2 self)
Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Much of this work has been stymied by difficulties in acquiring appropriate lexical resources, such as semantic networks and annotated corpora. Following the suggestion in Brown et al. (1991a) and Dagan et al. (1991), we have recently achieved considerable progress by taking advantage of a new source of testing and training materials. Rather than depending on small amounts of hand-labeled text, we have been making use of relatively large amounts of parallel text, such as the Canadian Hansards (parliamentary debates), which are available in two (or more) languages. The translation can often be used in lieu of hand-labeling. For example, consider the polysemous word "sentence", which has two major senses: (1) a judicial sentence, and (2) a syntactic sentence. We can collect a number of sense (1) examples by extracting instances that are translated as "peine", and we can collect a number of sense (2) examples by extracting instances that are translated as "phrase". In this way, we have been able to acquire a considerable amount of testing and training material for developing and testing our disambiguation algorithms. The availability of this testing and training material has enabled us to develop quantitative disambiguation methods that achieve 90% accuracy in discriminating between two very distinct senses of a noun such as
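The translation-as-label idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_by_translation`, the `TRANSLATION_TO_SENSE` mapping, the whitespace tokenization, and the toy sentence pairs are all assumptions made for illustration, and real parallel text such as the Hansards would first need sentence alignment.

```python
from collections import defaultdict

# Hypothetical mapping from French translations to senses of "sentence",
# following the peine/phrase example in the abstract.
TRANSLATION_TO_SENSE = {"peine": "judicial", "phrase": "syntactic"}


def label_by_translation(aligned_pairs, target="sentence",
                         mapping=TRANSLATION_TO_SENSE):
    """Group English sentences containing `target` by the sense that
    their aligned French translation implies."""
    examples = defaultdict(list)
    for english, french in aligned_pairs:
        # Naive whitespace tokenization; an assumption of this sketch.
        if target not in english.lower().split():
            continue
        french_tokens = set(french.lower().split())
        senses = {sense for word, sense in mapping.items()
                  if word in french_tokens}
        # Keep only pairs whose translation points to exactly one sense.
        if len(senses) == 1:
            examples[senses.pop()].append(english)
    return dict(examples)


# Toy aligned sentence pairs, invented for illustration.
pairs = [
    ("the judge imposed a sentence of five years",
     "le juge a imposé une peine de cinq ans"),
    ("this sentence has no verb",
     "cette phrase n'a pas de verbe"),
    ("the weather is fine", "il fait beau"),
]
labeled = label_by_translation(pairs)
```

Pairs whose translation contains neither or both of the cue words are skipped, so the sketch trades coverage for cleaner labels.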
TRANSLATION AND ERROR ANALYSES- A CORPUS FOR A LEARNING TASK AND A FIELD OF LINGUISTIC LEVEL ANALYSIS.
Very often labels are assigned to linguistic areas of study to delimit special interactive aspects of the behaviour of any concrete manifestation of language. Scholars do normally realise that this can be done following two main opposite lines: going down to the nitty-gritty or going up to a wider scope. Of these the
Designing an English-Spanish Dictionary of Word Combinations Including Usage Examples from Corpora
As maintained by Bussman, a ‘corpus’ may be characterised as “a finite set of concrete linguistic utterances that serves as an empirical basis for linguistic research” (1996: 106). A language corpus is “a collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic

