Introduction to the special issue on word sense disambiguation
- Computational Linguistics J
, 1998
Decision Lists For Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
, 1994
Cited by 126 (3 self)
This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. By identifying and utilizing only the single best disambiguating evidence in a target context, the algorithm avoids the problematic complex modeling of statistical dependencies. Although directly applicable to a wide class of ambiguities, the algorithm is described and evaluated in a realistic case study, the problem of restoring missing accents in Spanish and French text. Current accuracy exceeds 99% on the full task, and typically is over 90% for even the most difficult ambiguities.
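The "single best disambiguating evidence" idea in the abstract above can be illustrated with a minimal decision-list sketch. This is not the paper's implementation: the add-alpha smoothing, the feature names such as `prev:la`, and the toy training data for the French pair côte/côté are all assumptions made for illustration. Rules are ranked by the absolute log-ratio of smoothed counts, and classification uses only the first rule that matches the context.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a two-class decision list from (features, label) pairs.

    Each rule is (strength, feature, label), where strength is the absolute
    log-ratio of the smoothed counts of the feature under the two labels.
    `alpha` is an assumed smoothing constant, not a value from the paper.
    """
    counts = defaultdict(Counter)
    label_totals = Counter()
    for features, label in examples:
        label_totals[label] += 1
        for feature in set(features):
            counts[feature][label] += 1

    label_a, label_b = sorted(label_totals)  # requires exactly two labels
    rules = []
    for feature, c in counts.items():
        llr = math.log((c[label_a] + alpha) / (c[label_b] + alpha))
        rules.append((abs(llr), feature, label_a if llr > 0 else label_b))

    # Strongest evidence first; ties broken by feature name for determinism.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    default = label_totals.most_common(1)[0][0]
    return rules, default


def classify(rules, default, features):
    """Return the label of the first (strongest) rule whose feature is present."""
    present = set(features)
    for _, feature, label in rules:
        if feature in present:
            return label
    return default


# Hypothetical toy data: contexts of unaccented "cote" with the correct form.
examples = [
    (["prev:la", "near:mer"], "côte"),
    (["prev:la", "near:azur"], "côte"),
    (["prev:la", "near:mer"], "côte"),
    (["prev:du", "near:autre"], "côté"),
    (["prev:de", "near:autre"], "côté"),
    (["prev:du", "near:mer"], "côté"),
]
rules, default = train_decision_list(examples)
print(classify(rules, default, ["prev:du", "near:mer"]))
```

In the last call the context contains conflicting evidence (`prev:du` and `near:mer`). Only the higher-ranked rule is consulted, so no model of the dependency between the two features is needed.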
Word sense disambiguation: The state of the art
- Computational Linguistics
, 1998
Cited by 92 (3 self)
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950s. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Homograph Disambiguation in Text-to-speech Synthesis
- PROGRESS IN SPEECH SYNTHESIS
, 1997
Cited by 16 (0 self)
This chapter presents a statistical decision procedure for lexical ambiguity resolution in text-to-speech synthesis. Based on decision lists, the algorithm incorporates both local syntactic patterns and more distant collocational evidence, combining the strengths of decision trees, N-gram taggers and Bayesian classifiers. The algorithm is applied to 7 major types of ambiguity where context can be used to choose a word's pronunciation.
Measures and Applications of Lexical Distributional Similarity
, 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
Incremental Construction of Finite-State Automata and Transducers, and their Use in the Natural Language Processing
, 1998
Cited by 13 (1 self)
This dissertation states that it is possible to construct minimal deterministic finite-state automata fast and using little memory. Two new construction algorithms are presented. An implementation is discussed. Compared to a similar algorithm by Dominique Revuz, those presented here use far less memory. The thesis states that it is possible to construct automata that guess canonical forms and categories of unknown words much faster than it is done by other algorithms. A new algorithm is given and discussed. An overview of the use of finite-state automata in natural language processing (NLP) is given. A new type of automata is introduced. A method for spelling correction is enhanced so that it can handle Polish words.
Automatic Insertion of Accents in French Text
, 1998
Cited by 9 (0 self)
Automatic accent insertion (AAI) is the problem of re-inserting accents (diacritics) into a text where they are missing. Unaccented French texts are still quite common in electronic media, as a result of a long history of character encoding problems and the lack of well-established conventions for typing accented characters on computer keyboards. We present an AAI method for French, based on a stochastic language model. This method was implemented into a program and C library of functions, which are now commercially available. Our experiments show that French text processed with this program contains less than one accent error per 130 words. We also show how our AAI method can be used to do on-the-fly accent insertions within a word-processing environment, which makes it possible to write in French without having to type accents. A prototype of such a system was integrated into the Emacs editor, and is now available to all students and employees of the Université de Montréal's compu...
Word Sense Disambiguation Criteria: A Systematic Study
- In: (col
, 2004
Cited by 3 (0 self)
This article describes the results of a systematic in-depth study of the criteria used for word sense disambiguation. Our study is based on 60 target words: 20 nouns, 20 adjectives and 20 verbs. Our results are not always in line with some practices in the field. For example, we show that omitting non-content words decreases performance and that bigrams yield better results than unigrams.
Exploration of Contextual Constraints for Character Pre-Classification
- In Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR
, 2001
Cited by 2 (0 self)
We present strategies and results for identifying the symbol type (lower-case, upper-case, digit, and punctuation or special symbols) of every character in a text document by using various kinds of information from neighboring characters. In the expectation of reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-grams, shape codes, and within-word context. On an ASCII test corpus of 925 articles that simulates perfect image-level processing, these methods achieve a substantial improvement over default assignment of all characters to lower case.
Diacritics Restoration: Learning from Letters versus Learning from Words
, 2002
Cited by 2 (0 self)
This paper presents a method for diacritics restoration based on learning mechanisms that act at letter level. This technique is new, to our knowledge, and we compare it with the well-known techniques for diacritics restoration that learn from words. Our method is particularly useful for languages that lack large electronic dictionaries and where means for generalization beyond words are required. Accuracies of over 99% at letter level are reported.

