Similarity-based models of word cooccurrence probabilities
- Machine Learning
, 1999
Cited by 70 (0 self)
Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.
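The idea of estimating an unseen bigram from “most similar” words can be illustrated with a minimal sketch. It is not the paper's model: the similarity function `overlap_similarity`, the neighbour count `k`, and all function names are assumptions chosen for illustration, and the back-off integration described in the abstract is omitted.

```python
from collections import Counter, defaultdict


def conditional_probs(bigrams):
    """Maximum-likelihood P(w2 | w1) from a list of (w1, w2) pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in bigrams:
        counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(ctr.values()) for w2, c in ctr.items()}
        for w1, ctr in counts.items()
    }


def overlap_similarity(p, q):
    """Hypothetical similarity: shared probability mass of two distributions."""
    return sum(min(p.get(w, 0.0), q.get(w, 0.0)) for w in set(p) | set(q))


def similarity_estimate(w1, w2, probs, k=2):
    """Estimate P(w2 | w1) as a similarity-weighted average over the
    k words whose next-word distributions are most similar to w1's."""
    neighbours = sorted(
        ((overlap_similarity(probs[w1], probs[other]), other)
         for other in probs if other != w1),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return 0.0  # no similar word carries any evidence
    return sum(sim * probs[other].get(w2, 0.0) for sim, other in neighbours) / total
```

In this sketch a bigram that never occurred with `w1` still receives a nonzero estimate whenever a distributionally similar word was seen with `w2`, whereas the maximum-likelihood estimate would be zero.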
Word clustering and disambiguation based on co-occurrence data
- Natural Language Engineering
, 1998
Cited by 37 (0 self)
We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al., 1992) and (Li and Abe, 1996), and overcomes their drawbacks while retaining their advantages. We then combined this clustering method with the disambiguation method of (Li and Abe, 1995) to derive a disambiguation method that makes use of both automatically constructed thesauruses and a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 85.2%, which compares favorably against the accuracy (82.4%) obtained by the state-of-the-art disambiguation method of (Brill and Resnik, 1994).
Chinese whispers - an efficient graph clustering algorithm and its application to natural language processing problems
- In Proceedings of TextGraphs: the 1st Workshop on Graph Based Methods for Natural Language Processing
, 2006
Cited by 9 (1 self)
We introduce Chinese Whispers, a randomized graph-clustering algorithm, which is time-linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation. The algorithm exploits the fact that the small-world property holds for many graphs in NLP.
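The abstract does not define the algorithm, so the following is only a sketch of a randomized label-propagation scheme in the spirit it describes: every node starts in its own class and repeatedly adopts the class with the highest total edge weight among its neighbours. The function name, the iteration cap, the random tie-breaking, and the early-stopping rule are assumptions, and node identifiers are assumed to be mutually comparable (for example, all strings).

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples.

    Returns a dict mapping each node to a class label (a node identifier).
    """
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    labels = {node: node for node in adj}  # each node starts in its own class
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbours.
            scores = defaultdict(float)
            for neighbour, weight in adj[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            # Break ties at random among the strongest classes.
            candidates = sorted(l for l, s in scores.items() if s == best)
            new_label = rng.choice(candidates)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # assumed stopping rule: no label changed in a pass
            break
    return labels
```

Each pass touches every edge a constant number of times, which matches the linear-in-edges cost per iteration claimed in the abstract. Labels can only spread along edges, so disconnected components never share a class.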

