Similarity-Based Models of Word Cooccurrence Probabilities (1999) [52 citations — 0 self]

Abstract:

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based ...
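The idea in the abstract, borrowing evidence from "most similar" words to score a word pair never seen in training, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the toy (verb, object) pairs, the function names, the use of Jensen-Shannon divergence as the distributional similarity measure, the exponential weighting with parameter `beta`, and the neighbour count `k`. It also omits the back-off language model that the abstract mentions.

```python
import math
from collections import Counter, defaultdict

# Toy training pairs (w1, w2); hypothetical data, for illustration only.
PAIRS = [
    ("eat", "apple"), ("eat", "pear"),
    ("devour", "apple"), ("devour", "pear"), ("devour", "peach"),
    ("visit", "beach"), ("visit", "park"),
]

def conditional_probs(pairs):
    """Maximum-likelihood estimates of P(w2 | w1) from observed pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in pairs:
        counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(ctr.values()) for w2, c in ctr.items()}
        for w1, ctr in counts.items()
    }

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / m)
    return total

def similarity_estimate(probs, w1, w2, k=2, beta=4.0):
    """Weighted average of P(w2 | w1') over the k words w1' closest to w1.

    The weight exp(-beta * divergence) is an assumed choice; any function
    that decreases with distance would fit the same scheme.
    """
    neighbours = sorted(
        (js_divergence(probs[w1], probs[other]), other)
        for other in probs if other != w1
    )[:k]
    weights = {other: math.exp(-beta * d) for d, other in neighbours}
    norm = sum(weights.values())
    return sum(wt * probs[o].get(w2, 0.0) for o, wt in weights.items()) / norm

PROBS = conditional_probs(PAIRS)
peach = similarity_estimate(PROBS, "eat", "peach")
beach = similarity_estimate(PROBS, "eat", "beach")
print(peach > beach)
```

In this toy data neither "eat peach" nor "eat beach" occurs, so a plain frequency estimate gives both a probability of zero. Because "devour" shares objects with "eat" and was seen with "peach", the similarity-weighted estimate ranks "eat peach" above "eat beach".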

Citations

4923 Elements of Information Theory – Cover, Thomas - 1991
3011 Pattern Classification and Scene Analysis – Duda, Hart - 1973
1072 Introduction to WordNet: An On-line Lexical Database – Miller, Beckwith, et al. - 1990
787 Instance-based Learning Algorithms – Aha, Kibler, et al. - 1991
653 Information Theory and Statistics – Kullback - 1959
619 A Probabilistic Theory of Pattern Recognition – Devroye, Gyorfi, et al. - 1996
588 A stochastic parts program and noun phrase parser for unrestricted text – Church - 1988
540 Nearest neighbor pattern classification – Cover, Hart - 1967
508 Estimation of probabilities from sparse data for the language model component of a speech recognizer – Katz - 1987
407 Distributional clustering of English words – Pereira, Tishby, et al. - 1993
400 Towards memory-based reasoning – Stanfill, Waltz - 1986
396 Class-based n-gram models of natural language – Brown, et al. - 1990
391 An empirical study of smoothing techniques for language modeling – Chen, Goodman - 1996
307 Locally weighted learning – Atkeson, Moore, et al. - 1997
244 Semantic similarity based on corpus statistics and lexical taxonomy – Jiang, Conrath - 1997
235 The population frequencies of species and the estimation of population parameters – Good - 1953
230 Interpolated estimation of Markov source parameters from sparse data – Jelinek, Mercer - 1980
228 Word sense disambiguation using statistical models of Roget's categories trained on large corpora – Yarowsky - 1992
224 Explorations in Automatic Thesaurus Discovery – Grefenstette - 1994
207 Divergence measures based on the Shannon entropy – Lin - 1991
180 Integrating multiple knowledge sources to disambiguate word sense: an exemplar-based approach – Ng, Lee - 1996
165 Noun classification from predicate-argument structure – Hindle - 1990
151 The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression – Witten, Bell - 1991
104 Disambiguating Noun Groupings with Respect to WordNet Senses – Resnik - 1995
102 Dimensions of meaning – Schütze - 1992
77 Improved clustering techniques for class-based statistical language modeling – Kneser, Ney - 1993
75 Using syntactic dependency as local context to resolve word sense ambiguity – Lin - 1997
69 Contextual word similarity and estimation from sparse data – Dagan, Marcus, et al. - 1993
63 A case-based approach to knowledge acquisition for domain-specific sentence analysis – Cardie - 1993
61 Aggregate and mixed-order Markov models for statistical language processing – Saul, Pereira - 1997
56 Principles of lexical language modeling for speech recognition – Jelinek, Mercer, et al. - 1991
55 Use of syntactic context to produce term association lists for text retrieval – Grefenstette - 1992
53 Similarity-based estimation of word cooccurrence probabilities – Dagan, Pereira, et al. - 1994
42 Wordnet and distributional analysis: a class-based approach to lexical discovery – Resnik - 1992
41 Word space – Schütze - 1993
40 Work on statistical methods for word sense disambiguation – Gale, Church, et al. - 1999
36 Exemplar-based word sense disambiguation: some recent improvements – Ng - 1997
35 Cooccurrence smoothing for stochastic language modeling – Essen, Steinbiss - 1992
34 Similarity-based methods for word sense disambiguation – Dagan, Lee, et al. - 1997
32 Statistical Sense Disambiguation with Relatively Small Corpus Using Dictionary Definitions, 33rd Annual Meeting of the Association for Computational Linguistics, 26-30 – Luk - 1995
31 Similarity-based approaches to natural language processing – Lee - 1997
28 Experiments on linguistically-based term associations – Ruge - 1992
26 Discovery procedures for sublanguage selectional patterns: Initial experiments – Grishman, Hirschman, et al. - 1986
20 Smoothing of automatically generated selectional constraints – Grishman, Sterling - 1993
19 Learning similarity-based word sense disambiguation – Karov, Edelman - 1996
9 Isolated word recognition using hidden Markov models – Sugawara, Nishimura, et al. - 1985
7 An extended clustering algorithm for statistical language models – Ueberla - 1994
7 Hierarchical clustering of words and application to NLP tasks – Ushioda - 1996
3 Distributional clustering of English words. In 31st annual meeting of the association for computational linguistics (p – Pereira, Tishby, et al. - 1993
2 the proceedings of 31st Annual Meeting of ACL – In - 1994