Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p^2 (2000) [34 citations — 0 self]

by Kenneth Church
In Proceedings of the 17th conference on Computational linguistics

Abstract:

Repetition is very common. Adaptive language models, which allow probabilities to change or adapt after seeing just a few words of a text, were introduced in speech recognition to account for text cohesion. Suppose a document mentions Noriega once. What is the chance that he will be mentioned again? If the first instance has probability p, then under standard (bag-of-words) independence assumptions, two instances ought to have probability p^2, but we find the probability is actually closer to p/2. The first mention of a word obviously depends on frequency, but surprisingly, the second does not. Adaptation depends more on lexical content than frequency; there is more adaptation for content words (proper nouns, technical terminology and good keywords for information retrieval), and less adaptation for function words, cliches and ordinary first names.
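The comparison in the abstract can be illustrated with a small sketch. It estimates p as the fraction of documents that mention a word at least once, estimates the chance of two mentions as the fraction with at least two occurrences, and sets both against the independence prediction p^2. The function name `mention_probabilities` and the four-document toy corpus are invented for illustration; this is not the estimation procedure used in the paper, and the toy numbers say nothing about real corpora.

```python
from collections import Counter


def mention_probabilities(docs, word):
    """Return (p, p_two, p_squared) for `word` over tokenized documents.

    p         : fraction of documents with at least one occurrence
    p_two     : fraction of documents with at least two occurrences
    p_squared : what p_two would be under the independence assumption
    """
    if not docs:
        raise ValueError("docs must contain at least one document")
    n = len(docs)
    # Count occurrences of the word in each document.
    counts = [Counter(doc)[word] for doc in docs]
    p = sum(c >= 1 for c in counts) / n
    p_two = sum(c >= 2 for c in counts) / n
    return p, p_two, p * p


# Hypothetical toy corpus: one document repeats the name, three never use it.
toy_docs = [
    "noriega fled and noriega was later captured".split(),
    "the court heard the case".split(),
    "markets rose on monday".split(),
    "the vote was delayed".split(),
]

p, p_two, p_squared = mention_probabilities(toy_docs, "noriega")
print(f"p = {p}, observed two mentions = {p_two}, independence predicts {p_squared}")
```

In this toy corpus the observed chance of two mentions is well above p^2, which is the direction of the effect the abstract describes. Running the same function on a function word such as "the" would be one way to explore the claim that adaptation is weaker for such words, although a corpus this small cannot support any conclusion.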

Citations

144 Frequency Analysis of English Usage – Francis - 1982
53 Poisson mixtures – Gale - 1995
38 Context and Structure in Automated Full-Text Information Access (Doctor of Philosophy thesis) – Hearst - 1994
20 Dynamic nonlocal language modeling via hierarchical topic-based adaptation – Florian, Yarowsky - 1999