Using the Web to Obtain Frequencies for Unseen Bigrams (2003)
| Venue: | Computational Linguistics |
| Citations: | 104 - 2 self |
BibTeX
@ARTICLE{Keller03usingthe,
author = {Frank Keller and Mirella Lapata},
title = {Using the Web to Obtain Frequencies for Unseen Bigrams},
journal = {Computational Linguistics},
year = {2003},
volume = {29},
pages = {459--484}
}
Abstract
This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of Web frequencies in a pseudodisambiguation task.
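The correlation checks listed in the abstract can be illustrated with a minimal sketch. It assumes that Web and corpus counts for the same bigrams are already available as two parallel lists, and it computes a Pearson correlation on log-transformed counts. The function names `log_counts` and `pearson_r`, the log transform, and the 0.5 offset for zero counts are assumptions made for illustration; the abstract does not specify how the correlations were computed.

```python
import math


def log_counts(counts, zero_offset=0.5):
    """Log10-transform raw counts.

    The offset keeps zero counts defined; its value is an
    illustrative assumption, not something stated in the abstract.
    """
    return [math.log10(c + zero_offset) for c in counts]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder numbers, made up for this sketch; they are not data from the article.
    web = [120000, 3400, 56, 0]
    corpus = [310, 12, 1, 0]
    print(pearson_r(log_counts(web), log_counts(corpus)))
```

In this sketch the Web counts would come from whatever search-engine query procedure is used, which is left out here because the abstract gives no details of it.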







