@MISC{Church95oneterm, author = {Kenneth Ward Church}, title = {One Term or Two?}, year = {1995} }
Abstract
How effective is stemming? Text normalization? Stemming experiments test two hypotheses: one term (+stemmer) or two (-stemmer). The truth lies somewhere in between. The correlations, r, between a word and its variants (e.g., +s, +ly, +uppercase) tend to be small (refuting the one-term hypothesis), but non-negligible (refuting the two-term hypothesis). Moreover, r varies systematically depending on the words involved; it is relatively large for a good keyword, r(hostage, hostages) ≈ 0.5, and small for pairs with little content, r(anytime, Anytime) ≈ 0, or conflicting content, r(continental, Continental) ≈ 0.

1. How effective is suffixing? Text normalization? NLP?

Many systems use a stemmer to map morphological variants, e.g., hostage and hostages, into a single term. Do stemmers help retrieval performance? Frakes (1992, table 8.1, p. 148) summarizes a number of stemming experiments, many of which failed to find much of a difference in terms of precision and recall (though t...
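
To make the correlation r between a word and its variant concrete, the following is a minimal sketch, not the paper's procedure. It assumes that r is a Pearson correlation over per-document term counts, which the excerpt above does not confirm. The helper names (`term_counts`, `pearson_r`, `variant_correlation`) and the toy documents are hypothetical, and tokenization is plain whitespace splitting with no punctuation handling.

```python
from math import sqrt


def term_counts(docs, term):
    """Count exact, case-sensitive occurrences of `term` in each document.

    Assumption: tokens are separated by whitespace only.
    """
    return [doc.split().count(term) for doc in docs]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length count vectors.

    Returns 0.0 when either vector is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def variant_correlation(docs, word, variant):
    """Correlation between a word and one variant across a document set."""
    return pearson_r(term_counts(docs, word), term_counts(docs, variant))


if __name__ == "__main__":
    # Hypothetical toy collection, far too small for a meaningful estimate.
    toy_docs = [
        "hostage hostages released",
        "the hostage crisis",
        "weather report",
        "hostages freed hostages",
    ]
    print(variant_correlation(toy_docs, "hostage", "hostages"))
```

Under this reading, a value near 1 would favor treating the pair as one term, and a value near 0 would favor keeping two terms. A real estimate would need a large collection and a deliberate choice between raw counts and presence/absence indicators.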