Results 1 
1 of
1
GoodTuring smoothing without tears
 Journal of Quantitative Linguistics
, 1995
"... The performance of statistically based techniques for many tasks such as spelling correction, sense disambiguation, and translation is improved if one can estimate a probability for an object of interest which has not been seen before. GoodTuring methods are one means of estimating these probabilit ..."
Abstract

Cited by 24 (0 self)
 Add to MetaCart
The performance of statistically based techniques for many tasks such as spelling correction, sense disambiguation, and translation is improved if one can estimate a probability for an object of interest which has not been seen before. GoodTuring methods are one means of estimating these probabilities for previously unseen objects. However, the use of GoodTuring methods requires a smoothing step which must smooth in regions of vastly different accuracy. Such smoothers are difficult to use, and may have hindered the use of GoodTuring methods in computational linguistics. This paper presents a method which uses the simplest possible smooth, a straight line, together with a rule for switching from Turing estimates which are more accurate at low frequencies. We call this method the Simple GoodTuring (SGT) method. Two examples, one from prosody, the other from morphology, are used to illustrate the SGT. While the goal of this research was to provide a simple estimator, the SGT turns out to be the most accurate of several methods applied in a set of Monte Carlo examples which satisfy the assumptions of the GoodTuring methods. The accuracy of the SGT is compared to two other methods for estimating the same probabilities, the Expected Likelihood Estimate (ELE) and two way cross validation. The SGT method is