Results 1 -
1 of
1
Good-Turing smoothing without tears
- Journal of Quantitative Linguistics
, 1995
"... The performance of statistically based techniques for many tasks such as spelling correction, sense disambiguation, and translation is improved if one can estimate a probability for an object of interest which has not been seen before. Good-Turing methods are one means of estimating these probabilit ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
The performance of statistically based techniques for many tasks such as spelling correction, sense disambiguation, and translation is improved if one can estimate a probability for an object of interest which has not been seen before. Good-Turing methods are one means of estimating these probabilities for previously unseen objects. However, the use of Good-Turing methods requires a smoothing step which must smooth in regions of vastly different accuracy. Such smoothers are difficult to use, and may have hindered the use of Good-Turing methods in computational linguistics. This paper presents a method which uses the simplest possible smooth, a straight line, together with a rule for switching from Turing estimates which are more accurate at low frequencies. We call this method the Simple Good-Turing (SGT) method. Two examples, one from prosody, the other from morphology, are used to illustrate the SGT. While the goal of this research was to provide a simple estimator, the SGT turns out to be the most accurate of several methods applied in a set of Monte Carlo examples which satisfy the assumptions of the Good-Turing methods. The accuracy of the SGT is compared to two other methods for estimating the same probabilities, the Expected Likelihood Estimate (ELE) and two way cross validation. The SGT method is

