Estimation of probabilities from sparse data for the language model component of a speech recognizer
IEEE Transactions on Acoustics, Speech and Signal Processing, 1987
Cited by 574 (1 self)
Abstract: The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises.

Sparseness of data is an inherent property of any real text, and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow ...

Turing's estimate $P_T$ for a probability of a word (m-gram) which occurred in the sample $r$ times is $P_T = r^*/N$, where ...

We call a procedure of replacing a count $r$ with a modified count $r'$ "discounting" and a ratio $r'/r$ a discount coefficient $d_r$. When $r' = r^*$, we have Turing's discounting.

Let us denote the m-gram $w_1, \ldots, w_m$ as $w_1^m$ and the number of times it occurred in the sample text as $c(w_1^m)$. Then the maximum likelihood estimate is ...
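The discounting terminology in this excerpt can be illustrated with a small Python sketch. The excerpt truncates the definition of r*, so the code assumes the standard Good-Turing form r* = (r + 1) n_{r+1} / n_r, where n_r is the number of distinct m-grams seen exactly r times, and takes the maximum likelihood estimate to be the relative frequency c / N. The function names and the toy counts are hypothetical, and this covers only the discount coefficients, not the recursive model the abstract describes.

```python
from collections import Counter


def turing_discounts(counts):
    """Map each observed count r to (r_star, d_r).

    Assumes the standard Good-Turing adjusted count
    r* = (r + 1) * n_{r+1} / n_r, with d_r = r* / r.
    Counts r for which n_{r+1} is zero are skipped, since r* would
    collapse to zero without smoothing the n_r values first.
    """
    # n[r] = number of distinct m-grams that occurred exactly r times
    n = Counter(counts.values())
    discounts = {}
    for r in sorted(n):
        if n[r + 1] == 0:
            continue
        r_star = (r + 1) * n[r + 1] / n[r]
        discounts[r] = (r_star, r_star / r)
    return discounts


def ml_estimate(counts, gram):
    """Relative-frequency (maximum likelihood) estimate c(gram) / N."""
    total = sum(counts.values())
    return counts.get(gram, 0) / total


# Hypothetical bigram counts: six singletons, two doubletons, one tripleton.
toy_counts = {
    ("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("f", "g"): 1,
    ("the", "cat"): 2, ("the", "dog"): 2,
    ("of", "the"): 3,
}

for r, (r_star, d_r) in turing_discounts(toy_counts).items():
    print(f"r={r}  r*={r_star:.3f}  d_r={d_r:.3f}")
print("P_ML(of the) =", ml_estimate(toy_counts, ("of", "the")))
```

Any m-gram absent from the counts gets a maximum likelihood estimate of exactly zero, which is the sparse-data problem the excerpt raises; discounting lowers the seen counts so that some probability mass is left over for unseen m-grams.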

