Estimating Sparse Events using Probabilistic Logic: Application to Word n-Grams
Department of Computer Science, State University of New York at Buffalo, U.S.A.
520 Lee Entrance, Amherst, New York 14228-2567
Sparse events arise in many tasks across different fields. To assign probabilities to such events, researchers commonly use maximum likelihood (ML) estimation. However, it is well known that the ML estimator is sensitive to extreme values: configurations with low frequencies are underestimated and those with high frequencies are overestimated, so the resulting estimates are unreliable. To solve this problem and better evaluate these probability values, we propose a novel approach based on the probabilistic logic (PL) paradigm. For the sake of illustration, in this paper we focus on events such as word trigrams $(w_3; w_1; w_2)$ or word/POS-tag trigrams $((w_3; t_3); (w_1; t_1); (w_2; t_2))$. Such entities are the basic objects used in speech and handwriting recognition. For example, to distinguish between "replace the fun" and "replace the floor", an accurate estimation of these two trigrams is needed. The ML estimation is equival...
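The abstract does not describe the PL method itself, so the sketch below shows only the ML baseline it criticizes, using the standard relative-frequency trigram estimate $P(w_3 \mid w_1, w_2) = c(w_1, w_2, w_3) / c(w_1, w_2)$. The function name `ml_trigram_model`, the toy corpus, and the convention of returning 0 for an unseen history are illustrative assumptions, not taken from the paper. It makes the sparsity problem concrete: a trigram that never occurs in the training text, such as "replace the fun", receives probability exactly 0.

```python
from collections import Counter


def ml_trigram_model(tokens):
    """Build an ML (relative-frequency) trigram estimator from a token list.

    Hypothetical helper for illustration. Returns prob(w1, w2, w3), the
    estimate of P(w3 | w1, w2) = c(w1, w2, w3) / c(w1, w2).
    """
    # Count every consecutive triple (w1, w2, w3) in the text.
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count histories (w1, w2) only at positions followed by a third word,
    # so the estimates for each observed history sum to 1.
    history_counts = Counter(zip(tokens, tokens[1:-1]))

    def prob(w1, w2, w3):
        h = history_counts[(w1, w2)]
        if h == 0:
            # History never observed: the ML ratio is undefined (0/0).
            # Assumed convention here: return 0.0.
            return 0.0
        # Counter returns 0 for unseen trigrams, giving a zero estimate.
        return trigram_counts[(w1, w2, w3)] / h

    return prob


if __name__ == "__main__":
    corpus = "we replace the floor and then we replace the floor again".split()
    p = ml_trigram_model(corpus)
    print(p("replace", "the", "floor"))  # 1.0
    print(p("replace", "the", "fun"))    # 0.0 (unseen trigram)
    print(p("the", "floor", "and"))      # 0.5
```

Any plausible but unseen trigram gets zero probability, and a trigram seen once in a small sample can get an inflated one. This is the unreliability at extreme frequencies that the abstract attributes to ML estimation.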