Building probabilistic models of language is a central task in natural language and speech processing allowing to integrate the syntactic and/or semantic (and recently pragmatic) constraints of the language into the systems. Probabilistic language models are an attractive alternative to the more traditional rule-based systems, such as context free grammars, because of the recent availability of massive amount of text corpora which can be used to efficiently train the models and because instead of binary grammaticality judgement offered by the rule-based systems, likelihood of any sequence of lexical units can be obtained, which is a crucial factor in such tasks as speech recognition. Probabilistic language models also find their application in part-of-speech tagging, machine translation, semantic disambiguation and numerous other fields. The most widely used language models are based on the estimation of the probability of observing a given lexical unit conditioned on the observations of n−1 preceding lexical units, and are known as n-gram models. When the n-gram estimates are