## A simple LNRE model for random character sequences (2004)

Venue: In Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles (Louvain-la-Neuve

Citations: 10 (4 self)

### BibTeX

@INPROCEEDINGS{Evert04asimple,

author = {Stefan Evert},

title = {A simple LNRE model for random character sequences},

booktitle = {In Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles (Louvain-la-Neuve},

year = {2004},

pages = {411--422}

}

### Abstract

This paper describes a population model for word frequency distributions based on the Zipf-Mandelbrot law, corresponding to the word frequency distribution induced by a random character sequence. The model, which has convenient analytical and numerical properties, is shown to be adequate for the description of language data extracted by automatic means from large text corpora. It can thus be used to study the problems faced by the statistical analysis of such data in the field of natural-language processing.

Keywords: lexical statistics, LNRE models, Zipf-Mandelbrot law, random text, cooccurrence statistics

#### 1 Introduction to lexical statistics and LNRE models

Most work in the area of lexical statistics is based on random sampling with replacement. This model assumes a population of types $w_1, \ldots, w_S$ with occurrence probabilities $\pi_1, \ldots, \pi_S$. $S$ is called the population size and may be infinite ($S = \infty$) in the case of a countably infinite population. The probabilities $\pi_i$ are the parameters of this model and must satisfy the normalization condition $\sum_{i=1}^{S} \pi_i = 1$.
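The random character sequence mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's implementation: the three-letter alphabet, the space probability of 0.25, the fixed seed, and the function name `random_text_word_counts` are all assumptions chosen for illustration. It draws characters independently, treats runs of letters between spaces as word tokens, and tabulates the sample size, the vocabulary size, and the frequency spectrum (the number of types occurring exactly m times).

```python
import random
from collections import Counter


def random_text_word_counts(n_chars, letters="abc", p_space=0.25, seed=0):
    """Count the 'words' in a random character sequence.

    Hypothetical helper: each character is drawn independently, a space
    with probability p_space and every letter with equal probability.
    Empty words (consecutive spaces) are discarded, which is a
    simplifying assumption of this sketch.
    """
    rng = random.Random(seed)
    alphabet = list(letters) + [" "]
    p_letter = (1.0 - p_space) / len(letters)
    weights = [p_letter] * len(letters) + [p_space]
    chars = rng.choices(alphabet, weights=weights, k=n_chars)
    # str.split() without arguments drops empty strings between spaces.
    return Counter("".join(chars).split())


counts = random_text_word_counts(200_000)

N = sum(counts.values())             # sample size (number of tokens)
V = len(counts)                      # vocabulary size (number of types)
spectrum = Counter(counts.values())  # spectrum[m] = number of types seen m times

print("tokens:", N, "types:", V)
print("most frequent types:", counts.most_common(5))
print("types seen once, twice, three times:",
      [spectrum[m] for m in (1, 2, 3)])
```

Sorting `counts` by decreasing frequency gives the rank-frequency profile of the simulated text, which can then be compared with a Zipf-Mandelbrot curve. Shorter words dominate the top ranks because every additional letter multiplies a word's probability by the per-letter probability.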