## The Complexity and Entropy of Literary Styles (1996)

Citations: 5 (1 self)

### BibTeX

```bibtex
@MISC{Kontoyiannis96thecomplexity,
  author = {I. Kontoyiannis},
  title  = {The Complexity and Entropy of Literary Styles},
  year   = {1996}
}
```

### Abstract

Since Shannon's original experiment in 1951, several methods have been applied to the problem of determining the entropy of English text. These methods were based either on prediction by human subjects, or on computer-implemented parametric models for the data, of a certain Markov order. We ask why computer-based experiments almost always yield much higher entropy estimates than the ones produced by humans. We argue that there are two main reasons for this discrepancy. First, the long-range correlations of English text are not captured by Markovian models and, second, computer-based models only take advantage of the text statistics without being able to "understand" the contextual structure and the semantics of the given text. The second question we address is what the "entropy" of a text says about the author's literary style. In particular, is there an intuitive notion of "complexity of style" that is captured by the entropy? We present preliminary results, based on a non-parametric entropy estimation algorithm, that offer partial answers to these questions. These results indicate that taking long-range correlations into account significantly improves the entropy estimates. We get an estimate of 1.77 bits-per-character for a one-million-character sample taken from Jane Austen's works. Also, comparing the estimates obtained from several different texts provides some insight into the interpretation of the notion of "entropy" when applied to English text rather than to random processes, and into the relationship between the entropy and the "literary complexity" of an author's style. Advantages of this entropy estimation method are that it does not require prior training, it is uniformly good over different styles and languages, and it seems to converge reasonably fast.