@MISC{Warren_vocabularysize,
    author = {Robert Warren},
    title = {Vocabulary size and email authentication},
    year = {}
}
Abstract
This paper explores the performance of the method proposed by Efron and Thisted for predicting vocabulary size from sampled text. The objective of this research is to determine whether this simple and quick test can be used as a coarse indicator of authorship. Three sets of emails, as well as other texts, are analyzed in order to collect performance data. The conclusion is that the test is at best a lower-bound indicator within the T < 1.0 region and is not sufficient as an authentication method.
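The abstract does not reproduce the estimator itself. As a rough illustration, the sketch below uses the alternating-series form commonly attributed to Efron and Thisted (and to Good and Toulmin): the expected number of previously unseen words in a further sample t times the size of the observed one is the sum over x of (-1)^(x+1) t^x n_x, where n_x is the number of distinct words seen exactly x times. The function names are hypothetical, whitespace tokenization is an assumption, and it is also an assumption that this t corresponds to the T of the abstract; the series is generally regarded as reliable only for t up to about 1.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return a mapping x -> n_x: how many distinct words occur exactly x times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())


def predicted_new_words(tokens, t):
    """Estimate the number of new distinct words in a further sample of
    relative size t, using the alternating series sum((-1)^(x+1) * t^x * n_x).

    Illustrative sketch only; not necessarily the exact procedure of the paper.
    """
    spectrum = frequency_spectrum(tokens)
    return sum((-1) ** (x + 1) * (t ** x) * n_x for x, n_x in spectrum.items())


if __name__ == "__main__":
    # Hypothetical sample text; real use would tokenize an email body.
    sample = "the quick brown fox jumps over the lazy dog the fox".split()
    print(predicted_new_words(sample, 0.5))
```

Comparing this prediction for a known author's earlier messages with the number of new words actually observed in a later message is one way such a test could serve as a coarse authorship check, in the spirit the abstract describes.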