Storing Text Retrieval Systems on CD-ROM: Compression and Encryption Considerations (1989)
| Venue: | ACM Transactions on Information Systems |
| Citations: | 21 (3 self-citations) |
BibTeX
@ARTICLE{Klein89storingtext,
author = {Shmuel T. Klein and Abraham Bookstein and Scott Deerwester},
title = {Storing Text Retrieval Systems on {CD-ROM}: Compression and Encryption Considerations},
journal = {ACM Transactions on Information Systems},
year = {1989},
volume = {7},
pages = {230--245}
}
Abstract
The emergence of the CD-ROM as a storage medium for full-text databases raises the question of the maximum size of a database that this medium can contain. As an example, this paper examines the problem of storing the Trésor de la Langue Française on a CD-ROM. The text alone of this database is 700 MB long, more than a CD-ROM can hold, and the dictionary and concordance needed to access the text must be stored as well. A further constraint is that some of the material is copyrighted, and it is desirable that such material be difficult to decode except through software provided by the system. Pertinent approaches to compressing the various files are reviewed, and the compression of the text is related to the problem of data encryption: specifically, it is shown that, under simple models of text generation, Huffman encoding produces a bit-string indistinguishable from a representation of coin flips.
Categories and Subject Descriptors: E.3 E.4 H.3.2 J.5
General terms: ...
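The coin-flip claim in the abstract can be illustrated with a small experiment. This is a minimal sketch, not the paper's method or proof: it assumes a hypothetical memoryless source over four symbols whose probabilities are powers of 1/2 (a dyadic distribution), builds a standard binary Huffman code, encodes sampled text, and checks that 1s and each non-overlapping 2-bit block occur at the rates fair coin flips would give. For a dyadic source every branch of the Huffman tree is taken with probability exactly 1/2, so the output bits are independent fair bits; for other distributions they are only approximately fair. The symbol set, probabilities, sample size, seed, and the helper names `huffman_code`, `encode`, and `bit_stats` are illustrative assumptions.

```python
import heapq
import random
from collections import Counter
from itertools import count


def huffman_code(probs):
    """Build a binary Huffman code {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()  # unique counter so heapq never has to compare the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing 0 and 1 respectively.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def encode(text, code):
    """Concatenate the codewords of all symbols in text into one bit-string."""
    return "".join(code[ch] for ch in text)


def bit_stats(bits):
    """Return the fraction of 1s and the frequencies of non-overlapping 2-bit blocks."""
    ones = bits.count("1") / len(bits)
    pairs = Counter(bits[i:i + 2] for i in range(0, len(bits) - 1, 2))
    total = sum(pairs.values())
    return ones, {k: v / total for k, v in sorted(pairs.items())}


if __name__ == "__main__":
    # Hypothetical memoryless source with dyadic probabilities (assumption, not from the paper).
    probs = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}
    code = huffman_code(probs)
    rng = random.Random(1989)
    symbols, weights = zip(*probs.items())
    text = "".join(rng.choices(symbols, weights=weights, k=200_000))
    bits = encode(text, code)
    ones, pairs = bit_stats(bits)
    print(code)
    print(f"fraction of 1s: {ones:.4f}")
    print("2-bit block frequencies:", {k: round(v, 4) for k, v in pairs.items()})
```

Running the script prints the code table (`e` gets one bit, `t` two, `a` and `o` three) followed by a fraction of 1s close to 0.5 and 2-bit block frequencies close to 0.25 each, which is what a sequence of fair coin flips would produce.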