Adding Compression to a Full-Text Retrieval System
1995
Cited by 75 (25 self)
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text...
Indexing Compressed Text
Proceedings of the 4th South American Workshop on String Processing, 1997
Cited by 20 (8 self)
We present a technique to build an index based on suffix arrays for compressed texts. We also propose a compression scheme for textual databases based on words that generates a compression code that preserves the lexicographical ordering of the text words. As a consequence it permits the sorting of the compressed strings to generate the suffix array without decompressing. As the compressed text is under 30% of the size of the original text we are able to build the suffix array twice as fast on the compressed text. The compressed text plus index is 55-60% of the size of the original text plus index and search times are reduced to approximately half the time. We also present analytical and experimental results for different variations of the word-oriented compression paradigm.
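The order-preserving idea in this abstract can be illustrated with a minimal sketch. It is not the paper's coding scheme: it assumes the simplest possible order-preserving code, where each word is replaced by its rank in the sorted vocabulary, and it uses a naive sort to build a word-level suffix array. The function names `build_codes` and `suffix_array` are hypothetical. The point is only that sorting suffixes of the coded text gives the same suffix array as sorting suffixes of the original words, so no decompression is needed.

```python
def build_codes(text):
    """Replace each word by its rank in the sorted vocabulary.

    Assumption: rank codes stand in for a real order-preserving
    compression code; they preserve order but are not compact.
    """
    words = text.split()
    vocab = sorted(set(words))
    rank = {w: i for i, w in enumerate(vocab)}
    return words, vocab, [rank[w] for w in words]


def suffix_array(seq):
    """Naive word-level suffix array: suffix start positions in sorted order."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


words, vocab, codes = build_codes("to be or not to be that is the question")

# Suffix arrays built on the original words and on the codes.
sa_words = suffix_array(words)
sa_codes = suffix_array(codes)
```

Because the word-to-rank mapping is strictly increasing, comparing two code sequences element by element gives the same result as comparing the corresponding word sequences, which is why the two suffix arrays agree.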
Text Compression for Dynamic Document Databases
IEEE Transactions on Knowledge and Data Engineering, 1994
Cited by 19 (7 self)
For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed, if compression efficiency is to be maintained on dynamic collections. Here we show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.
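The insertion problem described above can be made concrete with a small sketch. The abstract does not say how new words are handled, so this example assumes that any word missing from the fixed, semi-static vocabulary must be coded through some fallback (an "escape"), and it simply measures how often that would happen. The names `build_vocabulary` and `escape_rate` are hypothetical.

```python
def build_vocabulary(docs):
    """Collect the word vocabulary of the initial collection (the semi-static model)."""
    return {w for d in docs for w in d.split()}


def escape_rate(doc, vocab):
    """Fraction of words in a newly inserted document that the model has never seen.

    Assumption: unseen words need a fallback code; a rising rate is a
    signal that the model is going stale.
    """
    words = doc.split()
    if not words:
        return 0.0
    unseen = sum(1 for w in words if w not in vocab)
    return unseen / len(words)


vocab = build_vocabulary(["the cat sat", "the dog ran"])
familiar = escape_rate("the cat ran", vocab)
novel = escape_rate("the bird flew away", vocab)
```

A document made of known words has an escape rate of zero, while one dominated by new words has a high rate; tracking this figure over time is one plausible way to decide when recompression is worth its cost.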
Static Compression for Dynamic Texts
Proc. IEEE Data Compression Conference, 1994
Cited by 1 (1 self)
Two problems arise when semi-static word-based compression methods are applied to large texts, such as those stored in information retrieval systems. First, the space required for the model during decoding can become very large. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained. Here we show that with careful management the impact of both of these drawbacks can be minimised. Experiments with a word-based model and over 500 Mb of text show that compression rates can be retained even in the face of severe memory limitations on the decoder, and in the face of significant expansion in the size of the text itself.
1 Word-Based Compression
The use of a word-based zero-order compression model to represent English text has been considered by several authors [2, 6, 7, 15]. It is particularly appropriate for compressing full-text document collections, an application in which very large quant...
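As a rough illustration of what a word-based zero-order model means, the sketch below treats each whitespace-separated word as a symbol, estimates its probability from its frequency alone (no context), and computes the resulting entropy in bits per word. The tokenisation by `split()` and the function name `zero_order_word_entropy` are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def zero_order_word_entropy(text):
    """Entropy in bits per word of a zero-order word model.

    Each word's probability is its relative frequency in the text;
    an ideal entropy coder would approach this average code length.
    """
    words = text.split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


balanced = zero_order_word_entropy("a b a b")
constant = zero_order_word_entropy("a a a a")
```

Two equally likely words cost one bit each, and a text that repeats a single word costs nothing per word under this model; real text falls between these extremes.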

