Reducing the Space Requirement of Suffix Trees
Software – Practice and Experience, 1999
Cited by 109 (10 self)
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different types. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd.
KEY WORDS: data structures; suffix trees; implementation techniques; space reduction
FASTER SUFFIX SORTING
1999
Cited by 42 (2 self)
We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our
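The problem named in this abstract, lexicographically sorting the suffixes of a string, can be illustrated with a minimal sketch. This is a naive baseline that compares whole suffixes, not the algorithm the paper proposes; the function name `suffix_array` is an assumption chosen for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    ordered so that the suffixes are in lexicographic order."""
    # Naive baseline: materialise each suffix as a sort key.
    # Simple to read, but far slower and more memory hungry
    # than dedicated suffix sorting algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    print(suffix_array("banana"))
```

For the input `"banana"` the suffixes in sorted order are `a`, `ana`, `anana`, `banana`, `na`, `nana`, so the result is `[5, 3, 1, 0, 4, 2]`.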
Boosting textual compression in optimal linear time
Journal of the ACM, 2005
Cited by 34 (19 self)
We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression ...
Extended abstracts related to this article appeared in Proceedings of CPM 2001 and Proceedings of ACM-SIAM SODA 2004, and were combined due to their strong relatedness and complementarity. The work of P. Ferragina was partially supported by the Italian MIUR projects “Algorithms for the Next ...
Compression Boosting in Optimal Linear Time Using the Burrows-Wheeler Transform
Journal of the ACM, 2004
Cited by 11 (5 self)
In this paper we provide the first compression booster that turns a zeroth order compressor into a more effective k-th order compressor without any loss in time efficiency. More precisely, let A be an algorithm that compresses a string s within $\lambda |s| H_0^*(s) + \mu$ bits of storage in $O(T(|s|))$ time, where $H_0^*(s)$ is the zeroth order entropy of the string s. Our booster improves A by compressing s within $\lambda |s| H_k^*(s) + \log_2 |s| + g_k$ bits still using $O(T(|s|))$ time, where $H_k^*(s)$ is the k-th order entropy of s.
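To make the entropy terms in this abstract concrete, here is a minimal sketch of the standard zeroth order and k-th order empirical entropies of a string, in bits per character. It assumes the usual textbook definitions, which may differ from the modified entropy used in the paper's bounds; the function names `h0` and `hk` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols) -> float:
    """Zeroth order empirical entropy, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def hk(s: str, k: int) -> float:
    """k-th order empirical entropy, in bits per symbol.

    Assumes the standard definition: group every character by the
    k characters that precede it, take the zeroth order entropy of
    each group, and average the groups weighted by their size.
    """
    if k == 0 or not s:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


if __name__ == "__main__":
    text = "abababab"
    print(h0(text), hk(text, 1))
```

On a strictly alternating string such as `"abababab"`, every character is fully determined by its predecessor, so the first order entropy drops to zero while the zeroth order entropy stays at one bit per character. This gap is what a booster from zeroth order to k-th order compression aims to exploit.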
Unifying Text Search And Compression - Suffix Sorting, Block Sorting and Suffix Arrays
2000
Cited by 6 (0 self)
Many electronic documents are available today, such as newspaper articles, dictionaries, books, and DNA sequences, and they are stored in databases. Many more documents are available on the Internet and as e-mail. Fast queries on such huge collections of documents, and compression to reduce the cost of storing or transferring them, are therefore important. This thesis proposes a unified method for improving the efficiency of search and compression for huge text data. All search and compression methods used in this thesis are related to a data structure called the suffix array. The suffix array is a text search data structure, and it is also used in a text compression method called block sorting. Both are promising methods and have been studied extensively. Currently, a data structure called the inverted file is used for queries on huge collections of documents. Although it is widely used, its query unit is a document in order to reduce disk space to sto...
The Burrows-Wheeler Transform: Theory and Practice
Lecture Notes in Computer Science, 1999
Cited by 5 (1 self)
In this paper we describe the Burrows-Wheeler Transform (BWT), a completely new approach to data compression which is the basis of some of the best compressors available today. Although it is easy to intuitively understand why the BWT helps compression, the analysis of BWT-based algorithms requires a careful study of every single algorithmic component. We describe two algorithms which use the BWT and we show that their compression ratio can be bounded in terms of the k-th order empirical entropy of the input string for any $k \geq 0$. Intuitively, this means that these algorithms are able to make use of all the regularity which is in the input string.
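The transform discussed in this abstract can be sketched in a few lines. The version below is the naive textbook formulation, which sorts all rotations of the input; it is not one of the two algorithms the paper analyses. The function names and the choice of `$` as an end marker are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive sketch)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort every rotation of s and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by rebuilding the sorted rotation table."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to each row, then re-sort; after n
        # rounds the table holds all rotations in sorted order.
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, inverse_bwt(transformed))
```

For `"banana"` the transformed string is `"annb$aa"`: characters that share a following context end up next to each other, which is the regularity that later compression stages can exploit.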
LIPT: A reversible lossless text transform to improve compression performance
In Proceedings of the IEEE Data Compression Conference 2001, Snowbird, 2001
Cited by 5 (0 self)
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, ...
LIPT: A Lossless Text Transform to improve compression
In Proceedings of International Conference on Information and Theory: Coding and Computing, Las Vegas, 2001
Cited by 4 (2 self)
We propose an approach to develop a dictionary-based reversible lossless text transformation, called LIPT (Length Index Preserving Transform), which can be applied to a source text to improve existing algorithms' ability to compress. In LIPT, the length of the input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme makes use of the recurrence of words of the same length in the English language to create context in the transformed text that entropy coders can exploit. LIPT achieves some compression at the preprocessing stage as well and retains enough context and redundancy for the compression algorithms to give better results. Bzip2 with LIPT gives a 5.24% improvement in average BPC over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT, for our test corpus.
Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting
2000
"... this paper is to find them quickly. ..."

