Reducing the Space Requirement of Suffix Trees
Software – Practice and Experience, 1999
Cited by 118 (10 self)
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different types. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd.
KEY WORDS: data structures; suffix trees; implementation techniques; space reduction
FASTER SUFFIX SORTING
1999
Cited by 46 (2 self)
We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our
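As a point of reference for the problem this abstract names, suffix sorting can be sketched naively as below. This is only a baseline under simple assumptions, not the fast algorithm the paper proposes; the function name `suffix_array` is chosen here for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    ordered so that the suffixes are in lexicographic order.

    Naive approach: compare the suffixes directly. Each comparison can
    cost O(n), so this is far slower than specialised suffix sorters.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a, ana, anana, banana, na, nana
    print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

The resulting array of positions is the suffix array that both block-sorting compressors and string-matching indexes build on.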
Boosting textual compression in optimal linear time
Journal of the ACM, 2005
Cited by 39 (19 self)
Abstract. We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression
Extended abstracts related to this article appeared in Proceedings of CPM 2001 and Proceedings of ACM-SIAM SODA 2004, and were combined due to their strong relatedness and complementarity. The work of P. Ferragina was partially supported by the Italian MIUR projects “Algorithms for the Next
Compression Boosting in Optimal Linear Time Using the Burrows-Wheeler Transform
Journal of the ACM, 2004
Cited by 12 (5 self)
In this paper we provide the first compression booster that turns a zeroth order compressor into a more effective kth order compressor without any loss in time efficiency. More precisely, let A be an algorithm that compresses a string s within $\lambda |s| H_0^*(s) + \mu$ bits of storage in $O(T(|s|))$ time, where $H_0^*(s)$ is the zeroth order entropy of the string s. Our booster improves A by compressing s within $\lambda |s| H_k^*(s) + \log_2 |s| + g_k$ bits still using $O(T(|s|))$ time, where $H_k^*(s)$ is the kth order entropy of s.
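To make the zeroth order and kth order entropies in this abstract concrete, the sketch below computes the plain empirical entropies of a string. It assumes the standard textbook definitions, which may differ from the exact entropy variant the paper uses; the names `h0` and `hk` are illustrative only.

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols) -> float:
    """Zeroth order empirical entropy in bits per symbol:
    the sum over distinct symbols c of -(n_c / n) * log2(n_c / n)."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in Counter(symbols).values())


def hk(s: str, k: int) -> float:
    """k-th order empirical entropy in bits per symbol.

    Group every symbol by the k symbols that precede it, take the zeroth
    order entropy of each group, and weight it by the group's size.
    """
    if len(s) == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(g) * h0(g) for g in followers.values()) / len(s)


if __name__ == "__main__":
    s = "abababab"
    print(h0(s))     # 1.0: 'a' and 'b' are equally frequent
    print(hk(s, 1))  # zero: each symbol is determined by its predecessor
```

On a string like `abababab` a zeroth order compressor needs about one bit per symbol, while a first order model needs essentially none, which is the gap a booster aims to close.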
Enhanced Word-Based Block-Sorting Text Compression
PROC. OF THE TWENTY-FIFTH AUSTRALASIAN CONFERENCE ON COMPUTER SCIENCE, 2002
Cited by 9 (0 self)
The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing several further recency rank transformations, and considering also the role of the entropy coder. By combining the best of the new recency transformations with an entropy coder that conditions ranks upon gross characteristics of previous ones, we are able to obtain improved compression on typical text files.
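For readers unfamiliar with recency rank transformations, move-to-front is the classic example used after block sorting. The sketch below is that generic transform, not one of the new transformations the paper describes; it works on any sequence, including a stream of words, and the function names are illustrative.

```python
def mtf_encode(symbols, alphabet):
    """Replace each symbol by its current rank in a recency list,
    then move that symbol to the front of the list."""
    table = list(alphabet)
    ranks = []
    for sym in symbols:
        rank = table.index(sym)
        ranks.append(rank)
        table.insert(0, table.pop(rank))
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(alphabet)
    out = []
    for rank in ranks:
        sym = table.pop(rank)
        out.append(sym)
        table.insert(0, sym)
    return out


if __name__ == "__main__":
    words = ["the", "the", "cat", "the", "cat", "cat"]
    vocab = ["cat", "the"]
    ranks = mtf_encode(words, vocab)
    print(ranks)  # [1, 0, 1, 1, 1, 0]
    assert mtf_decode(ranks, vocab) == words
```

Recently seen symbols get small ranks, so runs produced by block sorting turn into many zeros and ones that an entropy coder can exploit.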
LIPT: A Lossless Text Transform to improve compression
In Proceedings of International Conference on Information Technology: Coding and Computing, Las Vegas, 2001
Cited by 9 (2 self)
We propose an approach to develop a dictionary based reversible lossless text transformation, called LIPT (Length Index Preserving Transform), which can be applied to a source text to improve existing algorithms' ability to compress. In LIPT, the length of the input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme makes use of the recurrence of words of the same length in the English language to create context in the transformed text that the entropy coders can exploit. LIPT achieves some compression at the preprocessing stage as well and retains enough context and redundancy for the compression algorithms to give better results. Bzip2 with LIPT gives a 5.24% improvement in average BPC over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT, for our test corpus.
Unifying Text Search And Compression – Suffix Sorting, Block Sorting and Suffix Arrays
2000
Cited by 6 (0 self)
Today many electronic documents are available, such as newspaper articles, dictionaries, books, and DNA sequences, and they are stored in databases. We also have many documents on the Internet and many email documents. Therefore, fast queries on such huge amounts of documents, and their compression to reduce the cost of storing or transferring them, are important. In this thesis, a unified method for improving the efficiency of search and compression for huge text data is proposed. All search and compression methods used in this thesis are related to a data structure called the suffix array. The suffix array is a text search data structure, and it is used in a text compression method called block sorting. Both are promising methods, for search and compression respectively, and there are many studies on them. Currently a data structure called the inverted file is used for queries over huge amounts of documents. Though it is widely used, the query unit is a document in order to reduce disk space to sto...
LIPT: A reversible lossless text transform to improve compression performance
In Proceedings of the IEEE Data Compression Conference 2001, Snowbird, 2001
Cited by 6 (0 self)
Abstract. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family,
The Burrows-Wheeler Transform: Theory and Practice
Lecture Notes in Computer Science, 1999
Cited by 5 (1 self)
In this paper we describe the Burrows-Wheeler Transform (BWT), a completely new approach to data compression which is the basis of some of the best compressors available today. Although it is easy to intuitively understand why the BWT helps compression, the analysis of BWT-based algorithms requires a careful study of every single algorithmic component. We describe two algorithms which use the BWT and we show that their compression ratio can be bounded in terms of the kth order empirical entropy of the input string for any k ≥ 0. Intuitively, this means that these algorithms are able to make use of all the regularity which is in the input string.
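The intuition for why the BWT helps compression is easiest to see on a small example. The sketch below is a deliberately naive forward and inverse transform that sorts all rotations; it assumes an end-of-string sentinel (`"\0"` here, an illustrative choice) that does not occur in the input, and it is not one of the two algorithms the paper analyses.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform: sort every rotation of
    text + sentinel and read off the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column
    to the partial rows and re-sorting (quadratic, for illustration)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))  # 'annb\x00aa'
    assert inverse_bwt(transformed) == "banana"
```

Symbols that share a following context end up adjacent in the output, such as the two `n`s and the trailing `a`s above, which is the regularity later coding stages exploit.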