Results 1–10 of 29
Linear-time longest-common-prefix computation in suffix arrays and its applications
, 2001
Cited by 113 (2 self)
Abstract. We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.
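The linear-time LCP construction this entry describes (often called Kasai's algorithm) can be sketched as follows; the quadratic suffix-array builder is for illustration only, and the key observation is that when moving from suffix i to suffix i+1, the LCP value can drop by at most one:

```python
def suffix_array(s):
    # naive O(n^2 log n) construction, adequate for illustration
    return sorted(range(len(s)), key=lambda i: s[i:])

def kasai_lcp(s, sa):
    """lcp[r] = length of the longest common prefix of the suffixes
    at ranks r-1 and r in the suffix array (lcp[0] = 0)."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):          # process suffixes in text order, not rank order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1           # invariant: lcp of the next suffix drops by at most 1
        else:
            h = 0
    return lcp
```

Because `h` increases at most n times overall and decreases by at most one per step, the total work is O(n).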
Universal Lossless Source Coding With the Burrows Wheeler Transform
 IEEE Transactions on Information Theory
, 2002
Cited by 44 (4 self)
The Burrows Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length , 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
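The reversible transform at the heart of these schemes can be sketched as below. This is a minimal textbook formulation using a sorted rotation table with a sentinel character, not the coding stages the paper analyzes; real codecs sort suffixes in near-linear time rather than materializing rotations:

```python
def bwt(s, eos="\0"):
    """Forward BWT: last column of the sorted rotation table.
    Assumes the sentinel `eos` does not occur in the input."""
    s += eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(t):
    """Inverse BWT: repeatedly prepending the transformed string to the
    sorted table reconstructs all rotations; the row ending in the
    sentinel is the original text."""
    table = [""] * len(t)
    for _ in range(len(t)):
        table = sorted(t[i] + table[i] for i in range(len(t)))
    row = next(r for r in table if r.endswith("\0"))
    return row[:-1]
```

The transform groups symbols with similar following contexts together, which is exactly the "natural ordering" that the subsequent source code exploits.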
Modifications of the Burrows and Wheeler Data Compression Algorithm
 Proceedings of the IEEE Data Compression Conference
, 1999
Cited by 28 (4 self)
In this paper we improve upon these previous results on the BW-algorithm. Based on the context tree model, we consider the specific statistical properties of the data at the output of the BWT. We describe six important properties, three of which have not been described elsewhere. These considerations lead to modifications of the coding method, which in turn improve the coding efficiency. We briefly describe how to compute the BWT with low complexity in time and space, using suffix trees in two different representations. Finally, we present experimental results on the compression rate and running time of our method, and compare these results to previous achievements. More references on the methods described in this paper can be found in [1, 5].
The Context Trees of Block Sorting Compression
 In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30 – April 1
, 1998
Cited by 24 (0 self)
The Burrows-Wheeler transform (BWT) and block sorting compression are closely related to the context trees of PPM. The usual approach of treating BWT as merely a permutation is not able to fully exploit this relation. We show that
Block Sorting Text Compression – Final Report
, 1996
Cited by 20 (3 self)
A recent development in text compression is a "block sorting" algorithm which permutes the input text according to a special sort procedure and then processes the permuted text with Move-to-Front and a final statistical compressor. The technique combines good speed with excellent compression performance. This report investigates the block sorting compression algorithm, in particular trying to understand its operation and limitations. Various approaches are investigated in an attempt to improve the compression with block sorting, most of which involve a hierarchy of coding models to allow fast adaptation to local contexts. The best technique involves a new "structured" coding model, especially designed for compressing data with skew symbol distributions. Block sorting compression is found to be related to work by Shannon in 1951 on the prediction of English text. The work confirms block sorting as a good text compression technique, with a compression approaching that of the currently be...
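The Move-to-Front stage mentioned above can be sketched minimally as follows: each symbol is replaced by its current position in a recency list, which is then updated. After block sorting, runs of identical symbols become runs of small indices that the final statistical compressor handles well:

```python
def mtf_encode(data, alphabet):
    """Replace each symbol by its index in a recency-ordered table,
    then move that symbol to the front of the table."""
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(codes, alphabet):
    """Exact inverse: look up each index, emit the symbol, update the table."""
    table = list(alphabet)
    out = []
    for i in codes:
        c = table[i]
        out.append(c)
        table.insert(0, table.pop(i))
    return "".join(out)
```

Note how repeated symbols encode as zeros, producing the skewed index distribution that the report's "structured" coding model is designed for.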
Improvements to Burrows-Wheeler Compression Algorithm
, 2000
Cited by 14 (3 self)
In 1994 Burrows and Wheeler presented a new algorithm for lossless data compression. The compression ratio that can be achieved using their algorithm is comparable with the best other known algorithms, whilst its complexity is relatively small. In this paper we explain the internals of this algorithm and discuss the various modifications that have been presented so far. Then we propose new improvements to its effectiveness, which allow us to obtain a compression ratio of 2.271 bpc for the Calgary Corpus files, the best result in the class of Burrows-Wheeler Transform based algorithms.
Block Sorting and Compression
 Proceedings of the IEEE Data Compression Conference
, 1997
Cited by 14 (1 self)
The Block Sorting Lossless Data Compression Algorithm (BSLDCA) described by Burrows and Wheeler [3] has received considerable attention. It achieves compression rates as good as context-based methods, such as PPM, but at execution speeds closer to Ziv-Lempel techniques [5]. This paper describes the Lexical Permutation Sorting Algorithm (LPSA), its theoretical basis, and delineates its relationship to BSLDCA. In particular, we describe how BSLDCA can be reduced to LPSA and show how LPSA could give better results than BSLDCA when transmitting permutations. We also introduce a new technique, Inversion Frequencies, and show that it does as well as Move-to-Front (MTF) coding when there is locality of reference in the data.

1 Introduction. Burrows and Wheeler [3] introduced a new algorithm, which they call the Block Sorting Lossless Data Compression Algorithm (BSLDCA). When applied to text or image data their algorithm achieves better compression rates than Ziv-Lempel techniques with compa...
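One common formulation of the inversion-frequencies idea can be sketched as below; whether it matches this paper's exact variant is an assumption. For each symbol in increasing alphabet order, the encoder emits, per occurrence, how many strictly greater symbols appeared since the previous occurrence, so frequent small symbols produce streams of small numbers:

```python
def inversion_frequencies(data):
    """Hedged sketch of inversion-frequency coding (one common formulation):
    for each symbol s in increasing order, emit for every occurrence of s
    the count of strictly greater symbols seen since the previous
    occurrence of s (or since the start of the data)."""
    codes = {}
    for s in sorted(set(data)):
        gaps, count = [], 0
        for c in data:
            if c == s:
                gaps.append(count)  # record gap, then restart counting
                count = 0
            elif c > s:
                count += 1          # smaller symbols are already coded; skip them
        codes[s] = gaps
    return codes
```

Decoding works symbol by symbol from the smallest: the gaps index into the slots not yet claimed by smaller symbols, which is why the greater-symbol counts suffice to reconstruct the sequence.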
Pattern Matching in Compressed Text and Images
, 2001
Cited by 13 (11 self)
Normally compressed data needs to be decompressed before it is processed, but if the compression has been done in the right way, it is often possible to search the data without having to decompress it, or at least only partially decompress it. The problem can be divided into lossless and lossy compression methods, and then in each of these cases the pattern matching can be either exact or inexact. Much work has been reported in the literature on techniques for all of these cases, including algorithms that are suitable for pattern matching for various compression methods, and compression methods designed specifically for pattern matching. This work is surveyed in this paper. The paper also exposes the important relationship between pattern matching and compression, and proposes some performance measures for compressed pattern matching algorithms. Ideas and directions for future work are also described.
LIPT: A Lossless Text Transform to improve compression
 In Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas
, 2001
Cited by 10 (2 self)
We propose an approach to develop a dictionary-based reversible lossless text transformation, called LIPT (Length Index Preserving Transform), which can be applied to a source text to improve existing algorithms' ability to compress it. In LIPT, the length of the input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme makes use of the recurrence of words of the same length in the English language to create context in the transformed text that the entropy coders can exploit. LIPT achieves some compression at the preprocessing stage as well, and retains enough context and redundancy for the compression algorithms to give better results. Bzip2 with LIPT gives a 5.24% improvement in average BPC over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT, for our test corpus.
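A toy transform in the spirit of the scheme described above can be sketched as follows. This is an illustrative guess, not the paper's exact format: here a dictionary word becomes `*`, a letter encoding its length, and letters encoding its offset among same-length words, so that length and offset information survives as repeatable alphabetic context:

```python
import string

def toy_lipt_encode(words, dictionary):
    """Illustrative toy in the spirit of LIPT (NOT the paper's exact format):
    encode a dictionary word as '*' + length letter + base-26 offset letters.
    Assumes word lengths of at most 26; non-dictionary words pass through."""
    by_len = {}
    for w in dictionary:
        by_len.setdefault(len(w), []).append(w)
    out = []
    for w in words:
        group = by_len.get(len(w), [])
        if w in group:
            code = "*" + string.ascii_lowercase[len(w) - 1]
            # base-26 offset within the same-length group, letters as digits
            digits, n = "", group.index(w)
            while True:
                digits = string.ascii_lowercase[n % 26] + digits
                n //= 26
                if n == 0:
                    break
            out.append(code + digits)
        else:
            out.append(w)
    return out
```

Words of the same length share the same length letter, which is the kind of recurrence a downstream entropy coder such as bzip2 or PPMD can exploit.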