## Universal Lossless Source Coding With the Burrows Wheeler Transform (2002)

### Download Links

- [sensorweb.mit.edu]
- [www.ee.princeton.edu]
- [www.princeton.edu]
- DBLP

### Other Repositories/Bibliography

Venue: IEEE Transactions on Information Theory

Citations: 39 (4 self)

### BibTeX

@ARTICLE{Effros02universallossless,
  author = {Michelle Effros and Karthik Visweswariah and Sanjeev R. Kulkarni and Sergio Verdú},
  title = {Universal Lossless Source Coding With the Burrows Wheeler Transform},
  journal = {IEEE Transactions on Information Theory},
  year = {2002},
  volume = {48},
  number = {5},
  pages = {1061--1081}
}

### Abstract

The Burrows Wheeler Transform (BWT) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv–Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: 1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, 2) a variety of very simple new techniques for BWT-based lossless source coding, and 3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv–Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
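To make the "reversible sequence transformation" in the abstract concrete, here is a minimal Python sketch of a forward and inverse BWT. It is an illustration only, not the implementation analyzed in the paper: it sorts all rotations naively (quadratic memory) instead of using a suffix tree, and it marks the end of the string with a sentinel character rather than transmitting a row index. The function names `bwt` and `inverse_bwt` and the sentinel `"\0"` are assumptions made for this example.

```python
def bwt(s, eos="\0"):
    """Naive BWT sketch: sort every rotation of s + eos, keep the last column.

    Assumption: eos is a sentinel that does not occur in s. It is what makes
    the transform invertible here (and makes the output one symbol longer).
    """
    if eos in s:
        raise ValueError("input must not contain the sentinel character")
    t = s + eos
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, eos="\0"):
    """Invert the sketch above by rebuilding the sorted rotation table.

    Each pass prepends the last column to the partial rows and re-sorts;
    after len(last) passes every row is a full rotation.
    """
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


if __name__ == "__main__":
    out = bwt("banana")
    print(repr(out))          # like characters tend to be grouped together
    print(inverse_bwt(out))   # round-trips to the input
```

The output is a permutation of the input plus the sentinel, so the transform alone does not compress; the grouping of like characters is what a following source code exploits.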

### Citations

1210 | A universal algorithm for sequential data compression
- Ziv, Lempel
- 1977
Citation Context ...hms with competing codes. Experimental results on algorithms using this transformation (e.g., [2], [3], [5]) indicate lossless coding rates better than those achieved by Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the prediction by partial mapping (PPM) schemes described in works like [10], [11], [2]. BWT code implementatio... |

591 | A Block-sorting Lossless Data Compression Algorithm
- Burrows, Wheeler
- 1994
Citation Context ...-memory sources. Index Terms: Burrows Wheeler Transform (BWT), rate of convergence, redundancy, text compression, universal noiseless source coding. I. INTRODUCTION The Burrows Wheeler Transform (BWT) [1] is a slightly expansive reversible sequence transformation currently receiving considerable attention from researchers interested in Manuscript received July 16, 1999; revised December 27, 2001. This... |

564 | A space-economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context ...emory requirements given here refer to the first (of several) implementations of the BWT described by Burrows and Wheeler in [1]. The chosen implementation uses the suffix tree algorithm described in [36], which achieves worst case complexity and memory results. The BWT achieves data expansion rather than data compression. How then do algorithms working in the BWT domain yield such good performance–co... |

352 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984
Citation Context ...ed by Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the prediction by partial mapping (PPM) schemes described in works like [10], [11], [2]. BWT code implementation yields complexity comparable to that of the Ziv–Lempel codes, which are significantly faster than algorithms like PPM [1], [2]. Early theoretical investigations of... |

295 | Universal coding, information, prediction, and estimation
- Rissanen
- 1984
Citation Context ...mality in universal lossless source coding [25]–[29]. For any class of sources smoothly parameterized by real numbers, the optimal rate of convergence of is proven achievable to within for almost all [27], [28]. This work focuses first on the problem of minimax universal lossless source coding for stationary finite-memory sources. A review of the class of unifilar, ergodic, finite-state-machine (FSM) ... |

204 | Arithmetic coding
- Rissanen, Langdon
- 1979
Citation Context ... the boundaries to the decoder and then independently encoding the subsequences. A variety of codes may be used in coding the individual subsequence of . The algorithm used here is an arithmetic code [44] with a Krichevsky–Trofimov (KT) [25] probability model. The elegance, simplicity, and convergence properties of this sequential code motivate the choice. Given a probability model for symbols , the a... |

169 | A universal data compression system
- Rissanen
- 1983
Citation Context ...ial state , the conditional probability of string given is defined as where for all . The class of FSMX sources [30], also called finite-order FSM sources, is the subset of the class of FSM sources for which there exists an integer such that for every , the most recent symbols uniquely determine the state at time .... |

165 | The context-tree weighting method: Basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...Table I summarizes the rates of convergence and complexities of the BWT-based source codes on finite-memory sources, comparing those results both to the corresponding bounds for LZ’77, LZ’78, and CTW [49] and to the optimal rate of convergence. While CTW, like the algorithms described in Theorems 1–3, requires complexity that grows only linearly with , that complexity has a hidden dependence on the me... |

153 | Locally adaptive data compression scheme
- Bentley, Sleator, et al.
- 1986
Citation Context ...on of move-to-front coding follows. The idea behind move-to-front coding appears in a variety of works under a variety of names, including the “book stack” codes of [39], the “move-to-front” codes of [40], [41], and the “interval” and “recency ranking” codes of [42]. In each case, the description length of a particular symbol or word depends on the recency of its last appearance. Symbols used more rec... |

134 | The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ...versal lossless source codes. Rissanen and others extend Davisson’s results for finitely parameterized sources and quantify the condition of second-order optimality in universal lossless source coding [25]–[29]. For any class of sources smoothly parameterized by real numbers, the optimal rate of convergence of is proven achievable to within for almost all [27], [28]. This work focuses first on the prob... |

123 | Implementing the PPM data compression scheme
- Moffat
- 1990
Citation Context ...Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the prediction by partial mapping (PPM) schemes described in works like [10], [11], [2]. BWT code implementation yields complexity comparable to that of the Ziv–Lempel codes, which are significantly faster than algorithms like PPM [1], [2]. Early theoretical investigations of BWT-b... |

115 | Unbounded length contexts for PPM - Cleary, Teahan - 1997 |

93 | Universal noiseless coding
- Davisson
- 1973
Citation Context ...e referred to by their redundancy functions , is a weakly minimax universal lossless source code on if for each and a strongly minimax universal lossless source code on if that convergence is uniform in [24]. This work focuses primarily on minimax universal lossless source coding. The redundancy results derived in this work are, however, all achieved by first finding deterministic bounds on the source co... |

72 | A universal finite memory source
- Weinberger, Rissanen, et al.
- 1995
Citation Context .... This condition is both restrictive [31] and unnecessary for this work. As a result, the restriction is dropped, yielding a class of generalized FSMX sources, here called finite-memory sources after [32]. For any finite-memory source, there exists a minimum suffix set of strings from and an integer such that and for all The state variables are variable-length strings describing the finite “context” o... |

45 | A vector quantization approach to universal noiseless coding and quantization
- Chou, Effros, et al.
- 1996
Citation Context ...l lossless source codes. Rissanen and others extend Davisson’s results for finitely parameterized sources and quantify the condition of second-order optimality in universal lossless source coding [25]–[29]. For any class of sources smoothly parameterized by real numbers, the optimal rate of convergence of is proven achievable to within for almost all [27], [28]. This work focuses first on the problem o... |

41 | A fast block-sorting algorithm for lossless data compression. Data Compression Conference
- Schindler
- 1997
Citation Context ..., maximal sort lengths are sometimes imposed, with ties broken based on position in the original string. Descriptions of some of these variations and their performances appear in works like [1], [33]–[35], [7], [4]. While the choice of sorting technique used in any practical implementation should depend on the system priorities for that application, for the sake of simplicity, complexity and memory re... |

33 | A sequential algorithm for the universal coding of finite memory sources
- Weinberger, Lempel, et al.
Citation Context ...ing . FSMX sources inherit from FSM sources the condition that the current state is a function only of the current source symbol and the previous state ( for all ). This condition is both restrictive [31] and unnecessary for this work. As a result, the restriction is dropped, yielding a class of generalized FSMX sources, here called finite-memory sources after [32]. For any finite-memory source, there... |

29 | Data compression with the Burrows-Wheeler transform
- Nelson
- 1996
Citation Context ...y speaking, the BWT shifts the source redundancy caused by memory to a redundancy caused by a nonequiprobable and nonstationary first-order distribution. Early BWT-based codes (e.g., [1], [37], [33], [34]) capitalize on the observation that the BWT tends to group together long strings of like characters (see, for example, Fig. 1), thereby producing a string that is more easily compressed than the orig... |

28 | Coding for a binary independent piecewise-identically-distributed source
- Willems
- 1996
Citation Context ... yielding bits per symbol for any and suggests that the result generalizes from two subsequences to subsequences to give Unfortunately, the algorithmic complexity grows exponentially with for unknown [46]. In [46], Willems suggests two alternative sequential algorithms. The algorithms differ in their performances and their complexities, giving where the minima are both taken with respect to the choice... |

26 | Redundancy of the Lempel-Ziv Incremental Parsing Rule
- Savari
- 1997
Citation Context ...rom a finite-memory source, the performance of the best BWT-based codes converges to the optimal performance at a rate of , surpassing the convergence of LZ’77 [21] and the convergence of LZ’78 [22], [23] and the variation of LZ’77 given in [21]. This convergence comes within a constant factor of the optimal rate of convergence for finite-memory sources. Note that many of the codes considered here use... |

26 | Block sorting text compression
- Fenwick
- 1996
Citation Context ...rther, maximal sort lengths are sometimes imposed, with ties broken based on position in the original string. Descriptions of some of these variations and their performances appear in works like [1], [33]–[35], [7], [4]. While the choice of sorting technique used in any practical implementation should depend on the system priorities for that application, for the sake of simplicity, complexity and memo... |

25 | A fast algorithm for making suffix arrays and for Burrows–Wheeler transformation
- Sadakane
- 1998
Citation Context ...ort lengths are sometimes imposed, with ties broken based on position in the original string. Descriptions of some of these variations and their performances appear in works like [1], [33]–[35], [7], [4]. While the choice of sorting technique used in any practical implementation should depend on the system priorities for that application, for the sake of simplicity, complexity and memory requirements... |

24 | Low-complexity sequential lossless coding for piecewise-stationary memoryless sources
- Shamir, Merhav
- 1999
Citation Context ...ing where the minima are both taken with respect to the choice of and with . The space complexities of the two algorithms grow more slowly than the time complexities, which are and , respectively. In [47], Shamir and Merhav describe an algorithm giving The space and time complexity of their algorithm is . Even though the results in Merhav [45], Willems [46], and Shamir and Merhav [47] are for p.i.i.d.... |

24 | Universal Redundancy Rates Do Not Exist
- Shields
- 1993
Citation Context ... and a lower bound on the optimal per-symbol description length for sequence length on the same distribution. The difference between and equals which does not vary with the algorithm in operation. In [48], Shields proves that for any function such that , there exists a source in the class of stationary ergodic sources such that Thus, there do not exist general bounds on (or, consequently, ) for the cl... |

21 | The Context Trees of Block Sorting Compression
- Larsson
- 1998
Citation Context ...to BWT-based compression algorithms has focused on experimental comparisons of BWT-based algorithms with competing codes. Experimental results on algorithms using this transformation (e.g., [2], [3], [5]) indicate lossless coding rates better than those achieved by Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the prediction ... |

21 | On the average redundancy rate of the Lempel-Ziv code
- Louchard, Szpankowski
- 1997
Citation Context ...rawn from a finite-memory source, the performance of the best BWT-based codes converges to the optimal performance at a rate of , surpassing the convergence of LZ’77 [21] and the convergence of LZ’78 [22], [23] and the variation of LZ’77 given in [21]. This convergence comes within a constant factor of the optimal rate of convergence for finite-memory sources. Note that many of the codes considered he... |

20 | Minimax noiseless universal coding for Markov sources - Davisson - 1983 |

16 | Higher compression from the Burrows-Wheeler transform by modified sorting
- Chapin, Tate
- 1998
Citation Context ...icated by M. Weinberger, Associate Editor for Source Coding. Publisher Item Identifier S 0018-9448(02)02800-6. 0018-9448/02$17.00 © 2002 IEEE practical lossless data compression algorithms (e.g., [2]–[7]). To date, the majority of research devoted to BWT-based compression algorithms has focused on experimental comparisons of BWT-based algorithms with competing codes. Experimental results on algorithm... |

15 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
Citation Context ...mpeting codes. Experimental results on algorithms using this transformation (e.g., [2], [3], [5]) indicate lossless coding rates better than those achieved by Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the prediction by partial mapping (PPM) schemes described in works like [10], [11], [2]. BWT code implementation yields co... |

14 | On the minimum description length principle for sources with piecewise constant parameters
- Merhav
- 1993
Citation Context ...requirements of the code are again . E. Coding for Piecewise-Constant Parameters Next, consider coding the BWT’s output using a code designed for data sequences with piecewise-constant parameters. In [45], Merhav considers the problem of universal lossless coding for sources with piecewise-constant parameters, considering both upper and lower bounds on coding performance. The achievability argument gi... |

12 | Improved redundancy of a version of the Lempel-Ziv algorithm
- Wyner, Wyner
- 1995
Citation Context ...On sequences of length drawn from a finite-memory source, the performance of the best BWT-based codes converges to the optimal performance at a rate of , surpassing the convergence of LZ’77 [21] and the convergence of LZ’78 [22], [23] and the variation of LZ’77 given in [21]. This convergence comes within a constant factor of the optimal rate of convergence for finite-memory sources. Note th... |

9 | Improvements to the Block Sorting Text Compression Algorithm
- Fenwick
- 1995
Citation Context ...offs? Roughly speaking, the BWT shifts the source redundancy caused by memory to a redundancy caused by a nonequiprobable and nonstationary first-order distribution. Early BWT-based codes (e.g., [1], [37], [33], [34]) capitalize on the observation that the BWT tends to group together long strings of like characters (see, for example, Fig. 1), thereby producing a string that is more easily compressed t... |

8 | Asymptotic optimality of the block sorting data compression algorithm
- Arimura, Yamamoto
- 1998
Citation Context ... common context is random. Sadakane notes, however, that “the permutation in the BWT is not completely random” but conjectures that the proposed algorithms work for BWT-transformed data sequences. In [15]–[17], Arimura and Yamamoto present a sequence of information-theoretic results on BWT-based source coding, demonstrating the universality of BWT-based codes for finite memory and stationary totally e... |

6 | Lexical permutation sorting algorithm
- Arnavut, Magliveras
- 1997
Citation Context ...oted to BWT-based compression algorithms has focused on experimental comparisons of BWT-based algorithms with competing codes. Experimental results on algorithms using this transformation (e.g., [2], [3], [5]) indicate lossless coding rates better than those achieved by Ziv–Lempel-style codes (LZ’77 [8], LZ’78 [9], and their descendants) but typically not quite as good as those achieved by the predic... |

6 | Text compression using recency rank with context and relation to context sorting, block sorting and PPM
- Sadakane
- 1997
Citation Context ...l codes, which are significantly faster than algorithms like PPM [1], [2]. Early theoretical investigations of BWT-based algorithms include the work of Sadakane, Arimura and Yamamoto, and Effros. In [12], [13], Sadakane considers the performance of source codes based on a variant of the BWT described in [14] and states that codes based on block sorting are asymptotically optimal for finite-order Mark... |

3 | Output distribution of the Burrows-Wheeler transform
- Visweswariah, Kulkarni, et al.
- 2000
Citation Context ...BWT-based universal codes. This paper combines the aforementioned results by Effros with the asymptotic analyses of convergence rate and output statistics derived by Visweswariah, Kulkarni, and Verdú [19], [20] and a nonasymptotic analysis of the BWT output statistics by Effros. The key results are: statistical characterizations of the BWT output for both finite strings and sequences of length , a pro... |

3 | On tree sources, finite state machines and time reversal; or, how does the tree look from the other side?
- Seroussi, Weinberger
- 1995
Citation Context ...hile the idea of reversing the data string prior to transformation is conceptually useful, string reversal is not necessary to obtain an equation of the form given in (1). This assertion follows from [38], which proves that the time reversal of any finite-memory source yields another finite-memory source. As a result, for any data sequence drawn from a stationary finite-memory distribution for which t... |

3 | Interval and recency rank source coding: Two on line adaptive variable-length schemes
- Elias
- 1987
Citation Context ...ont coding appears in a variety of works under a variety of names, including the “book stack” codes of [39], the “move-to-front” codes of [40], [41], and the “interval” and “recency ranking” codes of [42]. In each case, the description length of a particular symbol or word depends on the recency of its last appearance. Symbols used more recently get shorter descriptions than symbols used less recently... |

2 | Almost sure convergence coding theorem for block sorting data compression - Arimura, Yamamoto - 1998 |

2 | On the performance of recency-rank and block-sorting universal lossless data compression algorithms - Muramatsu - 2002 |

1 | Block sorting transformations
- Arnavut, Leavitt, et al.
- 1998
Citation Context ...engths, the last column is the only column that yields a reversible transformation. These observations together motivate a variety of alternatives to the BWT, such as the algorithms described in [3], [6], where modifications in the table generation techniques allow for use of earlier table columns. While the argument that the last column of the BWT table has the least impact on the ordering of the ta... |

1 | On optimality of variants of the block sorting compression
- Sadakane
- 1998
Citation Context ...s, which are significantly faster than algorithms like PPM [1], [2]. Early theoretical investigations of BWT-based algorithms include the work of Sadakane, Arimura and Yamamoto, and Effros. In [12], [13], Sadakane considers the performance of source codes based on a variant of the BWT described in [14] and states that codes based on block sorting are asymptotically optimal for finite-order Markov sou... |

1 | Information theoretic analyzes of block sorting data compression method
- Arimura
- 1999
Citation Context ...on context is random. Sadakane notes, however, that “the permutation in the BWT is not completely random” but conjectures that the proposed algorithms work for BWT-transformed data sequences. In [15]–[17], Arimura and Yamamoto present a sequence of information-theoretic results on BWT-based source coding, demonstrating the universality of BWT-based codes for finite memory and stationary totally ergodi... |

1 | Topics in the analysis of universal compression algorithms
- Visweswariah
- 2000
Citation Context ...sed universal codes. This paper combines the aforementioned results by Effros with the asymptotic analyses of convergence rate and output statistics derived by Visweswariah, Kulkarni, and Verdú [19], [20] and a nonasymptotic analysis of the BWT output statistics by Effros. The key results are: statistical characterizations of the BWT output for both finite strings and sequences of length , a proof of ... |

1 | Book stack data compression
- Ryabko
- 1980
Citation Context ...cribing the BWT output, a description of move-to-front coding follows. The idea behind move-to-front coding appears in a variety of works under a variety of names, including the “book stack” codes of [39], the “move-to-front” codes of [40], [41], and the “interval” and “recency ranking” codes of [42]. In each case, the description length of a particular symbol or word depends on the recency of its las... |

1 | A locally adaptive data compression scheme
- Bentley, Sleator, et al.
- 1986
Citation Context ...move-to-front coding follows. The idea behind move-to-front coding appears in a variety of works under a variety of names, including the “book stack” codes of [39], the “move-to-front” codes of [40], [41], and the “interval” and “recency ranking” codes of [42]. In each case, the description length of a particular symbol or word depends on the recency of its last appearance. Symbols used more recently ... |
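
Several of the citation contexts above describe move-to-front coding, in which recently used symbols receive shorter descriptions. A minimal Python sketch of the rank computation follows. It only produces the recency ranks; a real code would then describe those ranks with an integer or entropy code, which is not shown. The function names and the explicit `alphabet` argument are assumptions made for this example and do not come from the paper.

```python
def mtf_encode(data, alphabet):
    """Replace each symbol by its current position in a recency list.

    Assumption: every symbol of data appears in alphabet, which gives the
    initial list order shared by encoder and decoder.
    """
    table = list(alphabet)
    ranks = []
    for symbol in data:
        rank = table.index(symbol)
        ranks.append(rank)
        # Move the symbol just seen to the front of the list.
        table.insert(0, table.pop(rank))
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(alphabet)
    symbols = []
    for rank in ranks:
        symbol = table.pop(rank)
        symbols.append(symbol)
        table.insert(0, symbol)
    return "".join(symbols)


if __name__ == "__main__":
    ranks = mtf_encode("aaabbb", ["a", "b", "c"])
    print(ranks)  # runs of a repeated symbol become runs of zeros
    print(mtf_decode(ranks, ["a", "b", "c"]))
```

Runs of like characters, which the BWT tends to produce, map to runs of small ranks, which is why this step is a natural companion to the transform.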