## An O(n) semi-predictive universal encoder via the BWT (2004)

Venue: IEEE Trans. Inform. Theory

Citations: 8 (2 self)

### BibTeX

@ARTICLE{Baron04ano(n),

author = {Dror Baron and Yoram Bresler},

title = {An O(n) semi-predictive universal encoder via the BWT},

journal = {IEEE Trans. Inform. Theory},

year = {2004},

volume = {50},

pages = {928--937}

}

### Abstract

We provide an O(N) algorithm for a non-sequential semi-predictive encoder whose pointwise redundancy with respect to any (unbounded depth) tree source is O(1) bits per state above Rissanen’s lower bound. This is achieved by using the Burrows-Wheeler transform (BWT), an invertible permutation transform that has been suggested for lossless data compression. First, we use the BWT only as an efficient computational tool for pruning context trees, and encode the input sequence rather than the BWT output. Second, we estimate the minimum description length (MDL) source by incorporating suffix tree methods to construct the unbounded depth context tree that corresponds to the input sequence in O(N) time. Third, we point out that a variety of previous source coding methods required superlinear complexity to determine which tree source state generated each symbol of the input. We show how backtracking from the BWT output to the input sequence solves this problem in O(N) worst-case complexity.
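The abstract relies on two properties of the BWT: it is an invertible permutation of the input, and one can backtrack from its output to the input. The sketch below illustrates both with the textbook definition (sort all rotations, keep the last column). It is a minimal illustration under stated assumptions, not the paper's method: the function names `bwt` and `inverse_bwt` and the end-of-string marker `$` are hypothetical choices, the forward transform here costs roughly O(N² log N) rather than the O(N) obtained with suffix tree constructions, and the paper's own ordering conventions may differ.

```python
def bwt(x: str, eos: str = "$") -> str:
    """Naive BWT: sort all rotations of x + eos and return the last column.

    Assumption: eos is a marker that sorts before every symbol of x.
    """
    assert all(eos < c for c in x), "eos must sort before all input symbols"
    s = x + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(y: str) -> str:
    """Invert the BWT by backtracking through the output.

    The k-th occurrence of a symbol in the last column (y) is the same
    text position as the k-th occurrence of that symbol in the sorted
    first column. A stable sort of positions by symbol gives this mapping.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])  # stable: ties keep position order
    lf = [0] * n  # lf[i] = row whose rotation starts with the symbol y[i]
    for first_col_row, last_col_row in enumerate(order):
        lf[last_col_row] = first_col_row

    # Row 0 is the rotation that starts with the marker, so y[0] is the
    # last input symbol. Each step moves one symbol backwards in the input.
    row, out = 0, []
    for _ in range(n - 1):
        out.append(y[row])
        row = lf[row]
    return "".join(reversed(out))


if __name__ == "__main__":
    y = bwt("banana")
    print(y)               # last column of the sorted rotations
    print(inverse_bwt(y))  # recovers the input
```

The backtracking loop visits each output position once, which is the kind of linear-time walk from BWT output to input sequence that the abstract refers to; the quadratic cost in this sketch comes only from the naive forward sort.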

### Citations

9524 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ... by a stationary ergodic source whose (per-symbol) entropy for length-$N$ blocks is $H_N$, the expected redundancy $\rho$ is defined as $\rho \triangleq E_x[l(x)] - N H_N$, where $l(x)$ is the length of a uniquely decodable code =-=[1]-=- for $x$, and $E_x[\cdot]$ denotes expectation over all length-$N$ inputs. Rissanen [2] proved that for a source with $K$ (unknown) parameters over a compact set, a lower bound on the expected redundancy is $\rho \geq K$ ... |

9306 | Introduction to algorithms
- Cormen, Leiserson, et al.
- 1992
Citation Context ...ies $p(\alpha|s)$ for each state $s \in S$ and each symbol $\alpha \in \mathcal{X}$; we say that $s$ generates symbols following it. Because $S$ is complete and proper, the sequences of $S$ can be arranged as leaves on an $|\mathcal{X}|$-ary tree =-=[28]-=- (Figure 1); the unique state $s$ that generated $x_i$ can be determined by entering the tree at the root, first choosing branch $x_{i-1}$, then branch $x_{i-2}$, and so on, until some leaf $s$ is encountered. Let $D \triangleq$ ... |

620 | A block-sorting lossless data compression algorithm
- Burrows, Wheeler
- 1994
Citation Context ...thms featuring fast computation and low memory use, while providing compression near (1) and (2). The recent interest in the Burrows Wheeler transform (BWT) can be understood in this context. The BWT =-=[17]-=- is an invertible permutation transform that has been suggested for data compression [13,14,17–22]. It has attracted intense research interest because it achieves compression results near the state of... |

577 | A space-economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context ...gorithm, which will be described in Section V, relies on the properties of prefix trees. Furthermore, with prefix tree constructions the BWT output y can be computed in O(N) time. Following McCreight =-=[23]-=-, a prefix tree T that corresponds to a sequence x (Figure 2) contains internal nodes and leaves. Nodes are connected by arcs, where each arc is labeled by some nonempty sequence in $\mathcal{X}^*$. Each interna... |

549 | Stochastic complexity
- Rissanen
- 1989
Citation Context ...e semi-predictive encoder [8–12]. Our BWT-MDL method can also be used to provide a tradeoff between memory use in the decoder and compression quality. MDL enjoys various optimal asymptotic properties =-=[38]-=-, hence BWT-MDL can be used for universal estimation, classification, and other problems in statistical inference. In particular, because our method is an asymptotically optimal estimator of tree s... |

354 | On-line construction of suffix trees
- Ukkonen
- 1995
Citation Context ...r, each leaf corresponds to a prefix. McCreight [23] suggested an O(N) nonsequential prefix tree construction algorithm. Recent work relating to prefix tree methods has considered sequential variants =-=[22,24,25]-=-, and refinements of the data structure for reducing the memory use [26]. Prefix trees can also be used to construct the BWT output [13,22]. By sorting sibling arcs according to their last symbol, the... |

305 | Universal coding, information, prediction, and estimation
- Rissanen
- 1984
Citation Context ...ks is $H_N$, the expected redundancy $\rho$ is defined as $\rho \triangleq E_x[l(x)] - N H_N$, where $l(x)$ is the length of a uniquely decodable code [1] for $x$, and $E_x[\cdot]$ denotes expectation over all length-$N$ inputs. Rissanen =-=[2]-=- proved that for a source with $K$ (unknown) parameters over a compact set, a lower bound on the expected redundancy is $\rho \geq \frac{K}{2}(1-\epsilon)\log(N)$ (1) bits ($\log(\cdot)$ denotes the base-two logarithm), for any $\epsilon$... |

271 | Stochastic complexity and modeling
- Rissanen
- 1986
Citation Context ...$S^*$ is within O(1) bits per state of the pointwise redundancy bound (2). This result was proved for CTW-MDL [11, Theorem 1], and also applies to Nohre’s method [12] and our BWT-MDL method. Theorem 1 (=-=[9,11]-=-): If the natural code is used to describe the MDL source $S^*$, then the pointwise redundancy $\rho(x)$ of the semi-predictive approach over the ML entropy of the input sequence $x$ w.r.t. $S^*$ satisfies $\rho(x)$... |

184 | A Universal Data Compression System
- Rissanen
- 1983
Citation Context ...ees A context tree is a data structure that stores information about contexts preceding symbols in the input sequence x. Context trees have been used in a variety of source coding methods in the past =-=[4,5,7]-=-, and are also related to the basic version of the new BWT-MDL algorithm. In contrast to prefix trees, where labels of arcs are sequences in $\mathcal{X}^*$, the labels of the arcs of a context tree are symbols ... |

173 | The context-tree weighting method: Basic properties - Willems, Shtarkov, et al. - 1995 |

138 | The performance of universal encoding
- Krichevsky, Trofimov
- 1981
Citation Context ...rsion. In the worst case, the recursion depth is O(N), using O(N) words of memory. Phase II needs to store the array U, along with symbol counts $\{n_s^\alpha(i)\}_{\alpha\in\mathcal{X}}$, $\forall s \in S$ for the sequential KT estimator =-=[33]-=-. With our computational model, U uses O(N) words of memory, and the symbol counts use $O(|\hat{S}|)$ words of memory. Because $|\hat{S}| = O(N)$ (see the proof of Theorem 2), the memory use of Phase II is O(N)... |

123 | Reducing the space requirements of suffix trees
- Kurtz
- 1998
Citation Context ...ential prefix tree construction algorithm. Recent work relating to prefix tree methods has considered sequential variants [22,24,25], and refinements of the data structure for reducing the memory use =-=[26]-=-. Prefix trees can also be used to construct the BWT output [13,22]. By sorting sibling arcs according to their last symbol, the prefixes corresponding to the leaves are lexicographically sorted. Ther... |

114 | Universal modeling and coding
- Rissanen, Langdon
- 1984
Citation Context ... X are estimated with a Krichevsky-Trofimov (KT) [33] estimator as $p\left(\alpha \,\middle|\, \{n_s^\beta(i)\}_{\beta\in\mathcal{X}}\right) = \frac{n_s^\alpha(i) + \frac{1}{2}}{\frac{|\mathcal{X}|}{2} + \sum_{\beta\in\mathcal{X}} n_s^\beta(i)}$, $\alpha \in \mathcal{X}$. (6) These probabilities are fed into an arithmetic coder =-=[1,6,35]-=-, which encodes x sequentially. Let $n_s(i) \triangleq \sum_{\alpha\in\mathcal{X}} n_s^\alpha(i)$, $n_s^\alpha \triangleq n_s^\alpha(N+1)$, and $n_s \triangleq n_s(N+1)$. We define the ML conditional probabilities as $p_s^\alpha \triangleq \frac{n_s^\alpha}{n_s}$, $\alpha \in \mathcal{X}$, and the ML entropy as $\hat{H}_s \triangleq$ ... |

79 | From Ukkonen to McCreight and Weiner: a unifying view of linear-time suffix tree construction
- Giegerich, Kurtz
- 1997
Citation Context ...r, each leaf corresponds to a prefix. McCreight [23] suggested an O(N) nonsequential prefix tree construction algorithm. Recent work relating to prefix tree methods has considered sequential variants =-=[22,24,25]-=-, and refinements of the data structure for reducing the memory use [26]. Prefix trees can also be used to construct the BWT output [13,22]. By sorting sibling arcs according to their last symbol, the... |

74 | A universal finite memory source
- Weinberger, Rissanen, et al.
- 1995
Citation Context ...ɛ > 0, except for a set of inputs whose probability vanishes as N → ∞. Furthermore, they proved existence of sequential uniquely decodable codes that attain this lower bound. For context tree sources =-=[4]-=-, the redundancy bounds have been achieved by several families of source coding methods. Sequentially encoding the next symbol according to the source that is optimal for the portion of the input sequ... |

52 | Faster suffix sorting - Larsson, Sadakane - 1999 |

42 | Universal lossless source coding with the Burrows Wheeler transform
- Effros, Visweswariah, et al.
- 2002
Citation Context ...the sequence $x_i, x_{i+1}, \ldots, x_j$ where $x_k \in \mathcal{X}$ for $i \le k \le j$. Consider length-$N$ input sequences $x = x_1^N$, i.e., $x \in \mathcal{X}^N$. Let $\mathcal{X}^*$ denote the set of finite-length sequences over $\mathcal{X}$. Define a tree source =-=[4,6,15,18,20]-=- as a finite set of sequences called states $S \subset \mathcal{X}^*$ that is complete and proper (completeness implies that any semi-infinite sequence has a suffix in $S$; properness implies that there are no two sequenc... |

41 | Optimal sequential probability assignment for individual sequences
- Weinberger, Merhav, et al.
- 1994
Citation Context ...lso known as the empirical entropy) of x w.r.t. C, i.e., the minimum over models in C of the entropy conditioned on the model, with parameters set to their ML estimates. Weinberger, Merhav, and Feder =-=[3]-=- sharpened Rissanen’s result from a probabilistic setup to individual sequences, and proved that, for a source with K unknown parameters, any sequential uniquely decodable code satisfies ρ(x) ≥ K (1 −... |

40 | The context-tree weighting method: Extensions
- Willems
- 1998
Citation Context ...–14]. Semi-predictive algorithms achieve the redundancy bounds (1) and (2), but are non-sequential. Finally, Context tree weighting (CTW) computes the mixture of all context tree sources sequentially =-=[15,16]-=-. The mixture approach achieves the redundancy bounds (1) and (2) for any input. With the quest for minimum redundancy essentially over, other aspects gain in both theoretical and practical importance... |

38 | Extended application of suffix trees to data compression - Larsson - 1996 |

34 | A sequential algorithm for the universal coding of finite-memory sources
- Weinberger, Lempel, et al.
- 1992
Citation Context ...the sequence $x_i, x_{i+1}, \ldots, x_j$ where $x_k \in \mathcal{X}$ for $i \le k \le j$. Consider length-$N$ input sequences $x = x_1^N$, i.e., $x \in \mathcal{X}^N$. Let $\mathcal{X}^*$ denote the set of finite-length sequences over $\mathcal{X}$. Define a tree source =-=[4,6,15,18,20]-=- as a finite set of sequences called states $S \subset \mathcal{X}^*$ that is complete and proper (completeness implies that any semi-infinite sequence has a suffix in $S$; properness implies that there are no two sequenc... |

33 | On the performance of BWT sorting algorithms - Seward |

30 | Structures of String Matching and Data Compression
- Larsson
- 1999
Citation Context ... the coding length within O(1) of the true optimum. Then, Section V-B discusses the computational complexity of Basic BWT-MDL. Section V-C presents a modified O(N) algorithm based on Larsson’s method =-=[13,14]-=-, which we call the BWT-MDL algorithm. Section V-D lays out the memory use of BWT-MDL, and Section V-E shows how our method can be used to provide a tradeoff between memory use in the decoder and comp... |

29 | Data compression with the Burrows-Wheeler transform
- Nelson
- 1996
Citation Context ...ted for data compression [13,14,17–22]. It has attracted intense research interest because it achieves compression results near the state of the art while being more efficient in terms of computation =-=[13,17,19,21,22]-=-. Previous practitioners and researchers ran the BWT on input sequences and compressed the BWT output directly [13,17–21]. Because the BWT output distribution is similar to piecewise i.i.d. (PIID) [... |

16 | A stochastic approach to the gamma function
- Gordon
- 1994
Citation Context ...le. For large $t$, i.e., $t \geq \sqrt[3]{N}$, we approximate the gamma function with a form of Stirling’s formula, $\log\hat{\Gamma}(t) \triangleq \frac{1}{2}\log(2\pi) + \left(t - \frac{1}{2}\right)\log(t) - \left(t - \frac{1}{12t}\right)\log(e)$. (11) According to Gordon =-=[37]-=-, $\frac{1}{12t} - \frac{1}{360t^3} < \ln\left(\frac{\Gamma(t)}{\sqrt{2\pi}\, t^{t-\frac{1}{2}} e^{-t}}\right) < \frac{1}{12t} - \frac{1}{360\left(t+\frac{1}{8}\right)^3}$. It follows that the accuracy of (11) is $0 < \frac{\log(e)}{360\left(t+\frac{1}{8}\right)^3} < \log\hat{\Gamma}(t) - \ldots$ For large $t$ the error is at most $\frac{\log(e)}{360N}$. ... |

14 | Some Topics in Descriptive Complexity
- Nohre
- 1994
Citation Context ...st $l_S - \log(p_S(x))$. However, because all of x needs to be processed in Phase I before the encoding begins in Phase II, semi-predictive source coding methods are non-sequential. Related methods - Nohre =-=[12]-=- provided a method that estimates the globally optimal MDL tree source $S^*$ by pruning the context tree with dynamic programming. His method uses the natural code (see also [15]) to describe the estima... |

14 | On the minimum description length principle for sources with piecewise constant parameters
- Merhav
- 1993
Citation Context ...ies close to the $\frac{1}{2}\log(N)$ bits implied by Rissanen’s bound [2], because PIID methods require $\log(N)$ bits per transition between segments, in addition to the $\frac{1}{2}\log(N)$ bits per parameter in (1) and (2) =-=[34]-=-. Larsson [13,14] provided a semi-predictive method for compressing the BWT output. He estimates the MDL tree source by $\tilde{S}$, encodes the structure of $\tilde{S}$ and the segment lengths in y, and then encod... |

14 | Linear time universal coding and time reversal of tree sources via fsm closure. Information Theory
- Martin, Seroussi, et al.
- 2004
Citation Context ...rlinear aggregate complexity. Although we have provided an O(N) solution to this problem in the BWT-MDL encoder, we have not developed an O(N) complexity decoder. We refer the reader to Martin et al. =-=[36]-=- for an O(N) decoder, and concentrate on the encoder in the remainder of the paper. Context depths - To simplify the exposition, in this section we consider all tree sources whose depth is up to Dmax.... |

12 | Twice-universal coding, Problems of Information Transmission
- Ryabko
- 1984
Citation Context ...ents for symbols generated by s to an arithmetic encoder. The decoder first determines $\hat{S}$, and afterwards uses it to decode x sequentially. The semi-predictive approach was first described by Ryabko =-=[8]-=-. If the actual tree source S that generated x belongs to the class C being considered during the coding length minimization, then the resulting coding length is at most $l_S - \log(p_S(x))$. However, becau... |

12 | Universal Data Compression Based on the Burrows–Wheeler Transformation: Theory and Practice
- Balkenhol, Kurtz
Citation Context ...ted for data compression [13,14,17–22]. It has attracted intense research interest because it achieves compression results near the state of the art while being more efficient in terms of computation =-=[13,17,19,21,22]-=-. Previous practitioners and researchers ran the BWT on input sequences and compressed the BWT output directly [13,17–21]. Because the BWT output distribution is similar to piecewise i.i.d. (PIID) [... |

9 | PPM performance with BWT Complexity: A fast and effective data compression algorithm
- Effros
Citation Context ...ted for data compression [13,14,17–22]. It has attracted intense research interest because it achieves compression results near the state of the art while being more efficient in terms of computation =-=[13,17,19,21,22]-=-. Previous practitioners and researchers ran the BWT on input sequences and compressed the BWT output directly [13,17–21]. Because the BWT output distribution is similar to piecewise i.i.d. (PIID) [... |

8 | Fast universal coding with context models - Rissanen - 1999 |

7 | A Study of the Context Tree maximizing method
- Volf, Willems
Citation Context ...mbined with a data structure that Nohre proposed, leads to $O(N + |\mathcal{X}|^{D_{\max}})$ complexity, which is superlinear in N. The semi-predictive approach was also studied by additional authors. Willems et al. =-=[10,11]-=- obtained a semi-predictive method by modifying CTW [15]; we call their method CTW-MDL. For tree sources up to some maximal depth $D_{\max}$, CTW-MDL requires $O(N D_{\max})$ operations, which is superlinear in N.... |

2 | Tree Source Identification with the Burrows Wheeler Transform
- Baron, Bresler
- 2000
Citation Context ...the sequence $x_i, x_{i+1}, \ldots, x_j$ where $x_k \in \mathcal{X}$ for $i \le k \le j$. Consider length-$N$ input sequences $x = x_1^N$, i.e., $x \in \mathcal{X}^N$. Let $\mathcal{X}^*$ denote the set of finite-length sequences over $\mathcal{X}$. Define a tree source =-=[4,6,15,18,20]-=- as a finite set of sequences called states $S \subset \mathcal{X}^*$ that is complete and proper (completeness implies that any semi-infinite sequence has a suffix in $S$; properness implies that there are no two sequenc... |

2 | Fast Parallel Algorithms for Universal Lossless Source Coding
- Baron
- 2003
Citation Context ...ree source model. Under some mixing conditions, it can be proved that Dmax(N) must grow logarithmically with N in order to process the entire context tree for “typical” [1] input sequences (see Baron =-=[32]-=- and references therein). Therefore, for “typical” inputs, a tree of depth O(log(N)) will contain all the relevant information about the input. G. Computational model Our computational model makes the... |

1 | The Context Trees of Block Sorting Compression
- Larsson, J.
- 1998
Citation Context ... a semi-predictive method by modifying CTW [15]; we call their method CTW-MDL. For tree sources up to some maximal depth $D_{\max}$, CTW-MDL requires $O(N D_{\max})$ operations, which is superlinear in N. Larsson =-=[13,14]-=- provided a semi-predictive method for compressing the BWT output. More details about Larsson’s method will be given in Section III-D. Redundancy results - When the natural code (see Section IV-A) is ... |
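Several of the citation contexts above refer to the sequential Krichevsky-Trofimov (KT) estimator, whose probabilities are fed to an arithmetic coder. As a rough illustration, the sketch below computes the KT probability for a single state from its symbol counts, using the standard form (count plus one half, divided by half the alphabet size plus the total count), and accumulates the ideal code length of a sequence in bits. The function names `kt_probability` and `kt_code_length` are hypothetical, a single state is assumed for all symbols, and the arithmetic coder itself is replaced by the ideal length $-\log_2 p$.

```python
from math import log2


def kt_probability(counts: dict, symbol, alphabet_size: int) -> float:
    """KT estimate of the next symbol given the counts seen so far in one state.

    Standard form assumed here: (n_symbol + 1/2) / (|alphabet|/2 + total count).
    """
    total = sum(counts.values())
    return (counts.get(symbol, 0) + 0.5) / (alphabet_size / 2 + total)


def kt_code_length(symbols, alphabet) -> float:
    """Ideal sequential code length in bits (base-two logarithm) for one state.

    Each symbol is charged -log2 of its KT probability, then the counts are
    updated, mirroring how probabilities would be passed to an arithmetic coder.
    """
    counts: dict = {}
    bits = 0.0
    for a in symbols:
        bits -= log2(kt_probability(counts, a, len(alphabet)))
        counts[a] = counts.get(a, 0) + 1
    return bits


if __name__ == "__main__":
    # Two symbols over a binary alphabet: p = 1/2, then p = 1/4.
    print(kt_code_length("ab", "ab"))
    # A longer, skewed sequence costs fewer bits per symbol.
    print(kt_code_length("a" * 15 + "b", "ab") / 16)
```

In a tree-source encoder of the kind described in the contexts, one such count table would be kept per state, and each symbol would be charged to the state that generated it.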