Reducing the Space Requirement of Suffix Trees
Software – Practice and Experience, 1999
Cited by 120 (11 self)
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space-efficient representations. The most space-efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different types. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: data structures; suffix trees; implementation techniques; space reduction
Offline compression by greedy textual substitution
Proc. IEEE, 2000
Cited by 25 (1 self)
Greedy offline textual substitution refers to the following approach to compression or structural inference. Given a long textstring x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted textstring until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
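The greedy loop described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: each "pair of pointers" is modelled as a single private-use marker character (so `pointer_cost` defaults to 1), candidate substrings are capped at `max_len`, and the names `best_substring`, `compress`, and `expand` are hypothetical.

```python
from collections import defaultdict


def best_substring(text, pointer_cost=1, max_len=16):
    """Return (gain, w): the substring whose substitution contracts text most."""
    best = (0, "")
    for length in range(pointer_cost + 1, min(max_len, len(text) // 2) + 1):
        counts = defaultdict(int)
        last_end = {}
        # Count non-overlapping occurrences, scanning left to right.
        for i in range(len(text) - length + 1):
            w = text[i:i + length]
            if last_end.get(w, 0) <= i:
                counts[w] += 1
                last_end[w] = i + length
        for w, c in counts.items():
            # All occurrences except one are replaced by a pointer.
            gain = (c - 1) * (length - pointer_cost)
            if gain > best[0]:
                best = (gain, w)
    return best


def compress(text):
    """Repeat the greedy step until no substring yields a contraction.

    Assumes the input contains no private-use characters (U+E000 and up),
    which serve as stand-ins for pointers.
    """
    rules = []
    while True:
        gain, w = best_substring(text)
        if gain <= 0:
            return text, rules
        marker = chr(0xE000 + len(rules))
        first = text.find(w)
        keep_until = first + len(w)
        # Keep the first occurrence, replace the remaining ones.
        text = text[:keep_until] + text[keep_until:].replace(w, marker)
        rules.append((marker, w))


def expand(text, rules):
    """Undo the substitutions, newest rule first."""
    for marker, w in reversed(rules):
        text = text.replace(marker, w)
    return text
```

Rules are expanded newest first because a later substring may itself contain markers introduced by earlier rounds.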
Semantically Motivated Improvements for PPM Variants
The Computer Journal, 1997
Cited by 25 (3 self)
This paper explains how to significantly improve the compression performance of any PPM variant
A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30 - April 1, 1998
Cited by 25 (3 self)
We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes matters because the sorted array of suffix indexes, called a suffix array, is a memory-efficient alternative to the suffix tree. Suffix sorting is also used for the Burrows-Wheeler transformation in block-sorting text compression, so fast sorting algorithms are desirable. We compare
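To make the link between suffix sorting and the Burrows-Wheeler transformation concrete, here is a minimal sketch. It uses a naive comparison sort of the suffixes, which is far slower than the algorithm the paper proposes; the function names and the sentinel convention are assumptions made for illustration.

```python
def suffix_array(text):
    """Indexes of all suffixes of text, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="\0"):
    """Burrows-Wheeler transform via the suffix array.

    Assumes sentinel does not occur in text and compares lower than
    every character of text.
    """
    s = text + sentinel
    # The output symbol for each sorted suffix is the character preceding it
    # (index -1 wraps around to the sentinel for the suffix starting at 0).
    return "".join(s[i - 1] for i in suffix_array(s))
```

For example, `suffix_array("banana")` gives `[5, 3, 1, 0, 4, 2]`, and `bwt("banana", sentinel="$")` gives `"annb$aa"`.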
A Theoretical and Experimental Study on the Construction of Suffix Arrays in External Memory
The Context Trees of Block Sorting Compression
In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30 - April 1, 1998
Cited by 21 (0 self)
The Burrows-Wheeler transform (BWT) and block sorting compression are closely related to the context trees of PPM. The usual approach of treating BWT as merely a permutation is not able to fully exploit this relation. We show that
The at most k-deep factor tree
2003
Cited by 17 (4 self)
This article presents a new indexing structure close to the suffix tree. This structure indexes all factors (substrings) of length at most k of a string. Construction time and memory usage are linear in the length of the string (as for the suffix tree). However, for small values of k, the factor tree offers a substantial memory saving compared with the suffix tree. Keywords: suffix tree, factor tree, indexing structure.
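The idea of indexing only factors of length at most k can be illustrated with an uncompacted trie cut off at depth k. This is only a sketch under simplifying assumptions: the naive construction below takes O(nk) time rather than the linear time claimed for the paper's structure, and the function names are hypothetical.

```python
def build_factor_trie(text, k):
    """Trie (nested dicts) of every factor of text with length at most k."""
    root = {}
    for i in range(len(text)):
        node = root
        # Insert the prefix of length <= k of the suffix starting at i.
        for ch in text[i:i + k]:
            node = node.setdefault(ch, {})
    return root


def contains_factor(root, pattern):
    """True if pattern (of length at most k) occurs in the indexed text."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(node):
    """Number of trie nodes below node, i.e. distinct indexed factors."""
    return sum(1 + count_nodes(child) for child in node.values())
```

For "banana", the trie with k = 2 has 6 nodes, whereas the full suffix trie (k = 6) has 15, which gives a feel for the memory saving at small k.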
On-Line Stochastic Processes in Data Compression
1996
Cited by 15 (6 self)
The ability to predict the future based upon the past in finite-alphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, $a_1 a_2 \cdots a_n$, can be encoded in a number of bits that is essentially equal to the minimal information-lossless codelength, $\sum_i -\log_2 p(a_i \mid a_1 \cdots a_{i-1})$. The goal of universal on-line modeling, and therefore of universal data compression, is to deduce the model of the input sequence $a_1 a_2 \cdots a_n$ that can estimate each $p(a_i \mid a_1 \cdots a_{i-1})$ knowing only $a_1 a_2 \cdots a_{i-1}$ so that the ex...
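As a small worked illustration of the codelength formula above, the sketch below sums $-\log_2 p(a_i \mid a_1 \cdots a_{i-1})$ for an on-line model that sees only the past symbols. The model here is an assumption chosen for simplicity (an order-0 Laplace, add-one, frequency estimator), not the modeling technique studied in this work, and the function name is hypothetical.

```python
import math
from collections import Counter


def ideal_code_length(sequence, alphabet):
    """Sum of -log2 p(a_i | a_1 ... a_{i-1}) under an adaptive order-0 model.

    Each probability is estimated from the symbols seen so far using
    add-one (Laplace) smoothing, so the model is updated on-line.
    """
    counts = Counter()
    total_bits = 0.0
    for n_seen, symbol in enumerate(sequence):
        p = (counts[symbol] + 1) / (n_seen + len(alphabet))
        total_bits += -math.log2(p)
        counts[symbol] += 1  # update only after the symbol has been coded
    return total_bits
```

For the sequence "aaaa" over the alphabet {a, b}, the successive estimates are 1/2, 2/3, 3/4, and 4/5, whose product is 1/5, so the ideal codelength is $\log_2 5 \approx 2.32$ bits. An arithmetic coder driven by the same estimates would come close to this figure.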
Linear Time Universal Coding and Time Reversal of Tree Sources via FSM Closure
IEEE Trans. Inform. Theory, 2004
Cited by 13 (2 self)
Tree models are efficient parametrizations of finite-memory processes, offering potentially significant model cost savings. The information theory literature has focused mostly on redundancy aspects of the universal estimation and coding of these models. In this paper, we investigate representations and supporting data structures for finite-memory processes, as well as the major impact these structures have on the computational complexity of the universal algorithms in which they are used. We first generalize the class of tree models, and then define and investigate the properties of the finite state machine (FSM) closure of a tree, which is the smallest FSM that generates all the processes generated by the tree. The interaction between FSM closures, generalized context trees, and classical data structures such as compact suffix trees brings together the information-theoretic and the computational aspects, leading to an implementation in linear encoding/decoding time of the semi-predictive approach to the Context algorithm, a lossless universal coding scheme in the class of tree models. An optimal context selection rule and the corresponding context transitions are computationally not more expensive than the various steps involved in the implementation of the Burrows-Wheeler transform (BWT) and use, in fact, similar tools. We also present a reversible transform that displays the same "context deinterleaving" feature as the BWT but is naturally based on an optimal context tree. FSM closures are also applied to an investigation of the effect of time reversal on tree models, motivated in part by the following question: When compressing a data sequence using a universal scheme in the class of tree models, can it make a difference whether we read the sequence from...