## Extended Application of Suffix Trees to Data Compression (1996)

Venue: Data Compression Conference

Citations: 37 (2 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Larsson96extendedapplication,
  author    = {N. Jesper Larsson},
  title     = {Extended Application of Suffix Trees to Data Compression},
  booktitle = {Data Compression Conference},
  year      = {1996},
  pages     = {190--199}
}
```


### Abstract

A practical scheme for maintaining an index for a sliding window in optimal time and space, by use of a suffix tree, is presented. The index supports location of the longest matching substring in time proportional to the length of the match. The total time for build and update operations is proportional to the size of the input. The algorithm, which is simple and straightforward, is presented in detail. The most prominent lossless data compression scheme, when considering compression performance, is prediction by partial matching with unbounded context lengths (PPM*). However, previously presented algorithms are hardly practical, considering their extensive use of computational resources. We show that our scheme can be applied to PPM*-style compression, obtaining an algorithm that runs in linear time, and in space bounded by an arbitrarily chosen window size. Application to Ziv–Lempel '77 compression methods is straightforward and the resulting algorithm runs in linear time.
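
To make the two operations in the abstract concrete (longest-match lookup, and its use in Ziv–Lempel '77 parsing), here is a minimal sketch in Python. It assumes a naive, uncompacted suffix trie that is rebuilt over the window at every step; it is not Larsson's algorithm, which maintains a compact suffix tree with incremental insertion and deletion to reach linear total time. All names (`TrieNode`, `build_suffix_trie`, `longest_match`, `lz77_encode`, `lz77_decode`) and the token format are hypothetical, chosen for illustration only. Matches are restricted to lie entirely inside the window and do not overlap the lookahead.

```python
from typing import Dict, List, Tuple, Union


class TrieNode:
    """Node of an uncompacted suffix trie (hypothetical helper, for illustration)."""

    __slots__ = ("children", "pos")

    def __init__(self, pos: int) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.pos = pos  # start index of the first suffix that created this node


def build_suffix_trie(window: str) -> TrieNode:
    """Insert every suffix of `window`: naive O(len(window)^2) time and space,
    unlike the linear-time suffix tree construction the paper relies on."""
    root = TrieNode(-1)
    for start in range(len(window)):
        node = root
        for ch in window[start:]:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode(start)
                node.children[ch] = child
            node = child
    return root


def longest_match(root: TrieNode, lookahead: str) -> Tuple[int, int]:
    """Return (start in window, length) of the longest prefix of `lookahead`
    that occurs in the window; one child lookup per matched character."""
    node, length = root, 0
    for ch in lookahead:
        child = node.children.get(ch)
        if child is None:
            break
        node, length = child, length + 1
    return (node.pos, length) if length else (-1, 0)


# Hypothetical token format: ("lit", char) or ("copy", distance, length).
Token = Union[Tuple[str, str], Tuple[str, int, int]]


def lz77_encode(text: str, window_size: int = 8, max_len: int = 8,
                min_len: int = 2) -> List[Token]:
    """Greedy LZ77-style parse; rebuilds the trie for each step (slow but simple)."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        w_start = max(0, i - window_size)           # left edge of the sliding window
        root = build_suffix_trie(text[w_start:i])   # index only the window
        pos, length = longest_match(root, text[i:i + max_len])
        if length >= min_len:
            tokens.append(("copy", i - (w_start + pos), length))
            i += length
        else:
            tokens.append(("lit", text[i]))
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    """Invert lz77_encode by copying `length` chars from `distance` back."""
    out: List[str] = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            start = len(out) - dist
            out.extend(out[start:start + length])
    return "".join(out)


if __name__ == "__main__":
    print(lz77_encode("abcabcabc"))
```

The lookup side matches the bound stated in the abstract: `longest_match` performs one child lookup per matched character. What the sketch deliberately omits is maintaining the index as the window slides; rebuilding the trie costs quadratic time per step, which is exactly the overhead that an index supporting cheap insertions and deletions is meant to remove.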

### Citations

1138 | A Universal Algorithm for Sequential Data Compression
- Ziv, Lempel
- 1977
Citation Context: ...String matching is a central task in data compression. In particular, in string substitution methods (such as the original scheme of Ziv and Lempel [14]), the dominating part of computation is string matching. Also, statistical data compression, such as the PPM methods [3, 4, 7], includes the operation of finding contexts, which are defined by string...

664 | Arithmetic coding for data compression
- Witten, Neal, et al.
- 1987
Citation Context: ...in context C, the count for c in C is used to encode the character: the higher the count, the larger the code space allocated to it. The encoding is most effectively performed with arithmetic coding [8, 13]. When a character appears in a context for the first time, its count in that context is zero, and the character cannot be encoded. Therefore each context also keeps an escape count, used to encode a...

548 | A space-economical suffix tree construction algorithm
- McCreight
- 1976
Citation Context: ...contexts, which are defined by strings. In effect, this is a string matching operation, which, particularly when contexts are long, occupies a major part of computational resources. The suffix tree [6, 11] is a highly efficient data structure for string matching. A suffix tree indexes all substrings of a given string and can be constructed in linear time. Our primary contribution is to present a scheme...

330 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984
Citation Context: ...in string substitution methods (such as the original scheme of Ziv and Lempel [14]), the dominating part of computation is string matching. Also, statistical data compression, such as the PPM methods [3, 4, 7], includes the operation of finding contexts, which are defined by strings. In effect, this is a string matching operation, which, particularly when contexts are long, occupies a major part of computa...

328 | On-line construction of suffix trees
- Ukkonen
- 1995
Citation Context: ...contexts, which are defined by strings. In effect, this is a string matching operation, which, particularly when contexts are long, occupies a major part of computational resources. The suffix tree [6, 11] is a highly efficient data structure for string matching. A suffix tree indexes all substrings of a given string and can be constructed in linear time. Our primary contribution is to present a scheme...

228 | The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation Context: ...for characters that have never occurred in the input stream. New contexts are added as they occur in the input stream. How and when to update escape counts is an intricate problem. Witten and Bell [12] consider several heuristics. Section 5.2 (PPM*): Prior to PPM*, the maximum order has usually been set to some small number. This is primarily to keep the number of states from growing too large, but also a ...

139 | Arithmetic coding revisited
- Moffat, Neal, et al.
- 1998
Citation Context: ...in context C, the count for c in C is used to encode the character: the higher the count, the larger the code space allocated to it. The encoding is most effectively performed with arithmetic coding [8, 13]. When a character appears in a context for the first time, its count in that context is zero, and the character cannot be encoded. Therefore each context also keeps an escape count, used to encode a...

117 | Implementing the PPM data compression scheme
- Moffat
- 1990
Citation Context: ...in string substitution methods (such as the original scheme of Ziv and Lempel [14]), the dominating part of computation is string matching. Also, statistical data compression, such as the PPM methods [3, 4, 7], includes the operation of finding contexts, which are defined by strings. In effect, this is a string matching operation, which, particularly when contexts are long, occupies a major part of computa...

111 | Unbounded length contexts for PPM
- Cleary, Teahan
- 1997
Citation Context: ...in string substitution methods (such as the original scheme of Ziv and Lempel [14]), the dominating part of computation is string matching. Also, statistical data compression, such as the PPM methods [3, 4, 7], includes the operation of finding contexts, which are defined by strings. In effect, this is a string matching operation, which, particularly when contexts are long, occupies a major part of computa...

64 | Data compression with finite windows
- Fiala, Greene
- 1989
Citation Context: ...input has been processed. This is not feasible in practice. We need a scheme that allows maintaining only a limited part of the input preceding the current position: a sliding window. Fiala and Greene [5] claim to have modified McCreight's suffix tree construction algorithm [6] for use with a sliding window, by presenting a method for making deletions at constant amortized cost. However, a careful inv...

63 | Linear algorithm for data compression via string matching
- Rodeh, Pratt, et al.
- 1981
Citation Context: ...suffix trees for PPM*-style statistical modeling methods, together with its necessary theoretical justification. Also, application to Ziv–Lempel compression is natural. Some compression schemes [3, 9] require that each character, once read from the input, resides in primary storage until all of the input has been processed. This is not feasible in practice. We need a scheme that allows maintaining...

15 | The relationship between greedy parsing and symbolwise text compression
- Bell, Witten
- 1994
Citation Context: ...competitive results. Furthermore, with our sliding window technique we obtain a natural implementation of Ziv–Lempel compression which runs in linear time. It has been noted, e.g., by Bell and Witten [2], that there is a strong connection between string substituting compression methods and symbolwise (statistical) methods. Our assertion that the exact same data structure is useful in both these famil...

14 | Longest-match string searching for Ziv-Lempel compression
- Bell, Kulp
- 1993
Citation Context: ...space requirements are bounded by a window size, and time complexity is linear in the size of the input. In a survey of string searching algorithms for Ziv–Lempel '77 compression, Bell and Kulp [1] rule out suffix trees because of the inefficiency of deletions. We assert that our method eliminates this inefficiency, and that suffix trees should certainly be considered for implementation of the ...

12 | Probability estimation for PPM
- Teahan
- 1995
Citation Context: ...compression method appears to be finite context modeling with unbounded context length, in the style of the PPM* algorithm presented by Cleary, Teahan, and Witten [3]. (Some refinements are given by Teahan [10].) However, as presented in the original paper, this algorithm uses too many computational resources (both time and space) to be practically useful in most cases. Observing that the context trie emplo...