## Compressed suffix trees with full functionality

### Other Repositories/Bibliography

Venue: Theory of Computing Systems

Citations: 56 (6 self)

### BibTeX

@ARTICLE{Sadakane_compressedsuffix,
    author = {Kunihiko Sadakane},
    title = {Compressed suffix trees with full functionality},
    journal = {Theory of Computing Systems},
    year = {2007}
}

### Abstract

We introduce new data structures for compressed suffix trees whose size is linear in the text size. The size is measured in bits; thus they occupy only O(n log |A|) bits for a text of length n on an alphabet A. This is a remarkable improvement on current suffix trees, which require O(n log n) bits. Though some components of suffix trees have been compressed, there is no linear-size data structure for suffix trees with full functionality such as computing suffix links, string-depths and lowest common ancestors. The data structure proposed in this paper is the first one that has linear size and supports all operations efficiently. Any algorithm running on a suffix tree can also be executed on our compressed suffix trees with a slight slowdown of a factor of polylog(n).
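To get a feel for the gap between O(n log n) and O(n log |A|) bits, the following is a rough back-of-the-envelope sketch, not a measurement of any real implementation. It assumes the hidden constants in both bounds are 1 and rounds each logarithm up to a whole number of bits; the function names `ceil_lg`, `pointer_based_bits` and `linear_size_bits` are hypothetical, and the values n = 2^31 and |A| = 4 are chosen purely for illustration (a DNA-like alphabet).

```python
def ceil_lg(x: int) -> int:
    """Return ceil(log2(x)) for x >= 1, using exact integer arithmetic."""
    return (x - 1).bit_length()


def pointer_based_bits(n: int) -> int:
    """Illustrative n * ceil(lg n) bits: one lg-n-bit pointer per text position.

    Assumption: the constant hidden in O(n log n) is taken to be 1.
    """
    return n * ceil_lg(n)


def linear_size_bits(n: int, sigma: int) -> int:
    """Illustrative n * ceil(lg |A|) bits, i.e. the size of the text itself.

    Assumption: the constant hidden in O(n log |A|) is taken to be 1.
    """
    return n * ceil_lg(sigma)


if __name__ == "__main__":
    n = 2**31      # text length (illustrative)
    sigma = 4      # alphabet size |A| (illustrative)
    big = pointer_based_bits(n)
    small = linear_size_bits(n, sigma)
    print(f"n lg n   : {big / 8 / 2**30:.2f} GiB")
    print(f"n lg |A| : {small / 8 / 2**30:.2f} GiB")
    print(f"ratio    : {big / small:.1f}x")
```

Under these assumptions the ratio is simply lg n / lg |A|, so the saving grows with the text length and is largest for small alphabets. Real suffix tree implementations carry constants well above 1, so actual sizes differ from these figures.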

### Citations

970 |
Algorithms on strings, trees and sequences
- Gusfield
- 1997
Citation Context ...ning on a suffix tree can also be executed on our compressed suffix trees with a slight slowdown of a factor of polylog(n). 1 Introduction Suffix trees are basic data structures for string algorithms [13]. A pattern can be found in time proportional to the pattern length from a text by constructing the suffix tree of the text in advance. The suffix tree can also be used for more complicated problems, ... |

686 | Suffix arrays: a new method for on-line string searches - Manber, Myers - 1993 |

574 |
A Space-Economical Suffix Tree Construction Algorithm
- McCreight
- 1976
Citation Context ... string algorithms are based on the use of suffix trees because this does not increase the asymptotic time complexity. A suffix tree of a string can be constructed in linear time in the string length [28, 21, 27, 5]. Therefore it is natural to use the suffix tree. However, concerning the space complexity, the suffix tree is worse than the string. Let A be the alphabet and T be a string of length n on A. Then the... |

348 | On-line construction of suffix trees
- Ukkonen
- 1995
Citation Context ... string algorithms are based on the use of suffix trees because this does not increase the asymptotic time complexity. A suffix tree of a string can be constructed in linear time in the string length [28, 21, 27, 5]. Therefore it is natural to use the suffix tree. However, concerning the space complexity, the suffix tree is worse than the string. Let A be the alphabet and T be a string of length n on A. Then the... |

193 | Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Grossi, Vitter
- 2000
Citation Context ... to solve other problems. In this paper, we propose linear-size data structures for them. The data structures have size |CSA| + 6n + o(n) bits where |CSA| denotes the size of the compressed suffix array [11] of the text, which is also linear. As for the time complexity, our data structures support efficient operations on suffix trees. Any operation on a suffix tree is supported with a slowdown of a facto... |

192 | The LCA problem revisited
- Bender, Farach-Colton
- 2000
Citation Context ...ection 5 states the main results: new data structures for compressed suffix trees. Section 6 shows some concluding remarks. 2 Suffix Trees In this section we review suffix trees. Let T[1..n] = T[1]T[2]···T[n] be a text of length n on an alphabet A with |A| ≤ n. We assume that T[n] = $ is a unique terminator which alphabetically precedes all other symbols. The j-th suffix of T is defined as T [j.... |

191 | Opportunistic data structures with applications
- Ferragina, Manzini
- 2000
Citation Context ... Many variations of the compressed suffix array have been proposed [6, 7, 10, 8, 25]. Ferragina and Manzini [6, 7] proposed the FM-index, a kind of compressed suffix array of size 5nH_k + O((n / lg n)(|A| + lg lg n) + n^ɛ |A| 2^(|A| lg |A|)) bits where H_k is the order-k entropy of the text. Thi... |

144 | Succinct representation of balanced parentheses, static trees and planar graphs
- Munro, Raman
- 1997
Citation Context ...an be computed by using lookup and inverse as in the definition, it cannot be done in constant time. 3.2 Balanced parentheses representations of trees We use a balanced parentheses encoding of a tree [23, 24]. An m-node rooted ordered tree can be encoded in 2m + o(m) bits with various constant time navigational operations. The tree is encoded into m nested open and close parentheses as follows. During a p... |

130 | Optimal suffix tree construction with large alphabets
- Farach
- 1997
Citation Context ... string algorithms are based on the use of suffix trees because this does not increase the asymptotic time complexity. A suffix tree of a string can be constructed in linear time in the string length [28, 21, 27, 5]. Therefore it is natural to use the suffix tree. However, concerning the space complexity, the suffix tree is worse than the string. Let A be the alphabet and T be a string of length n on A. Then the... |

91 | New text indexing functionalities of the compressed suffix arrays - Sadakane - 2003 |

80 | Linear-time longest-common-prefix computation in suffix arrays and its applications
- Kasai, Lee, et al.
Citation Context ...4n + o(n) bits. Our compressed suffix tree is similar to this; however we also support depth(v), lca(v, w) and sl(v). 3.3 Simulating suffix tree traversal by suffix array and height array Kasai et al. [17] showed that a bottom-up traversal of a suffix tree can be simulated by using only the suffix array and an array storing a set of the lengths of the longest common prefixes between two suffixes, calle... |

69 | An experimental study of an opportunistic index
- Ferragina, Manzini
- 2001
Citation Context ... Many variations of the compressed suffix array have been proposed [6, 7, 10, 8, 25]. Ferragina and Manzini [6, 7] proposed the FM-index, a kind of compressed suffix array of size 5nH_k + O((n / lg n)(|A| + lg lg n) + n^ɛ |A| 2^(|A| lg |A|)) bits where H_k is the order-k entropy of the text. Thi... |

61 | A space-economical suffix tree construction algorithm - McCreight - 1976 |

60 | Optimal bounds for the predecessor problem
- Beame, Fich
- 1999
Citation Context ...refore it is better to represent the suffix tree in size proportional in the string size, that is, in O(n lg |A|) bits. In this paper, we consider the following computation model. We assume a word RAM [1, 14] with word size O(lg U) bits, where n ≤ U, in which standard arithmetic and bitwise boolean operations on word-sized operands can be performed in constant time. We also have O(U) memory cells, each of... |

57 | Space Efficient Suffix trees
- Munro, Raman, et al.
- 1998
Citation Context ...practically. We propose O(n lg |A|)-bit data structures for suffix trees which have the full functionality of the current suffix trees. Though some data structures for suffix trees have been proposed [3, 24], the following are missed: (1) the suffix link of an internal node, (2) the depth of an internal node, and (3) the lowest common ancestor (lca) between any two nodes. The suffix link is necessary to ... |

51 | Breaking a time-and-space barrier in constructing full-text indices
- Hon, Sadakane, et al.
Citation Context ...mportant to mention the complexity of the working space to construct a compressed suffix tree. There are many algorithms for constructing compressed suffix arrays and trees using linear working space [19, 15, 16]. Therefore our compressed suffix trees can also be constructed using linear working space. An open question is the following: Can we reduce the linear term 6n to o(n)? Acknowledgments The author woul... |

46 |
Sorting and searching on the word RAM
- Hagerup
- 1998
Citation Context ...refore it is better to represent the suffix tree in size proportional in the string size, that is, in O(n lg |A|) bits. In this paper, we consider the following computation model. We assume a word RAM [1, 14] with word size O(lg U) bits, where n ≤ U, in which standard arithmetic and bitwise boolean operations on word-sized operands can be performed in constant time. We also have O(U) memory cells, each of... |

40 |
New Indices for Text
- Gonnet, Baeza-Yates, et al.
- 1992
Citation Context ...st leaf in the subtree rooted at node v. These are computed in constant time [24]. Actually they have proposed a compressed suffix tree using the suffix array and the parentheses encoding of Pat tree [9] in n lg n + 4n + o(n) bits. Our compressed suffix tree is similar to this; however we also support depth(v), lca(v, w) and sl(v). 3.3 Simulating suffix tree traversal by suffix array and height array K... |

30 | On-line construction of suffix trees - Ukkonen - 1995 |

25 |
Efficient storage retrieval by content and address of static files
- Elias
- 1974
Citation Context ... Section 4.1. □ 4.1 Data structures for Hgt array Here we show the data structure of Theorem 1. To achieve the space complexity, we use a space efficient data structure for storing sorted integers [4] and the select function [22] as it is used in the compressed suffix array [11]. Lemma 1 [12] Given s integers in sorted order, each containing w bits, where s < 2^w, we can store them in at most s(2 + ... |

23 | Optimal suffix tree construction with large alphabets - Farach - 1997 |

23 |
Reducing the space requirements of suffix trees, Software Practice and Experience
- Kurtz
- 1999
Citation Context ...aller than n, for example, for the whole genome sequence of human, |A| = 4 (A = {a, c, g, t}) and n > 2^31 (2.8 Giga). Even with a space-efficient implementation, the suffix tree is 40 Gigabytes in size [18], whereas the string is only 700 Megabytes. (Footnote 1: Let lg n denote log_2 n.) Therefore it is better to represent the suffix tree in size proportional in the string size, that is, in O(n lg |A|) bits. In th... |

22 |
On compressing and indexing data
- Ferragina, Manzini
Citation Context ... Many variations of the compressed suffix array have been proposed [6, 7, 10, 8, 25]. Ferragina and Manzini [6, 7] proposed the FM-index, a kind of compressed suffix array of size 5nH_k + O((n / lg n)(|A| + lg lg n) + n^ɛ |A| 2^(|A| lg |A|)) bits where H_k is the order-k entropy of the text. Thi... |

19 | Constructing compressed suffix arrays with large alphabets
- Hon, Lam, et al.
- 2003
Citation Context ...mportant to mention the complexity of the working space to construct a compressed suffix tree. There are many algorithms for constructing compressed suffix arrays and trees using linear working space [19, 15, 16]. Therefore our compressed suffix trees can also be constructed using linear working space. An open question is the following: Can we reduce the linear term 6n to o(n)? Acknowledgments The author woul... |

18 | Efficient Discovery of Optimal Word Association Patterns in Large Text Databases
- Shimozono, Arimura, et al.
Citation Context ...string of two strings, matching statistics, etc. The node depth is necessary to implicitly enumerate all maximal repeated substrings of the text in linear time, which can be used for text data mining [26]. The lca is necessary to compute the longest common extension of two suffixes in constant time, which can be used in approximate string matching problems. The above elements are also frequently used ... |

15 | Compressed suffix arrays and suffix trees with applications to text indexing and string matching - Grossi, Vitter - 2000 |

13 | Reducing the Space Requirement of Suffix Trees - Kurtz - 1998 |

6 | Space efficient suffix trees - Munro, Raman, et al. - 2001 |

4 | Efficient Discovery of Optimal Word-Association Patterns in Large Text Databases - Shimozono, Arimura, et al. - 2000 |

4 | Linear Pattern Matching Algorithms - Weiner - 1973 |

3 | Efficient Suffix Trees on Secondary Storage - Clark, Munro - 1996 |

3 | Higher Order Entropy Analysis of Compressed Suffix Arrays - Grossi, Gupta, et al. - 2003 |

2 | A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays - Lam, Sadakane, et al. - 2002 |

1 | Constructing Compressed Suffix Arrays with Large Alphabets - Hon, Lam, et al. - 2003 |