## Suffix Trees on Words (1995)

Citations: 30 (2 self)

### BibTeX

@MISC{Andersson95suffixtrees,

author = {Arne Andersson and N. Jesper Larsson and Kurt Swanson},

title = {Suffix Trees on Words},

year = {1995}

}

### Abstract

We present an intrinsic generalization of the suffix tree, designed to index a string of length n that has a natural partitioning into m multi-character substrings, or words. The word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application. Because traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, constructing a word suffix tree is nontrivial, particularly when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, the construction cost is Ω(n) because the entire input must be scanned. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. This is a significant improvement over previous solutions. In some cases, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We show that this can allow a word suffix tree to be built in sublinear time.
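To make the idea of indexing only the m suffixes that start at word boundaries concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it builds an uncompressed trie whose edges are whole words, by naive insertion, so it can use O(m²) nodes and time in the worst case rather than the O(n) expected time and O(m) space described above. The names `Node`, `build_word_suffix_trie`, `find`, and the single-space `DELIMITER` are assumptions made for illustration.

```python
from typing import Dict, List

DELIMITER = " "  # assumption: words are separated by single spaces


class Node:
    """Trie node; each outgoing edge is labelled with one whole word."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        # Word indices of the suffixes whose path passes through this node.
        self.starts: List[int] = []


def build_word_suffix_trie(text: str) -> Node:
    """Insert only the suffixes that begin at a word boundary (naive sketch)."""
    words = text.split(DELIMITER)
    root = Node()
    for i in range(len(words)):
        node = root
        for word in words[i:]:
            node = node.children.setdefault(word, Node())
            node.starts.append(i)
    return root


def find(root: Node, phrase: str) -> List[int]:
    """Return the word indices at which `phrase` occurs, aligned to word boundaries."""
    node = root
    for word in phrase.split(DELIMITER):
        child = node.children.get(word)
        if child is None:
            return []
        node = child
    return node.starts


trie = build_word_suffix_trie("to be or not to be")
print(find(trie, "to be"))  # [0, 4]
print(find(trie, "be or"))  # [1]
print(find(trie, "o be"))   # [] because the match would start inside a word
```

The last query shows the trade-off the abstract describes: occurrences that begin in the middle of a word are deliberately not indexed, which is what lets the structure hold m suffixes instead of n.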

### Citations

148 | Surpassing the information theoretic bound with fusion trees - Fredman, Willard - 1993 |

130 | Optimal suffix tree construction with large alphabets - Farach - 1997 |

Citation Context: ...red however, we have an improved time bound of O(n + s(m)) compared to the Θ(n + s(n)) time required to create a lexicographic traditional suffix tree on a string of length n. (Note that while Farach [9] obtains linear construction time, his approach requires sorting as a preparatory step.) We state this in the following observation: Observation 2 A word suffix tree on a string of length n with m w...

118 | The myriad virtues of subword trees - Apostolico - 1985 |

Citation Context: ...hine words. We illustrate that this can allow a word suffix tree to be built in sublinear time. 1 Introduction The suffix tree [18] is a very important and useful data structure for many applications [6]. Traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, in order to obtain efficient time bounds. Little work has been done for the common case where...

86 | Sorting in linear time - Andersson, Hagerup, et al. - 1995 |

Citation Context: ...sort m′ characters. For the general problem, with no restrictions on alphabet size, this implies an upper bound of O(n log log n) by applying the currently best known upper bound for integer sorting [2]. 6.2 A deterministic algorithm A deterministic version of Algorithm B can be obtained by representing the tree with only deterministic data structures, such as binary search trees. Also, when these d...

77 | Linear pattern matching algorithms - Weiner - 1973 |

35 | Efficient implementation of suffix trees - Andersson, Nilsson - 1995 |

Citation Context: ...sadvantage with the bucket array, however, is that the number of suffixes may vary a lot between the buckets; many buckets may even be empty. Therefore, Andersson and Nilsson suggested an improvement [5]: instead of the bucket array, a small, efficiently implemented suffix tree may be used to index the array of pointers. It is experimentally demonstrated that this data structure uses less space than ...

33 | Improved behaviour of tries by adaptive branching - Andersson, Nilsson - 1993 |

Citation Context: ...s data structure uses less space than the bucket array while the number of disk accesses is smaller. The reason for this advantage is that a trie in general, and a level compressed trie in particular [3, 4], adapts more nicely to the input distribution than a bucket array. For the same reasons, we conjecture that an efficiently implemented word suffix tree will offer a better time–space tradeoff than a ...

23 | Dynamic perfect hashing: upper and lower bounds - Dietzfelbinger, Karlin, Mehlhorn, Meyer auf der Heide, et al. |

Citation Context: ...he price of a higher search cost, when the alphabet is not small enough to be regarded as constant. 3. The edges at each node are hash-coded (randomized construction). Using dynamic perfect hashing [8], we are guaranteed that searches spend constant time per node, even for a non-constant alphabet. Furthermore, this representation may be combined with variant 2. 4. Instead of storing pointers into t...

23 | Optimal suffix tree construction with large alphabets - Farach - 1997 |

21 | Efficient text searching of regular expressions - Baeza-Yates, Gonnet - 1989 |

Citation Context: ... where only certain suffixes of the input string are relevant, despite the savings in storage and processing times that are to be expected from only considering these suffixes. Baeza-Yates and Gonnet [7] have pointed out this possibility, by suggesting inserting only suffixes that start with a word, when the input consists of ordinary text. They imply that the resulting tree can be built in O(nH(n)) ...

16 | Fast algorithms for finding nearest common ancestors - Harel, Tarjan - 1984 |

15 | Faster deterministic sorting and searching in linear space - Andersson - 1995 |

Citation Context: ...)=O(m log m′). There are other possibilities, for example we could implement each node as a fusion tree [10], which implies i(m, m′) = O(m log m′ / log log m′), or as an exponential search tree [1], which implies i(m, m′) = O(m √(log m′)) or i(m, m′) = O(m log log m′ log log k), the latter bound being the more advantageous when the alphabet size is reasonably small. 7 Sublinear construction: A...

8 | Efficient Implementation of Suffix Trees. Software: Practice and Experience 25(2) - Andersson, Nilsson - 1995 |

7 | Faster searching in tries and quadtrees: an analysis of level compression - Andersson, Nilsson - 1994 |

Citation Context: ...s data structure uses less space than the bucket array while the number of disk accesses is smaller. The reason for this advantage is that a trie in general, and a level compressed trie in particular [3, 4], adapts more nicely to the input distribution than a bucket array. For the same reasons, we conjecture that an efficiently implemented word suffix tree will offer a better time–space tradeoff than a ...

5 | Suffix Trees in the Functional Programming Paradigm - Giegerich, Kurtz - 1994 |

4 | A space economical suffix tree construction algorithm - McCreight - 1976 |

3 | Efficient Implementation of Suffix Trees - Andersson, Nilsson - 1995 |

2 | Efficient implementation of suffix trees - Andersson, Nilsson - 1995 |

2 | On-line construction of suffix trees - Ukkonen - 1993 |