Results 1 - 10 of 88
Simple linear work suffix array construction
, 2003
Cited by 179 (6 self)
Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space–time tradeoff. For any v ∈ [1, √n], it runs in O(vn) time using O(n/√v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
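The abstract names the difference cover as DC3's central concept without defining it. As a minimal sketch (not the paper's code), assume the standard definition: a set D is a difference cover modulo v if every residue 0..v-1 can be written as (a - b) mod v with a, b in D. The function names `is_difference_cover` and `common_shift` are hypothetical, chosen for illustration only.

```python
def is_difference_cover(d, v):
    """Return True if every residue mod v is a difference of two elements of d."""
    diffs = {(a - b) % v for a in d for b in d}
    return diffs == set(range(v))


def common_shift(i, j, d, v):
    """Find a shift k < v that moves both positions i and j onto the cover.

    For a difference cover such a k always exists, which is what lets two
    arbitrary suffixes be compared via already-sorted sample suffixes.
    """
    for k in range(v):
        if (i + k) % v in d and (j + k) % v in d:
            return k
    return None  # only reachable when d is not a difference cover mod v


if __name__ == "__main__":
    # {1, 2} modulo 3 is the cover that the name DC3 refers to.
    print(is_difference_cover({1, 2}, 3))
    print(common_shift(0, 1, {1, 2}, 3))
```

A larger v with a sparser cover means fewer sampled suffixes and a longer shift, which is one way to read the space–time tradeoff the abstract describes.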
Reducing the Space Requirement of Suffix Trees
 Software – Practice and Experience
, 1999
Cited by 130 (12 self)
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: data structures; suffix trees; implementation techniques; space reduction
From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Constructions
 Algorithmica
, 1997
Cited by 84 (9 self)
Abstract. We review the linear-time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen's on-line construction, to explain its historic predecessors. This reveals relationships much closer than one would expect, since the three algorithms are based on rather different intuitive ideas. Moreover, it completely explains the differences between these algorithms in terms of simplicity, efficiency, and implementation complexity. Key Words. Text processing. On-line string matching. Suffix trees. Linear-time algorithm. Program transformation.
An Introduction to Bioinformatics Algorithms
, 2004
Cited by 73 (0 self)
In the early 1990s when one of us was teaching his first bioinformatics class, he was not sure that there would be enough students to teach. Although
Efficient implementation of lazy suffix trees
 MESSAGE SEQUENCE CHARTS AND PETRI NETS, CITESEER.NJ.NEC.COM/VANDERAALST99INTERORGANIZATIONAL.HTML
, 1999
Cited by 40 (5 self)
We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated not before it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.
Optimal Exact String Matching Based on Suffix Arrays
 In Proceedings of the Ninth International Symposium on String Processing and Information Retrieval. Springer-Verlag, Lecture Notes in Computer Science
, 2002
Cited by 38 (2 self)
Using the suffix tree of a string S, decision queries of the type "Is P a substring of S?" can be answered in O(|P|) time and enumeration queries of the type "Where are all z occurrences of P in S?" can be answered in O(|P|+z) time, totally independent of the size of S. However, in large-scale applications such as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space-economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log |S|) and O(|P|+log |S|+z) time, respectively, but no optimal time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P|+z) time bounds for the suffix array. Our approach is not confined to exact pattern matching. In fact, it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.
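To make the decision and enumeration queries concrete, here is a minimal sketch of searching a suffix array by plain binary search. It is not the method of this paper, nor the Manber and Myers variant with its additional table: it costs O(|P| log |S|) per query and uses a naive construction. The function names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(s):
    """Naive construction: sort suffix start positions by the suffix text.

    Illustration only; this takes far more than linear time.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, p):
    """Return all start positions of p in s, in increasing order."""
    m = len(p)

    # Lower bound: first suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with p.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))        # enumeration query
    print(bool(find_occurrences(text, sa, "nab")))  # decision query
```

The decision query is simply whether the returned list is non-empty; the enumeration query is the list itself.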
Suffix Cactus: A Cross between Suffix Tree and Suffix Array
, 1995
Cited by 38 (2 self)
The suffix cactus is a new alternative to the suffix tree and the suffix array as an index of large static texts. Its size and its performance in searches lie between those of the suffix tree and the suffix array. Structurally, the suffix cactus can be seen either as a compact variation of the suffix tree or as an augmented suffix array.
Suffix Trees on Words
, 1995
Cited by 31 (2 self)
We present an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multi-character substrings or words. The word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. This is a significant improvement over previous solutions. In some cases, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We show that this can allow a word suffix tree to be built in sublinear time.
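The idea of indexing only the m suffixes that start at word boundaries can be sketched as follows. This is an illustrative, naive word suffix array rather than the word suffix tree of the paper, and it does not achieve the O(n) expected time or O(m) space bounds stated above. The function names and the choice of a space as the default delimiter are assumptions.

```python
def word_suffix_starts(text, delimiters=" "):
    """Return the positions where a word begins (first non-delimiter after a delimiter)."""
    starts = []
    prev_is_delim = True  # treat the start of the text as a boundary
    for i, ch in enumerate(text):
        is_delim = ch in delimiters
        if prev_is_delim and not is_delim:
            starts.append(i)
        prev_is_delim = is_delim
    return starts


def naive_word_suffix_array(text, delimiters=" "):
    """Sort only the word-aligned suffixes; all other suffixes are ignored."""
    return sorted(word_suffix_starts(text, delimiters), key=lambda i: text[i:])


if __name__ == "__main__":
    text = "to be or not to be"
    print(word_suffix_starts(text))
    print(naive_word_suffix_array(text))
```

For this 18-character text only 6 suffixes are indexed instead of 18, which is the space saving the abstract is after.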
Fully-compressed suffix trees
 IN: PACS 2000. LNCS
, 2000
Cited by 26 (19 self)
Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require O(n log n) bits of space, for a string of size n. This is considerably more than the n log₂ σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but still unsatisfactory when σ is small as in DNA sequences. In this paper we introduce the first compressed suffix tree representation that breaks this linear-space barrier. Our representation requires sublinear extra space and supports a large set of navigational operations in logarithmic time. An essential ingredient of our representation is the lowest common ancestor (LCA) query. We reveal important connections between LCA queries and suffix tree navigation.