Suffix arrays on words
In Proceedings of the 18th Annual Symposium on Combinatorial Pattern Matching, volume 4580 of LNCS, 2007
Cited by 13 (5 self)
Abstract. Surprisingly enough, it is not yet known how to build directly a suffix array that indexes just the k positions at word boundaries of a text T[1,n], taking O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves such optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffix trees [1,2] and (compact) DAWGs [3,4]. Our solution inherits the simplicity and efficiency of suffix arrays, with respect to such other word-indexes, and thus it foresees applications in word-based approaches to data compression [5] and computational linguistics [6]. To support this, we have run a large set of experiments showing that word-based suffix arrays may be constructed twice as fast as their full-text counterparts, and with a working space as low as 20%. The space reduction of the final word-based suffix array also benefits query time (i.e., fewer random-access binary-search steps), making queries faster by a factor of up to 3.
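To make the indexed object concrete, the following is a minimal Python sketch of a word-based suffix array built naively, by sorting only the suffixes that start at word boundaries. It is not the O(n)-time, O(k)-space construction proposed in the abstract above: the sort compares whole suffixes and is far from linear time. The function names `word_suffix_array` and `find_word_prefix`, the choice of a single space as word delimiter, and the example text are all assumptions made for illustration.

```python
def word_suffix_array(text, delimiter=" "):
    """Return the start positions of word-aligned suffixes in lexicographic order.

    Naive illustration only: sorting by full suffix is not linear time.
    """
    # A word starts at position i if text[i] is not a delimiter and the
    # previous character (if any) is a delimiter.
    starts = [i for i in range(len(text))
              if text[i] != delimiter and (i == 0 or text[i - 1] == delimiter)]
    return sorted(starts, key=lambda i: text[i:])


def find_word_prefix(text, wsa, pattern):
    """Return sorted text positions of word-aligned suffixes starting with pattern."""
    m = len(pattern)
    lo, hi = 0, len(wsa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(wsa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(wsa[first:lo])


text = "to be or not to be"
wsa = word_suffix_array(text)
print(wsa)                                # [16, 3, 9, 6, 13, 0]
print(find_word_prefix(text, wsa, "to"))  # [0, 13]
print(find_word_prefix(text, wsa, "o"))   # [6]: the "o" inside "to" is not indexed
```

The array holds only k = 6 entries for this 18-character text, which is why each binary search touches fewer suffixes than a full-text suffix array would.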
M.: Sparse compact directed acyclic word graphs
In: Stringology, 2006
Cited by 3 (0 self)
Abstract. The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analysis. Word suffix trees are a variant of sparse suffix trees that are defined for strings that contain a special word delimiter #. Namely, the word suffix tree of string w = w_1 w_2 ··· w_k, consisting of k words each ending with #, represents only the k suffixes of w of the form w_i ··· w_k. Recently, we presented an algorithm which builds word suffix trees in O(n) time with O(k) space, where n is the length of w. In addition, we proposed sparse directed acyclic word graphs (SDAWGs) and an online algorithm for constructing them, working in O(n) time and space. As a further achievement of this research direction, this paper introduces yet a new text indexing structure named sparse compact directed acyclic word graphs (SCDAWGs). We show that the size of SCDAWGs is smaller than that of word suffix trees and SDAWGs, and present an SCDAWG construction algorithm that works in O(n) time with O(k) space and in an online manner.
Sparse Suffix Tree Construction in Small Space
Cited by 1 (0 self)
Abstract. We consider the problem of constructing a sparse suffix tree (or suffix array) for b suffixes of a given text T of length n, using only O(b) words of space during construction. Attempts at breaking the naive bound of Ω(nb) time for this problem can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, but only for the case where the suffixes were evenly spaced in T. In this paper there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in O(n log^2 b) time. To achieve this we develop a technique, which may be of independent interest, that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo and outputs the correct tree with high probability. We then give a Las Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional trade-offs between the space usage and the construction time for the Monte Carlo algorithm are given.
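As a rough illustration of how randomized longest-common-prefix queries can drive sparse suffix sorting, the Python sketch below uses Karp-Rabin fingerprints of text prefixes and a binary search on the match length. This is a swapped-in textbook technique, not the algorithm of the abstract above: it stores O(n) prefix fingerprints, so it does not meet the O(b) space bound, and the names `Fingerprints`, `lcp`, and `sparse_suffix_array`, the modulus, and the `seed` parameter are assumptions. It is Monte Carlo only in the sense that a fingerprint collision could, with very small probability, yield a wrong answer.

```python
import random
from functools import cmp_to_key

MOD = (1 << 61) - 1  # a Mersenne prime used as the fingerprint modulus


class Fingerprints:
    """Karp-Rabin fingerprints of all prefixes of a text (O(n) space)."""

    def __init__(self, text, seed=None):
        rng = random.Random(seed)
        self.text = text
        self.base = rng.randrange(256, MOD - 1)  # random evaluation point
        n = len(text)
        self.pref = [0] * (n + 1)  # pref[i] = fingerprint of text[:i]
        self.pow = [1] * (n + 1)   # pow[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.pref[i + 1] = (self.pref[i] * self.base + ord(ch)) % MOD
            self.pow[i + 1] = (self.pow[i] * self.base) % MOD

    def fp(self, i, j):
        """Fingerprint of text[i:j]."""
        return (self.pref[j] - self.pref[i] * self.pow[j - i]) % MOD

    def lcp(self, i, j):
        """Longest common prefix length of text[i:] and text[j:], correct w.h.p."""
        lo, hi = 0, len(self.text) - max(i, j)
        while lo < hi:  # binary search for the largest matching length
            mid = (lo + hi + 1) // 2
            if self.fp(i, i + mid) == self.fp(j, j + mid):
                lo = mid
            else:
                hi = mid - 1
        return lo


def sparse_suffix_array(text, positions, seed=None):
    """Sort the given suffix start positions using fingerprint-based LCP queries."""
    f = Fingerprints(text, seed)
    n = len(text)

    def compare(i, j):
        if i == j:
            return 0
        l = f.lcp(i, j)
        if i + l == n:  # suffix i is a proper prefix of suffix j
            return -1
        if j + l == n:  # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[i + l] < text[j + l] else 1

    return sorted(positions, key=cmp_to_key(compare))


print(sparse_suffix_array("banana", [1, 3, 5], seed=1))  # [5, 3, 1]
```

Each comparison costs O(log n) fingerprint lookups instead of a character-by-character scan, which is the general reason fast LCP queries help when only b of the n suffixes need to be ordered.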
Sparse Text Indexing in Small Space
In this work we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction. Attempts at breaking the naive bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were only obtained in 1996, but only for the case where the b suffixes were evenly spaced in T. In this paper there is no constraint on the locations of the suffixes. Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log^2 b) time. To achieve this we develop a technique that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo and outputs the correct tree with high probability. We then give a Las Vegas algorithm which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Furthermore, additional trade-offs between the space usage and the construction time for the Monte Carlo algorithm are given. Finally, we show that at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + b log b) time and O(b) space.
Sparse and Truncated Suffix Trees on Variable-Length Codes
2011
Abstract. The sparse suffix tree (SST), introduced by Kärkkäinen and Ukkonen (COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study the special case in which the input string is a sequence of codewords drawn from a regular prefix code ∆ ⊆ Σ^+ recognized by a finite automaton, and the index points are located on the code boundaries. In this case, we present an online algorithm that constructs the sparse suffix tree for an input string T on any variable-length regular prefix code, called the code suffix tree (CST), in O(n + m) time and O(k) additional space for a fixed base alphabet Σ, where m is the size of an automaton for ∆. Furthermore, we present a modified algorithm for the k-truncated version of code suffix trees that runs in the same time and space complexities. Hence, these results generalize the previous results (Inenaga and Takeda, CPM 2006) for word suffix trees and (Na, Apostolico, Iliopoulos, and Park, Theor. Comp. Sci., 304, 2003) for truncated suffix trees to arbitrary variable-length regular prefix codes.
Online Construction of compact suffix vectors and maximal repeats
A suffix vector of a string is an index data structure equivalent to a suffix tree. It was first introduced by Monostori et al. in 2001 [9, 10, 11]. They proposed a linear construction algorithm for an extended suffix vector, and then another linear algorithm to transform an extended suffix vector into a more space-economical compact suffix vector. We propose an online linear algorithm for directly constructing a compact suffix vector. We not only show that it is possible to build a compact suffix vector directly, but also show that this online construction can be faster than the construction of the extended suffix vector. Finally, we present an efficient method for computing maximal repeats using suffix vectors.
SURVEY AND SUMMARY: Prospects and limitations of full-text index structures in genome analysis
2012
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.