Results 1-3 of 3
Suffix arrays on words
In Proceedings of the 18th Annual Symposium on Combinatorial Pattern Matching, volume 4580 of LNCS, 2007
Cited by 6 (2 self)
Abstract. Surprisingly enough, it is not yet known how to build directly a suffix array that indexes just the k positions at word boundaries of a text T[1,n], taking O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves such optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffix trees [1,2] and (compact) DAWGs [3,4]. Our solution inherits the simplicity and efficiency of suffix arrays, with respect to such other word-indexes, and thus it foresees applications in word-based approaches to data compression [5] and computational linguistics [6]. To support this, we have run a large set of experiments showing that word-based suffix arrays may be constructed twice as fast as their full-text counterparts, and with a working space as low as 20%. The space reduction of the final word-based suffix array impacts also in their query time (i.e. less random access binary-search steps!), being faster by a factor of up to 3.
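To make the indexed object concrete, here is a minimal Python sketch of a word-based suffix array and a binary-search query over it. It is an illustration only, not the construction proposed in the paper: it sorts the k word-start positions by comparing whole suffixes, so it does not meet the O(n) time and O(k) extra space bounds stated in the abstract. Treating whitespace as the word delimiter, and the names `word_suffix_array` and `find`, are assumptions made for this example.

```python
import re


def word_suffix_array(text):
    """Naive build: sort word-start positions by the suffix starting there.

    Assumption: words are maximal runs of non-whitespace characters.
    This is NOT the linear-time algorithm from the paper.
    """
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    return sorted(starts, key=lambda i: text[i:])


def find(text, wsa, pattern):
    """Return sorted positions of word-aligned occurrences of pattern."""
    m = len(pattern)
    lo, hi = 0, len(wsa)
    while lo < hi:  # leftmost suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(wsa)
    while lo < hi:  # leftmost suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(wsa[first:lo])


text = "to be or not to be"
wsa = word_suffix_array(text)
print(wsa)
print(find(text, wsa, "to be"))
```

Because only word starts are indexed, a query such as `find(text, wsa, "o")` reports the word "or" but not the "o" inside "to" or "not". The array holds k entries instead of n, which is the source of the shorter binary searches the abstract mentions.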
Sparse compact directed acyclic word graphs
In: Stringology, 2006
Cited by 3 (0 self)
Abstract. The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analysis. Word suffix trees are a variant of sparse suffix trees that are defined for strings that contain a special word delimiter #. Namely, the word suffix tree of string w = w_1 w_2 ··· w_k, consisting of k words each ending with #, represents only the k suffixes of w of the form w_i ··· w_k. Recently, we presented an algorithm which builds word suffix trees in O(n) time with O(k) space, where n is the length of w. In addition, we proposed sparse directed acyclic word graphs (SDAWGs) and an online algorithm for constructing them, working in O(n) time and space. As a further achievement of this research direction, this paper introduces yet a new text indexing structure named sparse compact directed acyclic word graphs (SCDAWGs). We show that the size of SCDAWGs is smaller than that of word suffix trees and SDAWGs, and present an SCDAWG construction algorithm that works in O(n) time with O(k) space and in an online manner.
Sparse and Truncated Suffix Trees on Variable-Length Codes
2011
Abstract. The sparse suffix trees (SST), introduced by (Kärkkäinen and Ukkonen, COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study a special case that an input string is a sequence of codewords drawn from a regular prefix code ∆ ⊆ Σ^+ recognized by a finite automaton, and index points locate on the code boundaries. In this case, we present an online algorithm that constructs the sparse suffix tree for an input string t on any variable-length regular prefix code, called the code suffix tree (CST), in O(n + m) time and O(k) additional space for a fixed base alphabet Σ, where m is the size of an automaton for ∆. Furthermore, we present a modified algorithm for k-truncated version of code suffix trees that runs in the same time and space complexities. Hence, these results generalize the previous results (Inenaga and Takeda, CPM 2006) for word suffix trees and (Na, Apostolico, Iliopoulos, and Park, Theor. Comp. Sci., 304, 2003) for truncated suffix trees on arbitrary variable-length regular prefix codes.
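As a rough illustration of what "index points on the code boundaries" means, the Python sketch below parses a string into codewords of a small prefix code and then sorts the suffixes that start at those boundaries. Several things here are assumptions for the example and not taken from the paper: the finite prefix code {0, 10, 11}, the set lookup used in place of a finite automaton for ∆, the sorted list used in place of a suffix tree, and the function names. The naive sort does not achieve the O(n + m) time or O(k) additional space bounds.

```python
def code_boundaries(text, code):
    """Return the start position of every codeword in text.

    Assumption: code is a finite prefix code given as a set of strings,
    so at each position at most one codeword can match.
    """
    starts, i = [], 0
    max_len = max(len(c) for c in code)
    while i < len(text):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            if text[i:j] in code:
                starts.append(i)  # a codeword begins at position i
                i = j             # the next codeword begins where this one ends
                break
        else:
            raise ValueError(f"no codeword starts at position {i}")
    return starts


def sparse_suffix_order(text, starts):
    """Sort the chosen index points by the suffix beginning at each one."""
    return sorted(starts, key=lambda i: text[i:])


code = {"0", "10", "11"}   # hypothetical prefix code over {0, 1}
text = "0100110"           # parses as 0 | 10 | 0 | 11 | 0
starts = code_boundaries(text, code)
print(starts)
print(sparse_suffix_order(text, starts))
```

Positions 2 and 5 fall inside codewords, so the suffixes starting there are never indexed; only the k = 5 boundary suffixes are.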