@MISC{Fredriksson_fromnondeterministic, author = {Kimmo Fredriksson}, title = {From nondeterministic suffix automaton to lazy suffix tree}, year = {} }

Abstract

Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ of size σ, we consider the exact string matching problem, i.e., we want to report all occurrences of P in T. The well-known Backward-Nondeterministic-DAWG-Matching (BNDM) algorithm is one of the most efficient algorithms for short to moderate-length patterns. In this paper, as a prelude, we take the underlying nondeterministic suffix automaton and apply it to the text instead of to the pattern. The resulting algorithm is surprisingly simple, and in practice it is efficient for relatively short patterns and small alphabet sizes. We then show how the algorithm can be easily adapted to construct the suffix tree of T in a lazy manner. Both algorithms are efficient if the text is static but the patterns are given on-line (with no possibility of batching the queries). We discuss several variants of the algorithms, and conclude with some experimental results.
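
For orientation, the classic BNDM algorithm that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of standard BNDM, where the bit-parallel suffix automaton is built from the pattern; it is not the text-side variant or the lazy suffix tree construction proposed in the paper. The function name `bndm_search` and all variable names are assumptions chosen for this example.

```python
def bndm_search(pattern, text):
    """Return the 0-based start positions of all occurrences of pattern in text.

    Sketch of classic BNDM: the nondeterministic suffix automaton of the
    pattern is simulated with one bit per pattern position. Python integers
    are unbounded, so the state is masked to m bits explicitly.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # masks[c] has bit (m - 1 - i) set whenever pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1      # all m automaton states active
    accept = 1 << (m - 1)    # set when the scanned suffix is a pattern prefix

    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m                # characters of the window still unread
        last = m             # shift to apply once the window is abandoned
        state = full
        while state != 0:
            # Read the window backwards, one text character per step.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & accept:
                if j > 0:
                    # A proper prefix of the pattern starts at pos + j.
                    last = j
                else:
                    # The whole window matched the pattern.
                    occurrences.append(pos)
            state = (state << 1) & full
        pos += last
    return occurrences
```

For example, `bndm_search("aba", "ababa")` returns `[0, 2]`. The sketch assumes nothing about the alphabet beyond hashable characters; a tuned implementation would typically restrict m to the machine word size and use a table indexed by character code instead of a dictionary.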