Abstract:
We explore the design space of implementing suffix tree algorithms in the functional paradigm. We review the linear time and space algorithms of McCreight and Ukkonen. Based on a new terminology of nested suffixes and nested prefixes, we give a simpler and more declarative explanation of these algorithms than was previously known. We design two "naive" versions of these algorithms which are not linear time, but use simpler data structures, and can be implemented in a purely functional style. Furthermore, we present a new, "lazy" suffix tree construction which is even simpler. We evaluate both imperative and functional implementations of these algorithms. Our results show that the naive algorithms perform very favourably, and in particular, the lazy construction compares very well to all the others.

1 Introduction

Suffix trees are the method of choice when a large sequence of symbols, the "text", is to be searched frequently for occurrences of short sequences, the "patterns". Given tha...
Citations
2010 | The Design and Analysis of Computer Algorithms | Aho, Hopcroft, et al. | 1974
1122 | Introduction to Functional Programming | Bird, Wadler | 1988
449 | Suffix arrays: a new method for on-line string searches | Manber, Myers | 1993
429 | A space-economical suffix tree construction algorithm | McCreight | 1976
312 | Linear pattern matching algorithm | Weiner | 1973
106 | Approximate string-matching with q-grams and maximal matches | Ukkonen | 1992
104 | Elements of Functional Programming | Reade | 1993
88 | The myriad virtues of subword trees | Apostolico | 1985
49 | From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction | Giegerich, Kurtz | 1997
43 | Implementing Haskell overloading | Augustsson | 1993
43 | Constructing suffix trees on-line in linear time | Ukkonen | 1995
40 | Approximate String Matching in Sublinear Expected Time | Chang, Lawler | 1990
36 | Memory subsystem performance of programs using copying garbage collection | Diwan, Tarditi, et al. | 1994
30 | Efficient and Elegant Subword Tree construction | Chen, Seiferas | 1985
26 | Self-Alignments in Words and Their Applications | Apostolico, Szpankowski | 1992
15 | Benchmarking Implementations of Functional Languages with "Pseudoknot", a Float-Intensive Benchmark | Hartel, Feeley, et al. | 1996
13 | Fundamental Algorithms for a Declarative Pattern Matching System. Dissertation, Technische Fakultät, Universität Bielefeld, available as Report 95-03 | Kurtz | 1995
11 | Introduction to Algorithms. MIT-Press | Cormen, Leiserson, et al. | 1990
9 | Efficient on-line construction and correction of position trees | Majster, Reiser | 1980
8 | String matching with constraints | Crochemore | 1988
5 | Time optimal left to right construction of position trees | Kempf, Bayer, et al. | 1987
4 | Suffix trees in the functional programming paradigm | Giegerich, Kurtz | 1994
2 | Embedding Sequence Analysis in the Functional Programming Paradigm -- A Feasibility Study | Giegerich | 1992
2 | Mutable Abstract Data Types or How to Have Your State and Munge It Too | Hudak | 1993
2 | On-line Construction of Suffix-Trees (revised version of [Ukk92b]). to appear in: Algorithmica, also available as | Ukkonen | 1993