## A Comparison of Imperative and Purely Functional Suffix Tree Constructions (1995)

Venue: Science of Computer Programming

Citations: 21 (7 self-citations)

### BibTeX

@INPROCEEDINGS{Giegerich95acomparison,

author = {Robert Giegerich and Stefan Kurtz},

title = {A Comparison of Imperative and Purely Functional Suffix Tree Constructions},

booktitle = {Science of Computer Programming},

year = {1995},

pages = {187--218}

}

### Abstract

We explore the design space of implementing suffix tree algorithms in the functional paradigm. We review the linear time and space algorithms of McCreight and Ukkonen. Based on a new terminology of nested suffixes and nested prefixes, we give a simpler and more declarative explanation of these algorithms than was previously known. We design two "naive" versions of these algorithms which are not linear time, but use simpler data structures, and can be implemented in a purely functional style. Furthermore, we present a new, "lazy" suffix tree construction which is even simpler. We evaluate both imperative and functional implementations of these algorithms. Our results show that the naive algorithms perform very favourably, and in particular, the lazy construction compares very well to all the others.

1 Introduction

Suffix trees are the method of choice when a large sequence of symbols, the "text", is to be searched frequently for occurrences of short sequences, the "patterns". Given tha...

### Citations

2573 | The design and analysis of computer algorithms
- Aho, Hopcroft, et al.
- 1974

Citation Context: ...requirement down to $O(n)$. However, subtree sharing may be impossible when leaves are to be annotated with extra information. 2. pst(t) has $O(n^2)$ nodes in the worst case (e.g. $t = a^n b^n a^n b^n$) [AHU74]. Under realistic assumptions, the expected number of nodes is $O(n)$ [AS92]. 3. cst(t) has $O(n)$ nodes, as all inner nodes are branching, and there are at most $n$ leaves. The edge labels can be represent...
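The quadratic worst case of pst(t) can be observed with a brute-force count. The Python sketch below assumes that pst(t) is the position tree of [AHU74]: the trie of position identifiers of t followed by an end marker `$`, where the identifier of position i is the shortest prefix of the suffix at i that occurs only once. The paper's own definition may differ in detail, and the function names are illustrative only.

```python
# Brute-force node count of a position tree, to illustrate the quadratic
# worst case for t = a^n b^n a^n b^n.
# Assumption (not from the paper): pst(t) is the trie of position
# identifiers of t + "$"; "$" is a hypothetical end marker not in t.


def count_occurrences(s, w):
    """Number of (possibly overlapping) occurrences of w in s."""
    return sum(1 for j in range(len(s) - len(w) + 1) if s.startswith(w, j))


def position_identifiers(t, end_marker="$"):
    """Shortest prefix of each suffix of t + end_marker that occurs once."""
    s = t + end_marker
    identifiers = []
    for i in range(len(s)):
        # s[i:] ends with the unique end marker, so some prefix is unique.
        for length in range(1, len(s) - i + 1):
            w = s[i:i + length]
            if count_occurrences(s, w) == 1:
                identifiers.append(w)
                break
    return identifiers


def pst_node_count(t):
    """Distinct prefixes of all identifiers = trie nodes, root included."""
    ids = position_identifiers(t)
    return len({w[:k] for w in ids for k in range(len(w) + 1)})


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        t = "a" * n + "b" * n + "a" * n + "b" * n
        # A compact suffix tree of t$ has at most 2 * len(t$) nodes.
        print(n, pst_node_count(t), 2 * (len(t) + 1))
```

For $n = 8$, the identifiers of the first $a$-run alone are $a^{8-i} b^8 a$ for $i = 0, \dots, 7$, which already contribute 81 distinct trie nodes, more than the 66 that bound a compact tree of the same 33-letter text.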

1363 | The Essence of Functional Programming
- Wadler
- 1992

Citation Context: ...However, the suffix tree undergoing a sequence of updates satisfies the condition of single-threadedness. No copy of an intermediate tree is used elsewhere in the program. Thus, recent ideas on monads [Wad92] or mutable data types [Hud93] for functional programming languages incorporating local, in-place updates apply to this case. A change of the data structure becomes necessary. The tree is represented...

686 | Suffix arrays: a new method for on-line string searches
- Manber, Myers
- 1993

574 | A Space-Economical Suffix Tree Construction Algorithm
- McCreight
- 1976

Citation Context: ...$T_t(s)$ denotes the $A^+$-tree such that $w$ occurs in $T_t(s)$ if and only if there is a suffix $s'$ of $t$ such that $|s'| \geq |s|$ and $w$ is a prefix of $s'$. The general structure of McCreight's algorithm [McC76] is to compute cst(t) by successively inserting the suffixes $s$ of $t$ into the tree. Notice that the intermediate trees are not suffix trees. More precisely, given $T_t(as)$, where $as$ is a suffix of $t$, t...
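The longest-first insertion order can be illustrated with a naive Python sketch. Suffixes of t followed by an end marker `$` are inserted one after another into a compact tree whose edge labels are stored as (start, end) index pairs into the text. This is not McCreight's algorithm proper: it omits suffix links, so every insertion rescans from the root and the worst case is quadratic. `Node`, `build_naive`, `suffix_order` and the `$` marker are assumptions of this sketch.

```python
class Node:
    """Compact suffix tree node; its incoming edge label is text[start:end]."""

    def __init__(self, start=0, end=0, suffix=None):
        self.start, self.end = start, end
        self.children = {}     # first letter of a child's edge label -> Node
        self.suffix = suffix   # start position of the suffix (leaves only)


def build_naive(t, end_marker="$"):
    """Insert the suffixes of t + end_marker, longest first, without suffix links."""
    text = t + end_marker
    root = Node()
    for i in range(len(text)):
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:                  # no edge starts with this letter
                node.children[text[pos]] = Node(pos, len(text), suffix=i)
                break
            k = child.start
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:                 # whole edge matched: descend
                node = child
                continue
            # Mismatch inside the edge: split it at k and hang a new leaf.
            inner = Node(child.start, k)
            node.children[text[child.start]] = inner
            child.start = k
            inner.children[text[k]] = child
            inner.children[text[pos]] = Node(pos, len(text), suffix=i)
            break
    return root, text


def suffix_order(node):
    """Leaf suffix positions, visiting children in letter order."""
    if node.suffix is not None:
        return [node.suffix]
    order = []
    for letter in sorted(node.children):
        order.extend(suffix_order(node.children[letter]))
    return order


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


if __name__ == "__main__":
    root, text = build_naive("banana")
    print(suffix_order(root), count_nodes(root))
```

Because the end marker is unique, no suffix is a prefix of another, so every insertion ends in a new leaf and every inner node is branching. Visiting children in letter order yields the suffixes in lexicographic order.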

452 | Linear Pattern Matching Algorithm
- Weiner
- 1973

Citation Context: ...pre-paid, it is fortunate that suffix trees can be built in $O(n)$ time and represented in $O(n)$ space, where $n$ is the length of $t$. Suffix tree construction algorithms have a long history, starting with [Wei73]. The construction in that paper is given in a somewhat obscure terminology. Later authors [McC76, MR80, CS85] have developed more transparent constructions, sometimes tailored to specific additional ...

168 | Approximate string matching with q-grams and maximal matches
- Ukkonen
- 1992

Citation Context: ...r work on a flexible pattern-matching system for biosequence analysis [Gie92]. Besides for locating subwords, suffix trees are useful for finding repetitions and palindromes, deriving q-gram profiles [Ukk92a], and calculating the so-called matching statistics as a prerequisite for fast approximate matching [CL90]. Further typical problems are searching the text in reverse, searching its genetic complement...
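As a rough illustration of reading a q-gram profile off a suffix tree, the Python sketch below swaps in a simpler structure: an atomic suffix trie truncated at depth q, where each node counts the suffixes passing through it. A node at depth q with path label w then holds the number of occurrences of w. This is not Ukkonen's linear-time method, and `qgram_profile` is a hypothetical name.

```python
from collections import Counter


def qgram_profile(t, q):
    """q-gram profile of t read off a suffix trie truncated at depth q."""
    root = {"count": 0, "children": {}}
    for i in range(len(t)):
        node = root
        for c in t[i:i + q]:  # only the first q letters of each suffix
            node = node["children"].setdefault(c, {"count": 0, "children": {}})
            node["count"] += 1

    profile = {}

    def walk(node, label):
        # Suffixes shorter than q never reach depth q, so they are not counted.
        if len(label) == q:
            profile[label] = node["count"]
            return
        for c, child in node["children"].items():
            walk(child, label + c)

    walk(root, "")
    return profile


if __name__ == "__main__":
    text = "abracadabra"
    # Cross-check against direct counting of all length-3 windows.
    direct = Counter(text[i:i + 3] for i in range(len(text) - 2))
    assert qgram_profile(text, 3) == dict(direct)
    print(qgram_profile(text, 2))
```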

118 | The myriad virtues of subword trees
- Apostolico
- 1985

Citation Context: ... a pattern p can be located in $O(|p|)$ steps, independent of the length of $t$. This efficient access to all subwords of $t$ has made suffix trees a ubiquitous data structure in a "myriad" of applications [Apo85]. Since suffix tree construction is the price to be pre-paid, it is fortunate that suffix trees can be built in $O(n)$ time and represented in $O(n)$ space, where $n$ is the length of $t$. Suffix tree constru...
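Why locating a pattern costs $O(|p|)$ steps independent of $|t|$ can be seen in a Python sketch that walks one node per pattern letter. To keep it short, it uses an atomic suffix trie in which every node stores the start positions of the suffixes passing through it; that costs quadratic space and is only for illustration, not the compact representation discussed in the paper. `build_suffix_trie` and `locate` are hypothetical names.

```python
def build_suffix_trie(t):
    """Atomic suffix trie of t; each node lists the start positions of the
    suffixes whose prefixes pass through it (quadratic space, sketch only)."""
    root = {"children": {}, "positions": []}
    for i in range(len(t)):
        node = root
        node["positions"].append(i)
        for c in t[i:]:
            node = node["children"].setdefault(c, {"children": {}, "positions": []})
            node["positions"].append(i)
    return root


def locate(trie, p):
    """Follow p from the root: |p| steps, independent of len(t)."""
    node = trie
    for c in p:
        node = node["children"].get(c)
        if node is None:
            return []          # p is not a subword of t
    return node["positions"]   # every start position of p in t


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(locate(trie, "ana"), locate(trie, "nab"))
```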

117 | Elements of Functional Programming
- Reade
- 1989

Citation Context: ...tion addLinks, as shown in Figure 8, does so by a single tree traversal. With the suffix links, the tree actually becomes a circular structure. addLinks uses the technique of "computing with unknowns" [Rea89] and hence mandates a lazy programming language. A related technique applies to eager languages. The implementation of nodes with lists of subtrees again introduces a factor l, where l is the average ...
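For contrast with the lazy addLinks, here is an eager Python sketch that also sets all suffix links in a single traversal, producing the same kind of circular structure. It is not the paper's "computing with unknowns" technique: it visits nodes breadth-first so that a parent's link is always known before its children are processed, and it works on an atomic suffix trie rather than a compact tree. `TrieNode`, `build_suffix_trie` and `add_links` are hypothetical names.

```python
from collections import deque


class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode
        self.link = None    # suffix link: node for the path label minus its first letter


def build_suffix_trie(t):
    root = TrieNode()
    for i in range(len(t)):
        node = root
        for c in t[i:]:
            node = node.children.setdefault(c, TrieNode())
    return root


def add_links(root):
    """Set all suffix links in one breadth-first traversal.

    For a node with label a.w.c whose parent has label a.w, the link target
    w.c is the c-child of the parent's link target w; it exists because
    every subword of t is a node of the atomic suffix trie.
    """
    root.link = root  # a convention of this sketch
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for c, child in node.children.items():
            child.link = root if node is root else node.link.children[c]
            queue.append(child)
    return root


if __name__ == "__main__":
    root = add_links(build_suffix_trie("abcab"))
    cab = root.children["c"].children["a"].children["b"]
    print(cab.link is root.children["a"].children["b"])
```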

77 | From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. Algorithmica
- Giegerich, Kurtz
- 1997

Citation Context: ...of execution time. This is remarkable, since the two algorithms are based on rather different ideas. It turned out that they are much more closely related than one would expect. This is explicated in [GK94a]. 5.5 Locality Effects. When we benchmark our programs on the abstraction level of asymptotic analysis, e.g. by inserting counters for characteristic steps, our runtime statistics precisely reproduce t...

52 | Constructing suffix trees on-line in linear time
- Ukkonen
- 1992

49 | Approximate string matching in sublinear expected time
- Chang, Lawler
- 1990

Citation Context: ..., suffix trees are useful for finding repetitions and palindromes, deriving q-gram profiles [Ukk92a], and calculating the so-called matching statistics as a prerequisite for fast approximate matching [CL90]. Further typical problems are searching the text in reverse, searching its genetic complement, or searching in an abstraction of the original text (such as the purine/pyrimidine abstraction of the nu...
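Matching statistics assign to each position i of a pattern p the length of the longest prefix of p[i:] that occurs somewhere in t. The Python sketch below computes them naively by walking an atomic suffix trie of t from the root for every i, which takes quadratic time in the worst case; a linear-time computation has to avoid these restarts (for instance by following suffix links). The function names are hypothetical.

```python
def build_suffix_trie(t):
    """Atomic suffix trie as nested dictionaries: letter -> subtrie."""
    root = {}
    for i in range(len(t)):
        node = root
        for c in t[i:]:
            node = node.setdefault(c, {})
    return root


def matching_statistics(t, p):
    """ms[i] = length of the longest prefix of p[i:] that is a subword of t."""
    trie = build_suffix_trie(t)
    ms = []
    for i in range(len(p)):
        node, length = trie, 0  # naive: restart at the root for every i
        while i + length < len(p) and p[i + length] in node:
            node = node[p[i + length]]
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("banana", "ananas"))
```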

45 | Implementing Haskell overloading
- Augustsson
- 1993

Citation Context: ... to those of s. This could account for a small speed advantage of naiveOnline. We present some empirical results with the functional implementations shown above. We used the Chalmers Haskell compiler [Aug93]. Note that there may be Haskell compilers that produce better code [HL93]. All algorithms were measured on random texts (Bernoulli distribution) over alphabets with various sizes ($k = 4, 20, 50, 90$), ...

38 | Memory subsystem performance of programs using copying garbage collection
- Diwan, Tarditi, et al.
- 1994

Citation Context: ...e: Anomaly A: ukk and mcc look slightly superlinear, even the worse for increasing $k$. Anomaly B: On the contrary, lazyTree looks closer to linear, and even the better for increasing $k$. It is well known [DTM94] that on today's pipelined processors with multi-level caching, the performance of the memory subsystem can significantly affect a program's execution time. The size ratio between the resident page se...

31 | Efficient and Elegant Subword Tree Construction
- Chen, Seiferas
- 1985

Citation Context: ...s| and p is a prefix of s'. In other words, a nested prefix is empty or has another occurrence as a prefix of a longer suffix of t.² ¹Both are faster than the offline constructions of [Wei73] and [CS85], which are also $O(n)$. ²Note that this definition is not symmetric, as it refers to suffixes of t, and prefixes of suffixes of t. 3 The Suffix Tree Family. We give a rather liberal definition of suffi...
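The nested-prefix notion can be checked directly. The Python sketch below reconstructs the definition from the fragment above, so it is an assumption: p is a nested prefix of the suffix s = t[i:] if p is a prefix of s and p is either empty or also a prefix of some longer suffix of t. Since the nested prefixes of s form a chain, the longest one is the longest common prefix of s with any longer suffix; in a longest-first insertion this is where s branches off the tree built so far. Both function names are hypothetical.

```python
def is_nested_prefix(t, i, p):
    """p is a nested prefix of s = t[i:]: a prefix of s that is empty or
    also a prefix of a longer suffix t[j:] with j < i (assumed definition)."""
    if not t[i:].startswith(p):
        return False
    return p == "" or any(t[j:].startswith(p) for j in range(i))


def longest_nested_prefix(t, i):
    """Longest common prefix of t[i:] with any longer suffix of t."""
    best = 0
    for j in range(i):  # j < i, so t[j:] is longer than t[i:]
        k = 0
        while i + k < len(t) and t[j + k] == t[i + k]:
            k += 1
        best = max(best, k)
    return t[i:i + best]


if __name__ == "__main__":
    t = "abab"
    print([longest_nested_prefix(t, i) for i in range(len(t))])
```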

27 | Self-Alignments in Words and Their Applications
- Apostolico, Szpankowski
- 1992

Citation Context: ...leaves are to be annotated with extra information. 2. pst(t) has $O(n^2)$ nodes in the worst case (e.g. $t = a^n b^n a^n b^n$) [AHU74]. Under realistic assumptions, the expected number of nodes is $O(n)$ [AS92]. 3. cst(t) has $O(n)$ nodes, as all inner nodes are branching, and there are at most $n$ leaves. The edge labels can be represented in constant space by a pair of indices into $t$. This is necessary to ach...

19 | Benchmarking implementations of functional languages with “Pseudoknot”, a float-intensive benchmark
- Hartel, Feeley, et al.
- 1996

Citation Context: ...ne. We present some empirical results with the functional implementations shown above. We used the Chalmers Haskell compiler [Aug93]. Note that there may be Haskell compilers that produce better code [HL93]. All algorithms were measured on random texts (Bernoulli distribution) over alphabets with various sizes ($k = 4, 20, 50, 90$), running on a SPARCstation 10/41 with 32 MB. We also confirmed our measurem...

16 | Introduction to Algorithms. MIT Press
- Cormen, Leiserson, et al.
- 1996

Citation Context: ...implementation of lazyTree is an interesting topic of its own right, and we have not yet fully explored all alternatives. Our current version retains the basic recursion structure, uses counting sort [CLR90] for grouping suffixes according to first letters, and a naive function to determine longest common prefixes of those suffixes starting with the same letter. 5.2 Ukkonen's online Suffix Tree Construct...
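The recursion structure described for lazyTree (group the remaining suffixes by first letter, strip the longest common prefix of each group, recurse) can be sketched in Python. Python is eager, so laziness is emulated by expanding a node's edges only on first access; a dictionary grouping stands in for the counting sort, and a naive loop computes the longest common prefixes. `LazyNode`, `expand`, `lcp_length`, `lazy_suffix_tree`, `path_labels` and the `$` end marker are all assumptions of this sketch, not the paper's code.

```python
class LazyNode:
    """Inner node whose outgoing edges are computed on first access."""

    def __init__(self, text, starts):
        self.text = text        # t followed by a unique end marker
        self.starts = starts    # positions where the remaining suffixes begin
        self._edges = None      # not evaluated yet

    @property
    def edges(self):
        if self._edges is None:  # evaluate once, then reuse
            self._edges = expand(self.text, self.starts)
        return self._edges


def lcp_length(text, starts):
    """Naive longest common prefix length of the suffixes text[i:], i in starts."""
    first, length = starts[0], 0
    while all(i + length < len(text) and text[i + length] == text[first + length]
              for i in starts):
        length += 1
    return length


def expand(text, starts):
    """Group suffixes by first letter; each group yields one edge (label, child)."""
    groups = {}
    for i in starts:  # stands in for the paper's counting sort
        groups.setdefault(text[i], []).append(i)
    edges = {}
    for letter, group in sorted(groups.items()):
        if len(group) == 1:                    # a single suffix: leaf edge
            edges[letter] = (text[group[0]:], None)
        else:
            k = lcp_length(text, group)        # k >= 1, all share `letter`
            label = text[group[0]:group[0] + k]
            edges[letter] = (label, LazyNode(text, [i + k for i in group]))
    return edges


def lazy_suffix_tree(t, end_marker="$"):
    text = t + end_marker
    return LazyNode(text, list(range(len(text))))


def path_labels(node, prefix=""):
    """Forces the whole tree and yields the path label of every leaf."""
    for label, child in node.edges.values():
        if child is None:
            yield prefix + label
        else:
            yield from path_labels(child, prefix + label)


if __name__ == "__main__":
    root = lazy_suffix_tree("banana")
    print(root._edges is None)  # nothing has been expanded yet
    print(sorted(path_labels(root)))
```

Only the parts of the tree that a traversal actually visits get expanded, which mirrors the on-demand evaluation that a lazy language provides for free.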

13 | Fundamental Algorithms for a Declarative Pattern Matching System. Dissertation, Technische Fakultät, Universität Bielefeld, available as Report 95-03
- Kurtz
- 1995

Citation Context: ...In this way, we have recently achieved a linear-time, purely functional suffix tree construction in monadic style. Its detailed analysis is outside the scope of this paper, and we refer the reader to [Kur95]. Given mutable arrays, we can also make the functional version of lazyTree independent of the alphabet factor. Recalling our discussion on locality in section 5.5, there is another virtue of lazyTree...

11 | Efficient on-line construction and correction of position trees
- Majster, Reiser
- 1980

10 | String-matching with constraints
- Crochemore
- 1988

5 | Suffix Trees in the Functional Programming Paradigm
- Giegerich, Kurtz
- 1994

Citation Context: ... It processes the text from left to right, and hence, in this sense it is incremental. What more can one ask for? A part of this work, concentrating on the functional implementations, has occurred as [GK94b]. Our interest in suffix trees is motivated by our work on a flexible pattern-matching system for biosequence analysis [Gie92]. Besides for locating subwords, suffix trees are useful for finding repet...

5 | Time optimal left to right construction of position trees
- Kempf, Bayer, et al.
- 1987

2 | Embedding Sequence Analysis in the Functional Programming Paradigm -- A Feasibility Study
- Giegerich
- 1992

Citation Context: ...is work, concentrating on the functional implementations, has occurred as [GK94b]. Our interest in suffix trees is motivated by our work on a flexible pattern-matching system for biosequence analysis [Gie92]. Besides for locating subwords, suffix trees are useful for finding repetitions and palindromes, deriving q-gram profiles [Ukk92a], and calculating the so-called matching statistics as a prerequisite...

2 | Mutable Abstract Data Types or How to Have Your State and Munge It Too
- Hudak
- 1993

Citation Context: ...going a sequence of updates satisfies the condition of single-threadedness. No copy of an intermediate tree is used elsewhere in the program. Thus, recent ideas on monads [Wad92] or mutable data types [Hud93] for functional programming languages incorporating local, in-place updates apply to this case. A change of the data structure becomes necessary. The tree is represented by mutable arrays, closely re...

2 | On-line Construction of Suffix-Trees (revised version of [Ukk92b]). To appear in: Algorithmica, also available as
- Ukkonen
- 1993

Citation Context: ...nology. Later authors [McC76, MR80, CS85] have developed more transparent constructions, sometimes tailored to specific additional requirements. The endpoint of the development is currently marked by [Ukk93], presenting a simpler construction in $O(n)$ space and time, which additionally is online. It processes the text from left to right, and hence, in this sense it is incremental. What more can one ask fo...
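What "online" means here can be shown with a naive Python sketch: after each letter is read, the structure represents all suffixes of the text seen so far and can already be queried. This is not Ukkonen's linear-time algorithm. It keeps an atomic suffix trie plus a list of the nodes for all current suffixes and extends each of them by the new letter, so the total work is quadratic. `TrieNode` and `OnlineSuffixTrie` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode


class OnlineSuffixTrie:
    """Atomic suffix trie of the text read so far, extended one letter at a time."""

    def __init__(self):
        self.root = TrieNode()
        # Nodes for every suffix of the text read so far, including the empty one.
        self.active = [self.root]

    def append(self, letter):
        # Each suffix w of the old text becomes the suffix w + letter of the new one.
        extended = []
        for node in self.active:
            if letter not in node.children:
                node.children[letter] = TrieNode()
            extended.append(node.children[letter])
        self.active = [self.root] + extended

    def contains(self, word):
        """True if word is a subword of the text read so far."""
        node = self.root
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return True


if __name__ == "__main__":
    trie = OnlineSuffixTrie()
    for letter in "abab":
        trie.append(letter)
        print(letter, trie.contains("ba"))
```

Queries between appends see exactly the prefix read so far, which is the incremental, left-to-right behaviour described above.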