Results 1-10 of 18
Linear-time construction of suffix arrays
In Proc. 14th Symposium on Combinatorial Pattern Matching (CPM ’03), 2003
Cited by 53 (1 self)
The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet and O(n log n) for a general alphabet. However, previous algorithms for constructing suffix arrays have a time complexity of O(n log n) even for a constant-size alphabet. In this paper we present a linear-time algorithm to construct suffix arrays for integer alphabets, which does not use suffix trees as intermediate data structures during its construction. Since the case of a constant-size alphabet can be subsumed in that of an integer alphabet, our result implies that the time complexity of directly constructing suffix arrays matches that of constructing suffix trees.
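For readers unfamiliar with the data structure this entry refers to, the following is a minimal sketch of what a suffix array is and how it answers pattern queries. It is not the linear-time algorithm of the paper: the construction below simply sorts the suffixes, and the function names `build_suffix_array` and `find_occurrences` are hypothetical, chosen for illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction by direct comparison of suffixes; this is far from
    linear time and only illustrates the definition.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text, using binary search
    over the suffix array sa."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                     # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

All suffixes that begin with the pattern form one contiguous block of the array, which is why two binary searches suffice.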
Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm (Extended Abstract)
Cited by 21 (1 self)
A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be competitive with other approaches. In this paper we show that this general method is also useful for several central problems in the area of string processing: Approximate String Matching, Dynamic Dictionary Matching, and Dynamic Text Indexing. The approximate string matching problem deals with finding all substrings of a text which match a pattern "approximately", i.e., with at most m differences. The differences can be in the form of inserted, deleted, or replaced characters. The text indexing problem deals with finding all occurrences of a pattern in a text, after the text is prep...
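As an illustration of the labeling idea described in this abstract, the sketch below assigns integer labels to all substrings whose length is a power of two, so that two substrings of equal length can be compared with two label lookups. It follows the classical doubling scheme usually associated with [KMR72] and is an assumption about how the paradigm can be realized, not the algorithm of this paper; the names `kmr_labels` and `equal_substrings` are hypothetical.

```python
def kmr_labels(text):
    """Return labels, where labels[k][i] names the substring text[i:i + 2**k].

    Two substrings of length 2**k receive the same label exactly when they
    are equal.
    """
    n = len(text)
    alphabet = {c: r for r, c in enumerate(sorted(set(text)))}
    labels = [[alphabet[c] for c in text]]
    length = 1
    while 2 * length <= n:
        prev = labels[-1]
        # A substring of length 2*length is named by the pair of labels of
        # its two halves.
        pairs = [(prev[i], prev[i + length]) for i in range(n - 2 * length + 1)]
        names = {p: r for r, p in enumerate(sorted(set(pairs)))}
        labels.append([names[p] for p in pairs])
        length *= 2
    return labels


def equal_substrings(labels, i, j, length):
    """Test text[i:i+length] == text[j:j+length] with two label lookups.

    Assumes length >= 1 and that both substrings lie inside the text.
    """
    k = length.bit_length() - 1   # largest k with 2**k <= length
    shift = length - (1 << k)     # the two blocks of size 2**k overlap
    return (labels[k][i] == labels[k][j]
            and labels[k][i + shift] == labels[k][j + shift])


labels = kmr_labels("abracadabra")
print(equal_substrings(labels, 0, 7, 4))  # True: "abra" == "abra"
print(equal_substrings(labels, 0, 3, 3))  # False: "abr" != "aca"
```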
Suffix Trees and their Applications in String Algorithms
1993
Cited by 17 (0 self)
The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching. Work partially supported by the ESPRIT BRA ALCOM II under contract no. 7141 and by the Italian MURST Project "Algoritmi, Modelli di Calcolo e Strutture Informative". Part of this work was done while the author was visiting AT&T Bell Laboratories. Email: grossi@di.uni...
Optimal Logarithmic Time Randomized Suffix Tree Construction
In Proc. 23rd ICALP, 1996
Cited by 14 (3 self)
The suffix tree of a string, the fundamental data structure in the area of combinatorial pattern matching, has many elegant applications. In this paper, we present a novel, simple sequential algorithm for the construction of suffix trees. We are also able to parallelize our algorithm so that we settle the main open problem in the construction of suffix trees: we give a Las Vegas CRCW PRAM algorithm that constructs the suffix tree of a binary string of length n in O(log n) time and O(n) work with high probability. In contrast, the previously known work-optimal algorithms, while deterministic, take Ω(log² n) time.
Space and time efficient parallel algorithms and software for EST clustering
IEEE Transactions on Parallel and Distributed Systems, 2003
Cited by 8 (1 self)
Expressed sequence tags, abbreviated as ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. In this paper, we present the algorithmic foundations and implementation of PaCE, a parallel software system we developed for large-scale EST clustering. The novel features of our approach include 1) design of space-efficient algorithms to limit the space required to linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce runtime and facilitate clustering of large data sets. Using a combination of these techniques, we report the clustering of 327,632 rat ESTs in 47 minutes, and 420,694 Triticum aestivum ESTs in 3 hours and 15 minutes, using a 60-processor IBM xSeries cluster. These problems are well beyond the capabilities of state-of-the-art sequential software. We also present a thorough experimental evaluation of our software, including quality assessment using benchmark Arabidopsis EST data.
Optimal Parallel Construction of Minimal Suffix and Factor Automata
1995
Cited by 5 (0 self)
This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well known relation between suffix and factor automata and suffix trees.
A Parallel Algorithm for the Extraction of Structured Motifs
2004
Cited by 5 (1 self)
In this work we propose a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm, based on an existing approach, extracts structured motifs that consist of an ordered collection of p ≥ 1 boxes with sizes and spacings between them specified by given parameters. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure. By partitioning the structured motif searching space we divide the most demanding part of the algorithm among a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear in the number of available processing units. This speedup is verified by both theoretical and experimental analysis, also presented in this paper.
Approximate Pattern Matching Using Locally Consistent Parsing
1997
Cited by 4 (0 self)
A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be competitive with the most efficient approaches. In this paper we show that this general method can be used to obtain a linear time, deterministic algorithm for the Approximate String Matching problem. The approximate string matching problem deals with finding all substrings of a text which match a pattern "approximately", i.e., with at most m differences. The differences can be in the form of inserted, deleted, or replaced characters.
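The problem statement in this abstract (find all substrings of a text within m differences of a pattern, where a difference is an insertion, deletion, or replacement) can be made concrete with the classical dynamic-programming formulation. The sketch below is that textbook method, running in time proportional to the product of the text and pattern lengths; it is not the linear-time algorithm of the paper. The function name `approximate_match_ends` and the parameter name `max_diff` (standing for m) are hypothetical.

```python
def approximate_match_ends(text, pattern, max_diff):
    """Return every position j >= 1 such that some substring of text ending
    just before index j is within max_diff differences of pattern."""
    p = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring of
    # the text ending at the current position. Before reading any text,
    # pattern[:i] can only be matched by deleting its i characters.
    col = list(range(p + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        new = [0] * (p + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, p + 1):
            new[i] = min(
                col[i - 1] + (pattern[i - 1] != c),  # match or replace
                col[i] + 1,                          # extra character in text
                new[i - 1] + 1,                      # pattern character missing
            )
        col = new
        if col[p] <= max_diff:
            ends.append(j)
    return ends


print(approximate_match_ends("abcdefg", "bcd", 0))  # [4]
print(approximate_match_ends("abxcd", "abcd", 1))   # [5]
```

With `max_diff` set to 0 the routine degenerates to exact matching, which is a convenient sanity check.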
Linear-time construction of two-dimensional suffix trees
In Proceedings of the 26th International Colloquium on Automata, Languages and Programming (ICALP), volume 1644 of LNCS, 1999
A parallel algorithm for the extraction of structured motifs
In 2004 ACM Symposium on Applied Computing
In this work we propose a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm, based on an existing approach, extracts structured motifs that consist of an ordered collection of p ≥ 1 boxes with sizes and spacings between them specified by given parameters. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure. By partitioning the structured motif searching space we divide the most demanding part of the algorithm among a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear in the number of available processing units. This speedup is verified by both theoretical and experimental analysis, also presented in this paper.