Results 1-10 of 34
Compressed suffix arrays and suffix trees with applications to text indexing and string matching
, 2005
Abstract

Cited by 188 (17 self)
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg |Σ| bits by encoding each symbol with lg |Σ| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(lg_|Σ| n), which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in O(m lg |Σ|) time or in O(m + lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / lg_|Σ| n + lg^ε_|Σ| n) search time in the worst case, for any constant
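For orientation, the uncompressed baseline that this abstract measures itself against can be sketched as a plain suffix array with binary search. This is a minimal illustration, not the compressed index of the paper: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort, and the search costs O(m lg n) character comparisons rather than the O(m + lg n) bound, which needs extra longest-common-prefix information.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Adequate for a sketch; real indexes use far more efficient constructions.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Suffixes sharing the prefix `pattern` form one contiguous run in `sa`,
    # so two binary searches locate the run's start and end.
    m = len(pattern)

    # First position whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First position whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Report the occ matching text positions in increasing order.
    return sorted(sa[start:lo])
```

Each of the n entries of `sa` is a full machine word here, which is the Ω(n lg n)-bit overhead the abstract refers to.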
Efficient Implementation of Lazy Suffix Trees
, 1999
Abstract

Cited by 38 (5 self)
We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods. Copyright © 2003 John Wiley & Sons, Ltd. KEY WORDS: string matching; suffix tree; space-efficient implementation; lazy evaluation
A Lower Bound for Parallel String Matching
 SIAM J. Comput
, 1993
Abstract

Cited by 25 (13 self)
This talk presents the derivation of an Ω(log log m) lower bound on the number of rounds necessary for finding occurrences of a pattern string P[1..m] in a text string T[1..2m] in parallel using m comparisons in each round. The parallel complexity of the string matching problem using p processors for general alphabets follows. 1. Introduction. Better and better parallel algorithms have been designed for string matching. All are on CRCW-PRAM with the weakest form of simultaneous write conflict resolution: all processors which write into the same memory location must write the same value of 1. The best CREW-PRAM algorithms are those obtained from the CRCW algorithms for a logarithmic loss of efficiency. Optimal algorithms have been designed: O(log m) time in [8, 17] and O(log log m) time in [4]. (An optimal algorithm is one with pt = O(n) where t is the time and p is the number of processors used.) Recently, Vishkin [18] developed an optimal O(log m) time algorithm. Unlike...
Symmetry Breaking for Suffix Tree Construction (Extended Abstract)
Abstract

Cited by 23 (3 self)
There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial algorithm, with the same operations bound, by Karp, Miller and Rosenberg. We show how to break symmetries that occur in the process of assigning labels using the Deterministic Coin Tossing (DCT) technique, and thereby reduce the number of labeled substrings to linear. We give several algorithms for suffix tree construction. One of them runs in O(log² n) parallel time and O(n) work for input strings whose characters are drawn from a constant size alphabet.
Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm (Extended Abstract)
Abstract

Cited by 18 (1 self)
A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be competitive with other approaches. In this paper we show that this general method is also useful for several central problems in the area of string processing: Approximate String Matching, Dynamic Dictionary Matching, and Dynamic Text Indexing. The approximate string matching problem deals with finding all substrings of a text which match a pattern "approximately", i.e., with at most m differences. The differences can be in the form of inserted, deleted, or replaced characters. The text indexing problem deals with finding all occurrences of a pattern in a text, after the text is prep...
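A minimal sketch of the labeling idea, in the spirit of [KMR72] but not taken from this paper: substrings whose length is a power of two receive integer labels such that two substrings of the same length get the same label exactly when they are equal. The names `kmr_labels` and `equal_substrings` are hypothetical, and the labels are assigned by sorting pairs, which is one simple serial choice among several.

```python
def kmr_labels(text):
    # labels[k][i] is the label of text[i : i + 2**k].
    # Within a level, equal labels mean equal substrings.
    n = len(text)
    alphabet = {c: r for r, c in enumerate(sorted(set(text)))}
    labels = [[alphabet[c] for c in text]]
    length = 1
    while 2 * length <= n:
        prev = labels[-1]
        # A substring of length 2*length is identified by the labels of its two halves.
        pairs = [(prev[i], prev[i + length]) for i in range(n - 2 * length + 1)]
        rank = {p: r for r, p in enumerate(sorted(set(pairs)))}
        labels.append([rank[p] for p in pairs])
        length *= 2
    return labels


def equal_substrings(labels, i, j, length):
    # Compare text[i:i+length] with text[j:j+length] in constant time by
    # covering each with two (possibly overlapping) power-of-two blocks.
    # Assumes 1 <= length and both substrings lie inside the text.
    k = length.bit_length() - 1
    block = 1 << k
    level = labels[k]
    return (level[i] == level[j]
            and level[i + length - block] == level[j + length - block])
```

Once the labels are in place, any two equal-length substrings can be compared with two label lookups, which is the "fast comparisons of substrings" the abstract mentions.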
Optimal Parallel Suffix Tree Construction
, 1997
Abstract

Cited by 17 (1 self)
An O(m)-work, O(m)-space, O(log m)-time CREW-PRAM algorithm for constructing the suffix tree of a string s of length m drawn from any fixed alphabet set is obtained. This is the first known work and space optimal parallel algorithm for this problem. It can be generalized to a string s drawn from any general alphabet set to perform in O(log m) time and O(m log |Σ|) work and space, after the characters in s have been sorted alphabetically, where |Σ| is the number of distinct characters in s. In this case too, the algorithm is work-optimal.
Suffix Trees and their Applications in String Algorithms
, 1993
Abstract

Cited by 17 (0 self)
The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching. Work partially supported by the ESPRIT BRA ALCOM II under contract no. 7141 and by the Italian MURST Project "Algoritmi, Modelli di Calcolo e Strutture Informative". Part of this work was done while the author was visiting AT&T Bell Laboratories. Email: grossi@di.uni...
Can Parallel Algorithms Enhance Serial Implementation? (Extended Abstract)
, 1996
Abstract

Cited by 14 (4 self)
The broad thesis presented in this paper suggests that the serial emulation of a parallel algorithm has the potential advantage of running on a serial machine faster than a standard serial algorithm for the same problem. It is too early to reach definite conclusions regarding the significance of this thesis. However, using some imagination, validity of the thesis and some arguments supporting it may lead to several far-reaching outcomes: (1) Reliance on "predictability of reference" in the design of computer systems will increase. (2) Parallel algorithms will be taught as part of the standard computer science and engineering undergraduate curriculum irrespective of whether (or when) parallel processing will become ubiquitous in the general-purpose computing world. (3) A strategic agenda for high-performance parallel computing: A multistage agenda, which in no stage compromises user-friendliness of the programmer's...
Optimal Logarithmic Time Randomized Suffix Tree Construction
 In Proc 23rd ICALP
, 1996
Abstract

Cited by 14 (3 self)
The suffix tree of a string, the fundamental data structure in the area of combinatorial pattern matching, has many elegant applications. In this paper, we present a novel, simple sequential algorithm for the construction of suffix trees. We are also able to parallelize our algorithm so that we settle the main open problem in the construction of suffix trees: we give a Las Vegas CRCW PRAM algorithm that constructs the suffix tree of a binary string of length n in O(log n) time and O(n) work with high probability. In contrast, the previously known work-optimal algorithms, while deterministic, take Ω(log n) time.
Distributed generation of suffix arrays
 In 8th Annual Symposium on Combinatorial Pattern Matching
, 1997