Results 1  10
of
257
Suffix arrays: A new method for online string searches
 SIAM J. Comput
, 1993
"... Abstract. A new and conceptually simple data structure, called a suffix array, for online string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix tree ..."
Abstract

Cited by 646 (1 self)
 Add to MetaCart
Abstract. A new and conceptually simple data structure, called a suffix array, for online string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit online string searches of the type, "Is W a substring of A? " to be answered in time O(P + log N), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in O (N) time in the worst case, versus O (N log N) time for suffix arrays. However, an augmented algorithm is given that, regardless of the alphabet size, constructs suffix arrays in O (N) expected time, albeit with lesser.space efficiency. It is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
A Guided Tour to Approximate String Matching
 ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract

Cited by 404 (38 self)
 Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
XRANK: Ranked Keyword Search over XML Documents
, 2003
"... We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating ..."
Abstract

Cited by 205 (1 self)
 Add to MetaCart
We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating
The LCA problem revisited
 In Latin American Theoretical INformatics
, 2000
"... Abstract. We present a very simple algorithm for the Least Common Ancestors problem. We thus dispel the frequently held notion that optimal LCA computation is unwieldy and unimplementable. Interestingly, this algorithm is a sequentialization of a previously known PRAM algorithm. 1 ..."
Abstract

Cited by 183 (6 self)
 Add to MetaCart
Abstract. We present a very simple algorithm for the Least Common Ancestors problem. We thus dispel the frequently held notion that optimal LCA computation is unwieldy and unimplementable. Interestingly, this algorithm is a sequentialization of a previously known PRAM algorithm. 1
An O(ND) Difference Algorithm and Its Variations
 Algorithmica
, 1986
"... The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a s ..."
Abstract

Cited by 155 (4 self)
 Add to MetaCart
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N +D expectedtime performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(NlgN +D ) time variation.
Simple linear work suffix array construction
, 2003
"... Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to lineartime construction algorithms and more exp ..."
Abstract

Cited by 149 (6 self)
 Add to MetaCart
Abstract. Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to lineartime construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple lineartime construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a spaceefficient implementation and, moreover, supports the choice of a space–time tradeoff. For any v ∈ [1, √ n], it runs in O(vn) time using O(n / √ v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREWPRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
Optimal Suffix Tree Construction with Large Alphabets
, 1997
"... The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner [Wei73], who introduced the data structure, gave an O(n) time algorithm algorithm for building the suffix tree of an n character string drawn from a constant size alphabet. In the comparison model ..."
Abstract

Cited by 129 (0 self)
 Add to MetaCart
The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner [Wei73], who introduced the data structure, gave an O(n) time algorithm algorithm for building the suffix tree of an n character string drawn from a constant size alphabet. In the comparison model, there is a trivial\Omega\Gamma n log n) time lower bound based on sorting, and Weiner's algorithm matches this bound trivially. Since Weiner's paper, the main open question has been how to deal with integer alphabets. There is no superlinear lower bound, and the fastest known algorithm was the O(n log n) time comparison based algorithm. We settle this open problem by closing the gap: we build suffix trees in linear time for integer alphabet.
TreeJuxtaposer: scalable tree comparison using Focus+Context with guaranteed visibility
 ACM Transactions on Graphics
, 2003
"... Structural comparison of large trees is a difficult task that is only partially supported by current visualization techniques, which are mainly designed for browsing. We present TreeJuxtaposer, a system designed to support the comparison task for large trees of several hundred thousand nodes. We int ..."
Abstract

Cited by 108 (5 self)
 Add to MetaCart
Structural comparison of large trees is a difficult task that is only partially supported by current visualization techniques, which are mainly designed for browsing. We present TreeJuxtaposer, a system designed to support the comparison task for large trees of several hundred thousand nodes. We introduce the idea of “guaranteed visibility”, where highlighted areas are treated as landmarks that must remain visually apparent at all times. We propose a new methodology for detailed structural comparison between two trees and provide a new nearlylinear algorithm for computing the best corresponding node from one tree to another. In addition, we present a new rectilinear Focus+Context technique for navigation that is well suited to the dynamic linking of sidebyside views while guaranteeing landmark visibility and constant frame rates. These three contributions result in a system delivering a fluid exploration experience that scales both in the size of the dataset and the number of pixels in the display. We have based the design decisions for our system on the needs of a target audience of biologists who must understand the structural details of many phylogenetic, or evolutionary, trees. Our tool is also useful in many other application domains where tree comparison is needed, ranging from network management to call graph optimization to genealogy.
A few logs suffice to build (almost) all trees (I)
 II. THEORETICAL COMPUTER SCIENCE
, 1999
"... A phylogenetic tree (also called an "evolutionary tree") is a leaflabelled tree which represents the evolutionary history for a set of species, and the construction of such trees is a fundamental problem in biology. Here we address the issue of how many sequence sites are required in order to recov ..."
Abstract

Cited by 101 (24 self)
 Add to MetaCart
A phylogenetic tree (also called an "evolutionary tree") is a leaflabelled tree which represents the evolutionary history for a set of species, and the construction of such trees is a fundamental problem in biology. Here we address the issue of how many sequence sites are required in order to recover the tree with high probability when the sites evolve under standard Markovstyle i.i.d. mutation models. We provide analytic upper and lower bounds for the required sequence length, by developing a new (and polynomial time) algorithm. In particular we show that when the mutation probabilities are bounded the required sequence length can grow surprisingly slowly (a power of log n) in the number n of sequences, for almost all trees.
Ambivalent Data Structures For Dynamic 2EdgeConnectivity And k Smallest Spanning Trees
 SIAM J. Comput
, 1991
"... . Ambivalent data structures are presented for several problems on undirected graphs. These data structures are used in finding the k smallest spanning trees of a weighted undirected graph in O(m log #(m, n) + min{k 3/2 ,km 1/2 }) time, where m is the number of edges and n the number of vertice ..."
Abstract

Cited by 83 (1 self)
 Add to MetaCart
. Ambivalent data structures are presented for several problems on undirected graphs. These data structures are used in finding the k smallest spanning trees of a weighted undirected graph in O(m log #(m, n) + min{k 3/2 ,km 1/2 }) time, where m is the number of edges and n the number of vertices in the graph. The techniques are extended to find the k smallest spanning trees in an embedded planar graph in O(n + k(log n) 3 ) time. Ambivalent data structures are also used to dynamically maintain 2edgeconnectivity information. Edges and vertices can be inserted or deleted in O(m 1/2 ) time, and a query as to whether two vertices are in the same 2edgeconnected component can be answered in O(log n) time, where m and n are understood to be the current number of edges and vertices, respectively. Key words. analysis of algorithms, data structures, embedded planar graph, fully persistent data structures, k smallest spanning trees, minimum spanning tree, online updating, topology tr...