Results 1 - 10 of 11
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
 in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing
, 2000
Cited by 189 (17 self)
Abstract. The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text $T$ consisting of $n$ symbols drawn from a fixed alphabet $\Sigma$. The text $T$ can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern $P$ of $m$ symbols, with $T$ being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\lg_{|\Sigma|} n)$, which is significant when $\Sigma$ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(occ)$ for listing the $occ$ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $O(m / \lg_{|\Sigma|} n + \lg^{\epsilon}_{|\Sigma|} n)$ search time in the worst case, for any constant
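For orientation, the uncompressed baseline that this abstract compares against can be sketched as a plain suffix array searched by binary search. This is only an illustration of the classical index, not the compressed structure the paper presents. The function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort, and the search shown is the simple $O(m \lg n)$ variant rather than the $O(m + \lg n)$ one.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Fine for a small demo; real indexes use far faster constructions.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    # Suffixes that start with `pattern` form one contiguous block of `sa`,
    # because the length-m prefixes of sorted suffixes are themselves sorted.
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Report starting positions in text order.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))
```

Each of the $O(\lg n)$ binary-search steps compares up to $m$ symbols, and the array itself stores $n$ integers of $\lg n$ bits each, which is the space overhead the abstract sets out to reduce.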
Dictionary matching and indexing with errors and don’t cares
 In STOC ’04
, 2004
Cited by 50 (1 self)
This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string $p$, all matches of $p$ with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of “don’t care” characters are allowed. The specific problems we look at are: indexing, in which there is a single text $t$, and we seek locations where $p$ matches a substring of $t$; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match $p$ in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) $p$ which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution. The performance bounds all have a similar character. For example, for the indexing problem with $n = |t|$ and $m = |p|$, the query time for $k$ substitutions is $O(m + \frac{(c_1 \log n)^k}{k!} + \#\text{matches})$, with a data structure of size $O(n \frac{(c_2 \log n)^k}{k!})$ and a preprocessing time of $O(n \frac{(c_2 \log n)^k}{k!})$, where $c_1, c_2 > 1$ are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
Indexing and Dictionary Matching with One Error (Extended Abstract)
, 1999
Cited by 25 (2 self)
The indexing problem is the one where a text is preprocessed and subsequent queries of the form: "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form: "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e. where the pattern is within a bound edit (or Hamming...
Multi-Method Dispatching: A Geometric Approach with Applications to String Matching Problems
, 1999
Cited by 15 (3 self)
Current object-oriented programming languages (OOPLs) rely on mono-method dispatching. Recent research has identified multi-methods as a new, powerful feature to be added to OOPLs, and several experimental OOPLs now have multi-methods. Their ultimate success and impact in practice depends, among other things, on whether multi-method dispatching can be supported efficiently. We show that the multi-method dispatching problem can be transformed to a geometric problem on multidimensional integer grids, for which we then develop a data structure that uses near-linear space and has $O(\log \log n)$ query time. This gives a solution whose performance almost matches that of the best known algorithm for standard mono-method dispatching. Our geometric data structure has other applications as well, namely in two string matching problems: matching multiple rectangular patterns against a rectangular query text, and approximate dictionary matching with edit distance at most one. Our results f...
Improved Bounds for Dictionary Lookup with One Error
 Information Processing Letters
, 2000
Cited by 7 (0 self)
Given a dictionary $S$ of $n$ binary strings each of length $m$, we consider the problem of designing a data structure for $S$ that supports $d$-queries; given a binary query string $q$ of length $m$, a $d$-query reports if there exists a string in $S$ within Hamming distance $d$ of $q$. We construct a data structure for the case $d = 1$, that requires space $O(n \log m)$ and has query time $O(1)$ in a cell probe model with word size $m$. This generalizes and improves the previous bounds of Yao and Yao for the problem in the bit probe model. The data structure can be constructed in randomized expected time $O(nm)$.
Key words: Data Structures, Dictionaries, Hashing, Hamming Distance
1 Introduction. Minsky and Papert in 1969 posed the following problem, that has remained a challenge in data structure design [9]. Let $S$ be a set of $n$ binary strings of length $m$ each. We want to construct a data structure for $S$ that supports fast $d$-queries; that is, given a binary 1 BRICS (Basic Research in Computer Science), a Center o...
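To make the $d$-query definition concrete for $d = 1$, here is a minimal hashing sketch. It is not the data structure from the paper: it stores every dictionary string with one position deleted, so it uses far more than $O(n \log m)$ space and its query takes $m$ hash probes rather than $O(1)$. The names `build_index` and `one_query` are hypothetical, and all strings are assumed to have the same length $m$.

```python
def build_index(dictionary: list[str]) -> set[tuple[int, str]]:
    # For every string s and every position i, record s with position i masked out.
    # Two equal-length strings agree on such a key for some i exactly when they
    # differ in at most that one position.
    index = set()
    for s in dictionary:
        for i in range(len(s)):
            index.add((i, s[:i] + s[i + 1:]))
    return index


def one_query(index: set[tuple[int, str]], q: str) -> bool:
    # True if some dictionary string is within Hamming distance 1 of q
    # (an exact match also counts, since distance 0 <= 1).
    return any((i, q[:i] + q[i + 1:]) in index for i in range(len(q)))


index = build_index(["0000", "1111"])
print(one_query(index, "0100"), one_query(index, "0110"))
```

The trade-off is deliberate: the sketch favors being obviously correct over matching the bounds stated in the abstract.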
Dictionary Look-Up Within Small Edit Distance
 In Proc. 8th Annual Intl. Computing and Combinatorics Conference (COCOON’02)
, 2002
Cited by 4 (1 self)
Let $W$ be a dictionary consisting of $n$ binary strings of length $m$ each, represented as a trie. The usual $d$-query asks if there exists a string in $W$ within Hamming distance $d$ of a given binary query string $q$. We present an algorithm to determine if there is a member in $W$ within edit distance $d$ of a given query string $q$ of length $m$. The method takes time O(dm ) in the RAM model, independent of n, and requires $O(dm)$ additional space.
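The problem setting above (a dictionary stored as a trie, queried for a member within edit distance $d$) can be illustrated with the textbook approach of carrying one row of the edit-distance table down the trie and pruning branches whose row minimum exceeds $d$. This is a generic sketch, not the algorithm of the paper, and it makes no claim to the paper's running time. The names `build_trie` and `within_edit_distance` and the `"$"` end-of-word marker are assumptions of this example.

```python
def build_trie(words: list[str]) -> dict:
    # Nested dicts keyed by character; "$" marks the end of a dictionary word.
    root: dict = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def within_edit_distance(trie: dict, q: str, d: int) -> bool:
    # row[j] = edit distance between the trie prefix spelled so far and q[:j].
    first_row = list(range(len(q) + 1))
    stack = [(trie, first_row)]
    while stack:
        node, row = stack.pop()
        if "$" in node and row[-1] <= d:
            return True
        for ch, child in node.items():
            if ch == "$":
                continue
            # Extend the prefix by ch and compute the next table row.
            new_row = [row[0] + 1]
            for j in range(1, len(q) + 1):
                cost = 0 if q[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # insert q[j-1]
                                   row[j] + 1,           # delete ch
                                   row[j - 1] + cost))   # match or substitute
            # Row minima never decrease deeper in the trie, so prune safely.
            if min(new_row) <= d:
                stack.append((child, new_row))
    return False


trie = build_trie(["0101", "1110"])
print(within_edit_distance(trie, "0111", 1))
```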
Efficient approximate dictionary lookup over small alphabets
, 2005
Cited by 3 (1 self)
Given a dictionary $W$ consisting of $n$ binary strings of length $m$ each, a $d$-query asks if there exists a string in $W$ within Hamming distance $d$ of a given binary query string $q$. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case when $d = 1$ (the 1-query problem). We assume the standard RAM model of computation, and consider the case of the problem when alphabet size is arbitrary but finite, and $d$ is small. We preprocess the dictionary, and construct an edge-labelled tree with bounded branching factor and height. We present an algorithm to answer dictionary lookup within given distance $d$ of a given query string $q$. The algorithm is efficient when the alphabet size is small, or the dictionary is sparse. In particular, for the $d$-query problem the algorithm takes time $O(m (\log_{4/3} n - 1)^d (\log_2 n)^{d+1})$. This is an improvement over previously known algorithms for the $d$-query problem when $d > 1$. We also generalize the results for the case of the problem when edit distances are used. The algorithm can be modified such that it allows for words of different lengths as well as different lengths of query strings.
Large Scale Hamming Distance Query Processing
Cited by 3 (0 self)
Abstract. Hamming distance has been widely used in many application domains, such as near-duplicate detection and pattern recognition. We study Hamming distance range query problems, where the goal is to find all strings in a database that are within a Hamming distance bound $k$ from a query string. If $k$ is fixed, we have a static Hamming distance range query problem. If $k$ is part of the input, we have a dynamic Hamming distance range query problem. For the static problem, the prior art uses lots of memory due to its aggressive replication of the database. For the dynamic range query problem, as far as we know, there is no space and time efficient solution for arbitrary databases. In this paper, we first propose a static Hamming distance range query algorithm called HEngine_s, which addresses the space issue in prior art by dynamically expanding the query on the fly. We then propose a dynamic Hamming distance range query algorithm called HEngine_d, which addresses the limitation in prior art using a divide-and-conquer strategy. We implemented our algorithms and conducted side-by-side comparisons on large real-world and synthetic datasets. In our experiments, HEngine_s uses 4.65 times less space and processes queries 16% faster than the prior art, and HEngine_d processes queries 46 times faster than linear scan while using only 1.7 times more space.
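As a rough illustration of what "expanding the query" can mean for a Hamming distance range query, the sketch below enumerates every bit string within distance $k$ of the query and probes a hash set, leaving the database unreplicated. This is the most naive form of the idea and is not HEngine; the abstract does not describe that algorithm in enough detail to reproduce. Strings are assumed to be fixed-width bit vectors stored as Python integers, and `hamming_range_query` is a hypothetical name.

```python
from itertools import combinations


def hamming_range_query(database: set[int], query: int, width: int, k: int) -> list[int]:
    # Try every way of flipping r <= k of the `width` bit positions in the query.
    # Distinct position sets yield distinct candidates, so no hit is reported twice.
    hits = []
    for r in range(k + 1):
        for positions in combinations(range(width), r):
            candidate = query
            for p in positions:
                candidate ^= 1 << p  # flip bit p
            if candidate in database:
                hits.append(candidate)
    return sorted(hits)


db = {0b0000, 0b0011, 0b1111}
print(hamming_range_query(db, 0b0001, width=4, k=1))
```

The number of probes grows as the sum of binomial coefficients $\binom{width}{r}$ for $r \le k$, so this only makes sense for small $k$; that growth is the kind of cost a practical engine has to control.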
Improved approximate common interval
 www.elsevier.com/locate/ipl
, 2007
During the course of evolution, speciation results in the divergence of genomes that initially have the same gene order and content. If there is no selective pressure, successive rearrangements that are common in prokaryotic
This document in subdirectory RS/99/50 / Improved Bounds for Dictionary Lookup with One Error
, 1999
Reproduction of all or part of this work is permitted for educational or research use on condition that this copyright notice is included in any copy. See back inner page for a list of recent BRICS Report Series publications. Copies may be obtained by contacting: BRICS