Results 1-10 of 10
Succinct ordinal trees with level-ancestor queries
 In SODA ’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms
, 2004
Cited by 41 (5 self)
We consider succinct or space-efficient representations of trees that efficiently support a variety of navigation operations. We focus on static ordinal trees, i.e., arbitrary static rooted trees where the children of each node are ordered. The set of operations is essentially the union of the sets of operations supported by previous succinct
Random Access to Grammar-Compressed Strings
, 2011
Cited by 10 (0 self)
Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann’s function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
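To make the random access problem concrete, here is a minimal Python sketch of the naive baseline: store the expansion length of every nonterminal and walk down from the start symbol. This is not the O(log N) representation described in the abstract; it takes time proportional to the grammar's height. The toy grammar `RULES` and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical toy grammar (an assumption for illustration): each nonterminal
# maps either to a single character or to a pair of nonterminals.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # derives "ab"
    "X2": ("X1", "A"),   # derives "aba"
    "X3": ("X2", "X1"),  # derives "abaab"
    "X4": ("X3", "X2"),  # derives "abaababa"
}


def expansion_lengths(rules):
    """Return the length of the string derived from every nonterminal."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1
            else:
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, start, i):
    """Return character i (0-based) of the string derived from `start`.

    Naive descent: at each rule, go left if i falls inside the left
    expansion, otherwise subtract the left length and go right.
    """
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left
        else:
            i -= lengths[left]
            sym = right
    return rules[sym]


LENGTHS = expansion_lengths(RULES)
DECODED = "".join(access(RULES, LENGTHS, "X4", i) for i in range(LENGTHS["X4"]))
```

Each query touches one rule per level of the derivation tree, so an unbalanced grammar can make it slow; the abstract's contribution is precisely to avoid that dependence on the height.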
Property matching and weighted matching
 In CPM
, 2006
Cited by 4 (3 self)
In many pattern matching applications the text has some properties attached to various of its parts. Pattern Matching with Properties (Property Matching, for short), involves a string matching between the pattern and the text, and the requirement that the text part satisfies some property. Some immediate examples come from molecular biology where it has long been a practice to consider special areas in the genome by their structure. It is straightforward to do sequential matching in a text with properties. However, indexing in a text with properties becomes difficult if we desire the time to be output dependent. We present an algorithm for indexing a text with properties in O(n log |Σ| + n log log n) time for preprocessing and O(|P| log |Σ| + tocc_π) per query, where n is the length of the text, P is the sought pattern, and tocc_π is the number of occurrences of the pattern that satisfy some property π. As a practical use of Property Matching we show how to solve Weighted Matching problems using techniques from Property Matching. Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a “statistical image” of this set, where we are given the probability of every symbol’s occurrence at every text location. Weighted matching problems are pattern matching problems where the given text is weighted. We present a reduction from Weighted Matching to Property Matching that allows off-the-shelf solutions to numerous weighted matching problems including indexing (which is nontrivial without this reduction). Assuming that one seeks the occurrence of pattern P with probability ε in weighted text T of length n, we reduce the problem to a property matching problem of pattern P in text T′ of length O(n (1/ε)^2 log (1/ε)).
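The sequential (non-indexed) case that the abstract calls straightforward can be sketched as follows in Python. The sketch assumes the property π is given as a list of inclusive intervals over 0-based text positions, and that an occurrence counts only if it lies entirely inside one interval; the function name `property_matches` is hypothetical, and this is not the indexing algorithm of the paper.

```python
def property_matches(text, pattern, intervals):
    """Report start positions of `pattern` in `text` whose occurrence lies
    completely inside some interval (s, e), inclusive and 0-based.

    Assumes every interval satisfies 0 <= s <= e < len(text).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # reach[i] = largest right endpoint among intervals starting at or before i
    # (-1 if there is none). An occurrence starting at i fits in some interval
    # exactly when reach[i] >= i + m - 1.
    reach = [-1] * n
    for s, e in intervals:
        reach[s] = max(reach[s], e)
    for i in range(1, n):
        reach[i] = max(reach[i], reach[i - 1])

    hits = []
    pos = text.find(pattern)
    while pos != -1:
        if reach[pos] >= pos + m - 1:
            hits.append(pos)
        pos = text.find(pattern, pos + 1)  # allow overlapping occurrences
    return hits


# "abra" occurs at positions 0 and 7 of "abracadabra".
FIRST_ONLY = property_matches("abracadabra", "abra", [(0, 4), (8, 9)])
LAST_ONLY = property_matches("abracadabra", "abra", [(5, 10)])
```

The scan inspects every occurrence, including those that fail the property, which is why it is not output dependent in the sense the abstract asks of an index.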
Efficient string matching algorithms for combinatorial universal denoising
 In Proc. of IEEE Data Compression Conference (DCC), Snowbird
, 2005
Cited by 2 (0 self)
Inspired by the combinatorial denoising method DUDE [13], we present efficient algorithms for implementing this idea for arbitrary contexts or for using it within subsequences. We also propose effective, efficient denoising error estimators so we can find the best denoising of an input sequence over different context lengths. Our methods are simple, drawing from string matching methods and radix sorting. We also present experimental results of our proposed algorithms.
The architecture of a software library for string processing
, 1997
Cited by 2 (0 self)
We present our project to develop a software library of basic tools and data structures for string processing. Our goal is to provide an environment for testing new algorithms as well as for prototyping. The library has a natural hierarchy comprising basic objects such as the alphabet and strings, data structures to manipulate these objects, and powerful algorithmic techniques driving these data structures. Furthermore, it has the natural taxonomy imposed by the underlying string processing tasks (such as static/dynamic, offline/online, exact/approximate). We believe that our architecture presents a unified view of string processing encompassing recently developed techniques and insights; this may be of independent interest to those who seek an introduction to this field. Our design is preliminary and we hope to refine it based on feedback.
Partitioning a Polygon into Two Mirror Congruent Pieces
Cited by 1 (0 self)
Polygon decomposition problems are well studied in the literature [6], yet many variants of these problems remain open. In this paper, we are interested in partitioning a polygon into mirror congruent pieces. Symmetry detection
Cross-Document Pattern Matching
 COMBINATORIAL PATTERN MATCHING, HELSINKI: FINLAND
, 2012
Cited by 1 (1 self)
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem.
Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic
Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv representation of a string t[1..N], does s occur in t? Farach and Thorup [6] gave a randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size of the compressed representation of t. Building on the methods of [4] and [7], we improve their result by developing a faster and fully deterministic O(n log(N/n) + m) time algorithm. The factor log(N/n) might be of order n, so for such inputs the improvement is very significant. A (tiny) fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan [5].
The Property Suffix Tree with Dynamic Properties
Recently there has been much interest in the Property Indexing Problem ([1],[7],[8]), where one is interested in preprocessing a text T of size n over alphabet Σ (which we assume is of constant size), and a set of intervals π over the text positions, such that given a query pattern P of size m we can report all of the occurrences of P in T which are completely contained within some interval from π. This type of matching is extremely helpful in scenarios in molecular biology where it has long been a practice to consider special areas in the genome by their structure. The work done so far has focused on the static version of this problem where the intervals are given a priori and never changed. This paper is the first to focus on several dynamic settings of π including an incremental version where new intervals are inserted into π, a decremental version where intervals are deleted from π, a fully dynamic version where intervals may be inserted into or deleted from π, and batched insertions where a set of intervals is inserted into π. In particular, the batched version provides us with a new (optimal) algorithm for the static case.
Computing Discriminating and Generic Words
 STRING PROCESSING AND INFORMATION RETRIEVAL, CARTAGENA DE INDIAS: COLOMBIA
, 2012
We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern P and a threshold d, we want to report (i) all longest extensions of P which occur in at least d documents, (ii) all shortest extensions of P which occur in less than d documents, and (iii) all shortest extensions of P which occur only in d selected documents. For these problems, we propose efficient algorithms based on suffix trees and using advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.