Approximate String Matching over Ziv-Lempel Compressed Text
2000
Cited by 43 (13 self)
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time in the worst case and O(k[…] + R) on average, where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
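The matching model in this abstract (up to k insertions, deletions and substitutions) can be illustrated on uncompressed text. The sketch below is the classical column-wise dynamic programming for approximate search, which is what one would run after decompressing; it is not the compressed-text algorithm of the paper, and the function name `approx_find` is hypothetical.

```python
def approx_find(pattern, text, k):
    """Return 1-based end positions j such that some substring of `text`
    ending at j matches `pattern` with at most k edits."""
    m = len(pattern)
    # col[i] = fewest edits needed to match pattern[:i] against some
    # suffix of the text read so far (initially the empty text).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of col[i-1] from the previous column
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

This baseline takes O(mu) time on a text of length u, which is the cost that searching the compressed form directly aims to avoid.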
Application of Lempel-Ziv factorization to the approximation of grammar-based compression
2003
Cited by 42 (1 self)
We introduce a new type of context-free grammars, AVL-grammars, and show their applicability to grammar-based compression. Using this type of grammars we present an O(n log |Σ|) time and O(log n)-ratio approximation of minimal grammar-based compression of a given string of length n over an alphabet Σ, and an O(k log n) time transformation of an LZ77 encoding of size k into a grammar-based encoding of size O(k log n).
A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
1998
Cited by 42 (8 self)
We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is a highly relevant issue for keeping compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present the first algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching on it. Finally, we propose a new hybrid compression scheme which is between LZ77 and LZ78, being in practice as good to compress as LZ77 and as fast to search in as LZ78. 1 Introduction String matching is one of the most pervasive problems in computer science, with appli...
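To make the phrase "the text comes as a sequence of blocks" concrete, here is a minimal sketch of the textbook LZ78 parse, in which every block is an earlier block extended by one character. It only illustrates the block structure; it is not the search technique of the paper, and the names `lz78_parse` and `lz78_expand` are hypothetical.

```python
def lz78_parse(text):
    """Parse `text` into LZ78 blocks (ref, ch): block number `ref`
    (0 is the empty block) followed by the single character `ch`."""
    dictionary = {"": 0}  # phrase -> block number
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending a known phrase
        else:
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # The text ended inside a known phrase; emit it as its prefix
        # plus last character (the prefix is always in the dictionary).
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks


def lz78_expand(blocks):
    """Return the string spelled by each block, in order."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return phrases[1:]
```

For example, "abababa" parses into the blocks a, b, ab, aba. A search that works block by block can reuse what it learned about block `ref` when it processes a later block that extends it.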
Recognizing string graphs in NP
J. of Computer and System Sciences
Cited by 25 (4 self)
A string graph is the intersection graph of a set of curves in the plane. Each curve is represented by a vertex, and an edge between two vertices means that the corresponding curves intersect. We show that string graphs can be recognized in NP. The recognition problem was not known to be decidable until very recently, when two independent papers established exponential upper bounds on the number of intersections needed to realize a string graph (Pach and Tóth, 2001; Schaefer and Štefankovič, 2001). These results implied that the recognition problem lies in NEXP. In the present paper we improve this by showing that the recognition problem for string graphs is in NP, and therefore NP-complete, since Kratochvíl showed that the recognition problem is NP-hard (Kratochvíl, 1991b). The result has consequences for the computational complexity of problems in graph drawing and topological inference. We also show that the string graph problem is decidable for surfaces of arbitrary genus. Key words: String graphs, NP-completeness, graph drawing, topological inference, Euler diagrams
Multiple Pattern Matching in LZW Compressed Text
In Proc. DCC'98, 1998
Cited by 23 (9 self)
In this paper we address the problem of searching in LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the move of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern.
A Unifying Framework for Compressed Pattern Matching
In Proc. 6th International Symp. on String Processing and Information Retrieval, 1999
Cited by 22 (6 self)
We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-pair encoding, and the static dictionary-based method. Technically, our pattern matching algorithm extends that for LZW compressed text presented by Amir, Benson and Farach. 1 Introduction Pattern matching is one of the most fundamental operations in string processing. The problem is to find all occurrences of a given pattern in a given text. A lot of classical or advanced pattern matching algorithms have been proposed (see [3, 2]). Data compression is another most important research topic, whose aim is to reduce its space u...
An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs
In Proc. 8th Ann. Symp. on Combinatorial Pattern Matching, volume 1264 of Lecture Notes in Computer Science, 1997
Cited by 21 (5 self)
We show an efficient pattern matching algorithm for strings that are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is concatenation. In this paper, both text T and pattern P are given by straight-line programs T and P. The length of the text T (pattern P, resp.) may grow exponentially with respect to its description size |T| = n (|P| = m, resp.). We show a new combinatorial property concerning the periodic occurrences in a text. Based on this property, we develop an O(n^2 m^2) time algorithm using O(nm) space, which outputs a compact representation of all occurrences of P in T. This is superior to the algorithm proposed by Karpinski et al. [11], which runs in O((n+m)^4 log(n+m)) time using O((n+m)^3) space, and finds only one occurrence. Moreover, our algorithm is much simpler than theirs. 1 Introduction The string pattern matching is a task to find all occurrences of a pattern in a text. In...
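A small sketch may help with the claim that a string can be exponentially longer than its straight-line program. The representation below is an assumption made for illustration only (a list in which each rule is either a single symbol or a pair of indices of earlier rules to concatenate), and the function names are hypothetical; it shows the data structure, not the matching algorithm of the paper.

```python
def slp_lengths(rules):
    """Length of the string derived by each rule, computed without
    expanding anything. rules[i] is a one-character string or a pair
    (left, right) of indices of earlier rules."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def slp_expand(rules, index):
    """Fully expand rule `index`; only sensible for short strings."""
    rule = rules[index]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return slp_expand(rules, left) + slp_expand(rules, right)


# Fibonacci words: X0 = b, X1 = a, X(i) = X(i-1) X(i-2).
fib = ["b", "a", (1, 0), (2, 1), (3, 2)]

# Repeated doubling: 31 rules describe a string of 2**30 characters.
doubling = ["a"] + [(i, i) for i in range(30)]
```

Here `slp_expand(fib, 4)` spells the Fibonacci word "abaab", while `slp_lengths(doubling)[-1]` equals 2**30 even though the program has only 31 rules, which is why algorithms polynomial in n and m must avoid expansion.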
Algorithms on Compressed Strings and Arrays
In Proc. 26th Ann. Conf. on Current Trends in Theory and Practice of Informatics, 1999
Cited by 18 (0 self)
We survey the complexity issues related to several algorithmic problems for compressed one- and two-dimensional texts without explicit decompression: pattern-matching, equality-testing, computation of regularities, subsegment extraction, language membership, and solvability of word equations. Our basic problem is one- and two-dimensional pattern-matching together with its variations. For some types of compression the pattern-matching problems are infeasible (NP-hard), for other types they are solvable in polynomial time and we discuss how to reduce the degree of corresponding polynomials. 1 Introduction In the last decade a new stream of research related to data compression has emerged: algorithms on compressed objects. It has been caused by the increase in the volume of data and the need to store and transmit masses of information in compressed form. The compressed information has to be quickly accessed and processed without explicit decompression. In this paper we consider severa...
On the Complexity of Pattern Matching for Highly Compressed Two-Dimensional Texts
1997
Cited by 17 (5 self)
We consider the complexity of problems related to 2-dimensional texts (2d-texts) described succinctly. In a succinct description, larger rectangular subtexts are defined in terms of smaller parts in a way similar to that of Lempel-Ziv compression for 1-dimensional texts, or in shortly described strings as in [9], or in hierarchical graphs described by context-free graph grammars. A given 2d-text T with many internal repetitions can have a hierarchical description (denoted Compress(T)) which is up to exponentially smaller and which can be the only part of the input for a pattern-matching algorithm which gives information about T. Such a hierarchical description is given in terms of a straight-line program, see [9] or, equivalently, a 2-dimensional grammar. We consider compressed pattern-matching, where the input consists of a 2d-pattern P and of a hierarchical description of a 2d-text T, and fully compressed pattern-matching, where the input consists of hierarchical descriptions of...
Shift-And Approach to Pattern Matching in LZW Compressed Text
In Proc. CPM'99, LNCS 1645, 1999
Cited by 17 (6 self)
This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is indeed fast when a pattern length is at most 32, or the word length. After an O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, and r is the number of the pattern occurrences. Experimental results show that it runs approximately 1.5 times faster than a decompression followed by a simple search using the Shift-And algorithm. Moreover, the algorithm can be extended to the generalized pattern matching, to the pattern matching with k mismatches, and to the multiple pattern matching, like the Shift-And algorithm.
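For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of the classical Shift-And algorithm on uncompressed text. It is the "simple search" the paper compares against, not the LZW-aware variant it proposes, and the function name `shift_and` is hypothetical. Python integers are unbounded, so the sketch has no 32-bit limit on the pattern; in a fixed-width implementation the state must fit in one machine word, which is where the pattern-length condition comes from.

```python
def shift_and(pattern, text):
    """Return the 0-based start positions of all occurrences of
    `pattern` in `text` using bit-parallel Shift-And."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[ch] has bit i set iff pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit set when the whole pattern has matched
    state = 0              # bit i set iff pattern[:i+1] ends at this position
    occurrences = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            occurrences.append(j - m + 1)
    return occurrences
```

For instance, searching for "aba" in "ababa" reports the overlapping occurrences at positions 0 and 2. Each text character costs a constant number of word operations, which is what makes the approach attractive as a building block.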