Approximate String Matching over Ziv-Lempel Compressed Text
, 2000
Cited by 38 (11 self)
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time in the worst case and O(k … + R) on average, where σ is the alphabet size. The experimental results show a practical speedup of up to 2X over the basic approach for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
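For comparison with the "basic approach" mentioned above, which we assume means decompressing the text and then searching it, the following minimal Python sketch expands an LZ78 block stream and runs Sellers' classic dynamic-programming algorithm for matching with up to k insertions, deletions and substitutions. It is not the paper's compressed-domain algorithm; the function names and the (reference, character) block format are illustrative assumptions.

```python
# Sketch under assumptions: hypothetical names; LZ78 blocks are (ref, ch) pairs
# where ref indexes an earlier phrase and phrase 0 is the empty string.

def lz78_decode(blocks):
    """Expand LZ78 blocks: phrase i = phrase[ref] + ch."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


def approx_match_ends(text, pattern, k):
    """Sellers' algorithm: return the 0-based positions j such that some
    substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    col = list(range(m + 1))   # col[i]: best distance of pattern[:i] so far
    ends = []
    for j, t in enumerate(text):
        new = [0] * (m + 1)    # new[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            new[i] = min(col[i - 1] + (pattern[i - 1] != t),  # match / substitution
                         col[i] + 1,                          # extra text character
                         new[i - 1] + 1)                      # skipped pattern character
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, the blocks `[(0, "a"), (0, "b"), (0, "c"), (1, "b"), (3, "d")]` decode to `abcabcd`, and searching it for `abd` with k = 1 returns the end positions `[1, 2, 4, 5, 6]`. Each text character costs O(m) work, so this baseline takes O(mu) time on the uncompressed length u.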
A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
, 1998
Cited by 38 (7 self)
We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is highly relevant for keeping compressed text databases in which efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks, which abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present the first algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching it. Finally, we propose a new hybrid compression scheme between LZ77 and LZ78, which in practice compresses as well as LZ77 and is as fast to search as LZ78.
1 Introduction
String matching is one of the most pervasive problems in computer science, with appli...
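The "sequence of blocks" view can be made concrete with LZ78, where each block is a pair (reference to an earlier block, next character). The sketch below is a plain LZ78 parser under that assumed pair format, with block 0 standing for the empty phrase; it shows only the input representation that such a search scheme consumes, not the search algorithm itself. The function names are hypothetical.

```python
# Sketch under assumptions: hypothetical names; block 0 is the empty phrase and
# every other block is (index of an earlier block, one new character).

def lz78_encode(text):
    """Parse text into LZ78 blocks using a trie keyed by (block, char)."""
    trie = {}            # (block id, char) -> id of the extended block
    blocks = []
    current = 0          # longest already-known phrase matched so far
    for ch in text:
        nxt = trie.get((current, ch))
        if nxt is None:
            blocks.append((current, ch))
            trie[(current, ch)] = len(blocks)
            current = 0
        else:
            current = nxt
    if current != 0:     # text ended inside a known phrase: emit it bare
        blocks.append((current, ""))
    return blocks


def lz78_phrases(blocks):
    """Return the text of each block, showing that block = earlier block + char."""
    phrases = [""]
    for ref, ch in blocks:
        phrases.append(phrases[ref] + ch)
    return phrases[1:]
```

For instance, `lz78_encode("abababa")` yields `[(0, "a"), (0, "b"), (1, "b"), (3, "a")]`, whose phrases are `a`, `b`, `ab`, `aba`. Because every block extends an earlier one by a single character, per-block information can be computed from the referenced block in constant time, which is the property the block abstraction relies on.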
Application of Lempel-Ziv factorization to the approximation of grammar-based compression
, 2003
Cited by 36 (1 self)
We introduce a new type of context-free grammars, AVL-grammars, and show their application to grammar-based compression. Using this type of grammars we present an O(n log|Σ|) time and O(log n)-ratio approximation of minimal grammar-based compression of a given string of length n over an alphabet Σ, and an O(k log n) time transformation of an LZ77 encoding of size k into a grammar-based encoding of size O(k log n).
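The "LZ77 encoding of size k" that the transformation starts from is a factorization of the string into k factors. The naive Python sketch below computes one such factorization: each factor is either a literal character that has not occurred before, or the longest prefix of the remaining suffix that also starts at an earlier position, with overlaps allowed. This self-referential variant and the function names are assumptions; the paper may use a different LZ77 convention, and the AVL-grammar construction itself is not shown.

```python
# Sketch under assumptions: hypothetical names; self-referential LZ77 variant
# (a copied factor may overlap the position being encoded). Naive O(n^3) time.

def lz77_factorize(s):
    """Return factors: a str literal, or (source position, length)."""
    factors, i, n = [], 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):                       # every earlier start position
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            factors.append(s[i])                 # first occurrence of this character
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for t in range(length):              # char-by-char copy handles overlaps
                out.append(out[src + t])
    return "".join(out)
```

For example, `lz77_factorize("abababa")` returns `["a", "b", (0, 5)]`, so k = 3 even though the string has length 7; highly repetitive strings can have k much smaller than n, which is what makes an O(k log n) transformation attractive.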
Multiple Pattern Matching in LZW Compressed Text
- In Proc. DCC'98
, 1998
Cited by 22 (9 self)
In this paper we address the problem of searching in LZW compressed text directly, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern.
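The baseline that a compressed-domain Aho-Corasick simulation improves upon can be sketched as follows: decode each LZW code to its phrase and feed the phrase's characters to an Aho-Corasick automaton. This Python sketch does exactly that, so it is not the paper's algorithm, which avoids expanding phrases. The function names are hypothetical, patterns are assumed non-empty, and codes 0 to |alphabet| - 1 are assumed to denote the single characters.

```python
# Sketch under assumptions: hypothetical names; LZW codes < len(alphabet) are
# single characters; this is the decode-then-search baseline, not the paper's method.
from collections import deque


def lzw_decode(codes, alphabet):
    """Yield the phrase for each LZW code."""
    table = list(alphabet)
    prev = None
    for code in codes:
        if code < len(table):
            cur = table[code]
        else:                         # code names the entry being created (cScSc case)
            cur = prev + prev[0]
        if prev is not None:
            table.append(prev + cur[0])
        yield cur
        prev = cur


def build_aho_corasick(patterns):
    """Return goto transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())   # depth-1 states keep fail = 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] = out[s] + out[fail[s]]
    return goto, fail, out


def search_lzw(codes, alphabet, patterns):
    """Report (start position, pattern) for every occurrence in the decoded text."""
    goto, fail, out = build_aho_corasick(patterns)
    state, pos, hits = 0, 0, []
    for phrase in lzw_decode(codes, alphabet):
        for ch in phrase:
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            for p in out[state]:
                hits.append((pos - len(p) + 1, p))
            pos += 1
    return hits
```

The codes `[0, 1, 2, 2, 3, 6]` are the LZW encoding of `abababbabab` over the alphabet `ab`; searching them for `ab` and `bab` reports all nine occurrences, including overlapping ones. The running time is proportional to the uncompressed length, which is precisely the cost a direct simulation on the compressed codes tries to avoid.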
A Unifying Framework for Compressed Pattern Matching
- In Proc. 6th International Symp. on String Processing and Information Retrieval
, 1999
Cited by 21 (6 self)
We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-pair encoding, and the static dictionary based method. Technically, our pattern matching algorithm extends the one for LZW compressed text presented by Amir, Benson and Farach.
1 Introduction
Pattern matching is one of the most fundamental operations in string processing. The problem is to find all occurrences of a given pattern in a given text. Many classical and advanced pattern matching algorithms have been proposed (see [3, 2]). Data compression is another important research topic, whose aim is to reduce its space u...
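Byte-pair encoding, one of the dictionary-based methods the framework covers, repeatedly replaces the most frequent adjacent pair of symbols with a new symbol and records the corresponding rule. The minimal Python sketch below shows only this compression side, not the framework's matching algorithm. Representing new symbols as integers (instead of unused byte values, as a byte-oriented implementation would) and the stopping rule are simplifying assumptions, and the function names are hypothetical.

```python
# Sketch under assumptions: hypothetical names; new symbols are ints 0, 1, 2, ...
# while original characters stay as one-character strings.
from collections import Counter


def byte_pair_encode(text, max_rules=10):
    """Return (compressed sequence, rules) where rules[sym] = (left, right)."""
    seq = list(text)
    rules = {}
    while len(rules) < max_rules:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                  # no pair repeats: replacing would not help
            break
        sym = len(rules)
        rules[sym] = pair
        new, i = [], 0
        while i < len(seq):           # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                new.append(sym)
                i += 2
            else:
                new.append(seq[i])
                i += 1
        seq = new
    return seq, rules


def expand(sym, rules):
    """Expand one symbol back to text using the recorded rules."""
    if isinstance(sym, int):
        left, right = rules[sym]
        return expand(left, rules) + expand(right, rules)
    return sym
```

On `abababab` the sketch first creates rule 0 = (`a`, `b`), then rule 1 = (0, 0), leaving the sequence `[1, 1]`. Each rule defines a phrase as the concatenation of two earlier phrases, the same shape as the phrases of the Lempel-Ziv family, which is why such methods fit a common dictionary-based framework.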
An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs
- In Proc. 8th Ann. Symp. on Combinatorial Pattern Matching, volume 1264 of Lecture Notes in Computer Science
, 1997
Cited by 19 (5 self)
We show an efficient pattern matching algorithm for strings that are succinctly described by straight-line programs, in which the constants are symbols and the only operation is concatenation. In this paper, both the text T and the pattern P are given by straight-line programs 𝒯 and 𝒫. The length of the text T (pattern P, resp.) may grow exponentially with respect to its description size |𝒯| = n (|𝒫| = m, resp.). We show a new combinatorial property concerning periodic occurrences in a text. Based on this property, we develop an O(n^2 m^2) time algorithm using O(nm) space, which outputs a compact representation of all occurrences of P in T. This is superior to the algorithm proposed by Karpinski et al. [11], which runs in O((n+m)^4 log(n+m)) time using O((n+m)^3) space and finds only one occurrence. Moreover, our algorithm is much simpler than theirs.
1 Introduction
String pattern matching is the task of finding all occurrences of a pattern in a text. In...
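A straight-line program can be represented as a list of rules in which each variable is either a single symbol or the concatenation of two earlier variables, the last variable deriving the whole string. The Python sketch below uses that hypothetical list encoding (not the paper's notation) to compute every derived length in O(n) time and to read a single character of the text in O(n) time without expanding it, illustrating why the text length can be exponential in the description size n.

```python
# Sketch under assumptions: hypothetical list encoding; rules[i] is either a
# one-character str (a constant) or a pair (a, b) of earlier indices meaning X_a X_b.

def slp_lengths(rules):
    """Length of the string derived by each variable, computed bottom-up."""
    lengths = []
    for r in rules:
        lengths.append(1 if isinstance(r, str) else lengths[r[0]] + lengths[r[1]])
    return lengths


def slp_char_at(rules, lengths, pos):
    """Character at 0-based position pos of the string derived by the last rule."""
    if not 0 <= pos < lengths[-1]:
        raise IndexError("position outside the derived string")
    v = len(rules) - 1
    while not isinstance(rules[v], str):
        left, right = rules[v]
        if pos < lengths[left]:       # position lies in the left part
            v = left
        else:                         # skip the left part and descend right
            pos -= lengths[left]
            v = right
    return rules[v]
```

With the Fibonacci-like rules `["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]` the lengths are `[1, 1, 2, 3, 5, 8]` and the derived text is `abaababa`. A doubling program `["a"] + [(i, i) for i in range(40)]` has only 41 rules but derives a string of length 2^40, yet any single character is still reachable in 41 steps.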
Algorithms on Compressed Strings and Arrays
- In Proc. 26th Ann. Conf. on Current Trends in Theory and Practice of Informatics
, 1999
Cited by 16 (0 self)
We survey the complexity issues related to several algorithmic problems for compressed one- and two-dimensional texts without explicit decompression: pattern-matching, equality-testing, computation of regularities, subsegment extraction, language membership, and solvability of word equations. Our basic problem is one- and two-dimensional pattern-matching together with its variations. For some types of compression the pattern-matching problems are infeasible (NP-hard); for other types they are solvable in polynomial time, and we discuss how to reduce the degree of the corresponding polynomials.
1 Introduction
In the last decade a new stream of research related to data compression has emerged: algorithms on compressed objects. It has been caused by the increase in the volume of data and the need to store and transmit masses of information in compressed form. The compressed information has to be quickly accessed and processed without explicit decompression. In this paper we consider severa...
Shift-And Approach to Pattern Matching in LZW Compressed Text
- In Proc. CPM'99, LNCS 1645
, 1999
Cited by 16 (6 self)
This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is particularly fast when the pattern length is at most 32 (the word length). After O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. Experimental results show that it runs approximately 1.5 times faster than decompression followed by a simple search using the Shift-And algorithm. Moreover, like the Shift-And algorithm, it can be extended to generalized pattern matching, pattern matching with k mismatches, and multiple pattern matching.
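The following Python sketch illustrates the phrase-at-a-time idea behind a Shift-And search over LZW codes. For every phrase X it keeps a mask M(X) of pattern positions at which X ends, the Shift-And state I(X) reached by reading X alone, a mask S(X) of prefixes of X that are suffixes of the pattern, and the number of occurrences lying inside X. Each new LZW phrase derives these values from its parent in O(1) word operations, and the state advances over a whole phrase as D' = ((D << |X|) & M(X)) | I(X). This formulation, the helper names, and counting (rather than reporting positions of) occurrences are our own simplifying assumptions; the paper's algorithm may differ in detail. Python integers stand in for machine words, so the sketch is not limited to m ≤ 32.

```python
# Sketch under assumptions: hypothetical names and per-phrase masks; counts
# occurrences instead of reporting positions; pattern must be non-empty.

def lzw_encode(text, alphabet):
    """Plain LZW; code i < len(alphabet) stands for alphabet[i]."""
    dictionary = {c: i for i, c in enumerate(alphabet)}
    codes, w = [], ""
    for c in text:
        if w + c in dictionary:
            w += c
        else:
            codes.append(dictionary[w])
            dictionary[w + c] = len(dictionary)
            w = c
    if w:
        codes.append(dictionary[w])
    return codes


def count_occurrences_lzw(codes, alphabet, pattern):
    """Count overlapping occurrences of pattern without expanding any phrase."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full, top = (1 << m) - 1, 1 << (m - 1)
    B = {}                              # B[c]: bit j set iff pattern[j] == c
    for j, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << j)
    # Per-phrase attributes indexed by LZW code:
    # M: bit j iff pattern[j-len+1..j] == phrase; I: state after the phrase alone;
    # S: bit m-1-t iff phrase[:t] is a suffix of pattern (1 <= t < m);
    # inside: number of occurrences lying entirely inside the phrase.
    length, first, M, I, S, inside = [], [], [], [], [], []

    def add_entry(parent, c):
        b = B.get(c, 0)
        if parent is None:              # single-character phrase
            n, f, mm, ii, ss, cnt = 1, c, b, b & 1, 0, 0
        else:                           # phrase = phrase[parent] + c
            n, f = length[parent] + 1, first[parent]
            mm = (M[parent] << 1) & b
            ii = ((I[parent] << 1) | 1) & b
            ss, cnt = S[parent], inside[parent]
        if n < m and mm & top:          # the whole phrase is a suffix of pattern
            ss |= 1 << (m - 1 - n)
        if ii & top:                    # an occurrence ends at the phrase's last char
            cnt += 1
        for table, value in zip((length, first, M, I, S, inside), (n, f, mm, ii, ss, cnt)):
            table.append(value)

    for c in alphabet:
        add_entry(None, c)
    D, total, prev = 0, 0, None
    for code in codes:
        if prev is not None:            # mirror the entry the encoder created
            c = first[code] if code < len(length) else first[prev]
            add_entry(prev, c)
        # occurrences crossing into this phrase, plus those entirely inside it
        total += bin(D & S[code]).count("1") + inside[code]
        D = (((D << length[code]) & M[code]) | I[code]) & full
        prev = code
    return total
```

For example, `lzw_encode("abababbabab", "ab")` gives `[0, 1, 2, 2, 3, 6]`, and counting `abab` on those six codes returns 3, matching the occurrences at positions 0, 2 and 7. The work per code is a constant number of word operations, independent of the phrase length.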
Recognizing string graphs in NP
- J. of Computer and System Sciences
Cited by 16 (0 self)
A string graph is the intersection graph of a set of curves in the plane. Each curve is represented by a vertex, and an edge between two vertices means that the corresponding curves intersect. We show that string graphs can be recognized in NP. The recognition problem was not known to be decidable until very recently, when two independent papers established exponential upper bounds on the number of intersections needed to realize a string graph (Pach and Tóth, 2001; Schaefer and Štefankovič, 2001). These results implied that the recognition problem lies in NEXP. In the present paper we improve this by showing that the recognition problem for string graphs is in NP, and therefore NP-complete, since Kratochvíl showed that the recognition problem is NP-hard (Kratochvíl, 1991b). The result has consequences for the computational complexity of problems in graph drawing, and topological inference. We also show that the string graph problem is decidable for surfaces of arbitrary genus. Key words: String graphs, NP-completeness, graph drawing, topological inference, Euler diagrams
On the Complexity of Pattern Matching for Highly Compressed Two-Dimensional Texts
, 1997
Cited by 15 (5 self)
We consider the complexity of problems related to 2-dimensional texts (2d-texts) described succinctly. In a succinct description, larger rectangular sub-texts are defined in terms of smaller parts in a way similar to that of Lempel-Ziv compression for 1-dimensional texts, or to succinctly described strings as in [9], or to hierarchical graphs described by context-free graph grammars. A given 2d-text T with many internal repetitions can have a hierarchical description (denoted Compress(T)) which is up to exponentially smaller and which can be the only part of the input for a pattern-matching algorithm which gives information about T. Such a hierarchical description is given in terms of a straight-line program, see [9] or, equivalently, a 2-dimensional grammar. We consider compressed pattern-matching, where the input consists of a 2d-pattern P and of a hierarchical description of a 2d-text T, and fully compressed pattern-matching, where the input consists of hierarchical descriptions of...
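A two-dimensional straight-line program can be sketched by allowing two concatenation operations: horizontal (side by side, equal heights) and vertical (stacked, equal widths). The Python sketch below uses a hypothetical rule format, not the notation of [9], to compute the height and width of every variable and to read a single cell of the described 2d-text without expanding it. Each doubling rule doubles one dimension, so a short description can again denote an exponentially large text.

```python
# Sketch under assumptions: hypothetical rule format; rules[i] is a one-character
# str (a 1x1 text) or (op, a, b) with op "H" (a left of b) or "V" (a above b).

def sizes_2d(rules):
    """(height, width) of the rectangle derived by each variable."""
    sizes = []
    for r in rules:
        if isinstance(r, str):
            sizes.append((1, 1))
            continue
        op, a, b = r
        (ha, wa), (hb, wb) = sizes[a], sizes[b]
        if op == "H":
            if ha != hb:
                raise ValueError("horizontal concatenation needs equal heights")
            sizes.append((ha, wa + wb))
        else:
            if wa != wb:
                raise ValueError("vertical concatenation needs equal widths")
            sizes.append((ha + hb, wa))
    return sizes


def cell_2d(rules, sizes, row, col):
    """Symbol at (row, col) of the 2d-text derived by the last rule."""
    v = len(rules) - 1
    while not isinstance(rules[v], str):
        op, a, b = rules[v]
        ha, wa = sizes[a]
        if op == "H":
            if col < wa:
                v = a
            else:
                col, v = col - wa, b
        else:
            if row < ha:
                v = a
            else:
                row, v = row - ha, b
    return rules[v]
```

For example, the six rules `["a", "b", ("H", 0, 1), ("V", 2, 2), ("H", 3, 3), ("V", 4, 4)]` describe a 4 by 4 text whose every row reads `abab`, and any cell is located by descending through at most six rules.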

