Offline compression by greedy textual substitution
 PROC. IEEE
, 2000
Cited by 27 (1 self)
Greedy offline textual substitution refers to the following approach to compression or structural inference. Given a long textstring x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted textstring until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
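The loop described in this abstract can be illustrated with a small brute-force sketch. It is not the paper's implementation: the names `occurrences`, `best_substring`, `greedy_substitute`, and `expand` are hypothetical, the cost of a pointer is assumed to be a fixed number of symbols (`pointer_cost`), and each chosen substring is copied into a separate rule table instead of being referenced in place.

```python
def occurrences(seq, w):
    """Start positions of left-to-right, non-overlapping occurrences of w in seq."""
    positions, i, m = [], 0, len(w)
    while i + m <= len(seq):
        if tuple(seq[i:i + m]) == w:
            positions.append(i)
            i += m
        else:
            i += 1
    return positions


def best_substring(seq, pointer_cost):
    """Brute force: find the substring whose replacement saves the most symbols."""
    best_gain, best_w, best_pos = 0, None, None
    n, seen = len(seq), set()
    for m in range(pointer_cost + 1, n // 2 + 1):
        for i in range(n - m + 1):
            w = tuple(seq[i:i + m])
            if w in seen:
                continue
            seen.add(w)
            pos = occurrences(seq, w)
            # Assumed cost model: every occurrence except the first shrinks
            # from m symbols to pointer_cost symbols.
            gain = (len(pos) - 1) * (m - pointer_cost)
            if gain > best_gain:
                best_gain, best_w, best_pos = gain, w, pos
    return best_gain, best_w, best_pos


def greedy_substitute(text, pointer_cost=1):
    """Repeat the best contraction until no substring yields a positive gain."""
    seq, rules = list(text), []
    while True:
        gain, w, pos = best_substring(seq, pointer_cost)
        if w is None:
            break
        ref = ("ref", len(rules))  # stands in for the pointer pair
        rules.append(w)
        # Replace right to left so earlier positions stay valid.
        for p in reversed(pos[1:]):
            seq[p:p + len(w)] = [ref]
    return seq, rules


def expand(seq, rules):
    """Undo the substitutions to recover the original symbols."""
    out = []
    for tok in seq:
        if isinstance(tok, tuple):
            out.extend(expand(rules[tok[1]], rules))
        else:
            out.append(tok)
    return out
```

For example, `greedy_substitute("abcabcabcabc")` keeps the first `abc` and replaces the other three occurrences with references, and `"".join(expand(seq, rules))` restores the input. The search here is far slower than anything practical; it only makes the selection criterion concrete.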
Algorithms on Compressed Strings and Arrays
 In Proc. 26th Ann. Conf. on Current Trends in Theory and Practice of Informatics
, 1999
Cited by 17 (0 self)
We survey the complexity issues related to several algorithmic problems for compressed one- and two-dimensional texts without explicit decompression: pattern-matching, equality-testing, computation of regularities, subsegment extraction, language membership, and solvability of word equations. Our basic problem is one- and two-dimensional pattern-matching together with its variations. For some types of compression the pattern-matching problems are infeasible (NP-hard); for other types they are solvable in polynomial time, and we discuss how to reduce the degree of the corresponding polynomials. 1 Introduction. In the last decade a new stream of research related to data compression has emerged: algorithms on compressed objects. It has been caused by the increase in the volume of data and the need to store and transmit masses of information in compressed form. The compressed information has to be quickly accessed and processed without explicit decompression. In this paper we consider severa...
Some Theory and Practice of Greedy Offline Textual Substitution
 Proc. Data Compression Conference, IEEE Computer
, 1998
Cited by 16 (0 self)
Purdue University and Università di Padova. Greedy off-line textual substitution refers to the following steepest descent approach
An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String
, 1995
Cited by 11 (6 self)
An optimal O(log log n) time concurrent-read concurrent-write parallel algorithm for detecting all squares in a string is presented. A tight lower bound shows that over general alphabets this is the fastest possible optimal algorithm. When p processors are available the bounds become $\Theta(\lceil n \log n / p \rceil + \log\log_{\lceil 1 + p/n \rceil} 2p)$. The algorithm uses an optimal parallel string-matching algorithm together with periodicity properties to locate the squares within the input string.
Efficient String Algorithmics
, 1992
Cited by 9 (6 self)
Problems involving strings arise in many areas of computer science and have numerous practical applications. We consider several problems from a theoretical perspective and provide efficient algorithms and lower bounds for these problems in sequential and parallel models of computation. In the sequential setting, we present new algorithms for the string matching problem improving the previous bounds on the number of comparisons performed by such algorithms. In parallel computation, we present tight algorithms and lower bounds for the string matching problem, for finding the periods of a string, for detecting squares and for finding initial palindromes.
Optimal Parallel Dictionary Matching and Compression (Extended Abstract)
 7th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1995
Cited by 8 (3 self)
Martin Farach and S. Muthukrishnan. Rutgers University; DIMACS. April 26, 1995. Abstract: Emerging applications in multimedia and the Human Genome Project require storage and searching of large databases of strings, a task for which parallelism seems the only hope. In this paper, we consider the parallelism in some of the fundamental problems in compressing strings and in matching large dictionaries of patterns against texts. We present the first work-optimal algorithms for these well-studied problems, including the classical dictionary matching problem, optimal compression with a static dictionary, and the universal data compression with dynamic dictionary of Lempel and Ziv. All our algorithms are randomized and of the Las Vegas type. Furthermore, they are fast, working in time logarithmic in the input size. Additionally, our algorithms seem suitable for a distributed implementation. 1 Introduction. Large data bases of strings from multimedia applications and the Human G...
String Pattern Matching For A Deluge Survival Kit
, 2000
Cited by 6 (1 self)
String Pattern Matching concerns itself with algorithmic and combinatorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, effective access to, and manipulation of such data depend crucially on the efficiency with which strings are structured, compressed, transmitted, stored, searched and retrieved. This paper samples from this perspective, and with the authors' own bias, a rich arsenal of ideas and techniques developed in more than three decades of history.
Practical Parallel Lempel-Ziv Factorization
Cited by 4 (2 self)
In the age of big data, the need for efficient data compression algorithms has grown. A widely used data compression method is the Lempel-Ziv-77 (LZ77) method, which is a subroutine in popular compression packages such as gzip and PKZIP. There has been a lot of recent effort on developing practical sequential algorithms for Lempel-Ziv factorization (equivalent to LZ77 compression), but research in practical parallel implementations has been less satisfactory. In this work, we present a simple work-efficient parallel algorithm for Lempel-Ziv factorization. We show theoretically that our algorithm requires linear work and runs in $O(\log^2 n)$ time (randomized) for constant alphabets and $O(n^\epsilon)$ time ($\epsilon < 1$) for integer alphabets. We present experimental results showing that our algorithm is efficient and achieves good speedup with respect to the best sequential implementations of Lempel-Ziv factorization.
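To make the term "Lempel-Ziv factorization" concrete, here is a naive sequential sketch under one common definition: each factor is either the longest prefix of the remaining text that also starts at an earlier position (overlap with the current position allowed), or a single fresh character. The function names `lz_factorize` and `lz_decode` are hypothetical, and this quadratic-time loop is unrelated to the work-efficient parallel algorithm the abstract announces.

```python
def lz_factorize(s):
    """Naive LZ factorization: literals or (source, length) back-references."""
    factors, i, n = [], 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append(s[i])  # character not seen before
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz_decode(factors):
    """Rebuild the text; copying symbol by symbol handles overlapping references."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```

On `"abababab"` this yields two literals followed by one self-overlapping back-reference, `["a", "b", (0, 6)]`, and `lz_decode` inverts it.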
Efficient String Matching on Coded Texts
 IN PROCEEDINGS OF COMBINATORIAL PATTERN MATCHING, 6TH ANNUAL SYMPOSIUM (CPM'95)
, 1994
Cited by 2 (1 self)
The so-called "four Russians technique" is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant-size alphabet, one can encode the sequence into $O(n/\lambda)$ memory cells in $O(\log \lambda)$ time using $n/\log \lambda$ processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes $O(\log\log(m/\lambda))$ time, making only $O(n/\lambda)$ operations, an improvement by a factor of $\lambda = O(\log n)$ on the number of operations used in previous algorithms. Using this string-matching algorithm one can test if a string is square-free and find all palindromes in a string in $O(\log\log n)$ time using $n/\log\log n$ processors.
Testing Square-Freeness of Strings Compressed by Balanced Straight Line Program
Cited by 1 (1 self)
In this paper we study the problem of deciding whether a given compressed string contains a square. A string x is called a square if x = zz and $z = u^k$ implies k = 1 and u = z. A string w is said to be square-free if no substrings of w are squares. Many efficient algorithms to test if a given string is square-free have been developed so far. However, very little is known for testing square-freeness of a given compressed string. In this paper, we give an $O(\max(n^2, n \log^2 N))$-time $O(n^2)$-space solution to test square-freeness of a given compressed string, where n and N are the size of a given compressed string and the corresponding decompressed string, respectively. Our input strings are compressed by balanced straight line program (BSLP). We remark that BSLP has exponential compression, that is, $N = O(2^n)$. Hence no decompress-then-test approaches can be better than our method in the worst case.
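As an illustration of the definitions only, the sketch below tests square-freeness on an uncompressed string by brute force and computes the expanded length of a straight-line program. It is not the paper's compressed-domain method. The names `find_square`, `is_square_free`, and `slp_length`, as well as the rule encoding (a single character, or a pair of indices of earlier rules), are assumptions made for this example. Searching for any substring of the form zz is enough, because a string contains some zz exactly when it contains a square in the abstract's stricter sense: if $z = u^k$, then uu also occurs.

```python
def find_square(w):
    """Return (start, half_length) of some substring zz in w, or None."""
    n = len(w)
    for i in range(n):
        for h in range(1, (n - i) // 2 + 1):
            if w[i:i + h] == w[i + h:i + 2 * h]:
                return i, h
    return None


def is_square_free(w):
    return find_square(w) is None


def slp_length(rules):
    """Length of the string produced by the last rule of a straight-line program.

    Assumed encoding: rules[i] is a single character, or a pair (a, b) of
    indices of earlier rules whose expansions are concatenated.
    """
    lengths = []
    for r in rules:
        lengths.append(1 if isinstance(r, str) else lengths[r[0]] + lengths[r[1]])
    return lengths[-1]
```

A program with one terminal rule and ten doubling rules, `["a"] + [(i, i) for i in range(10)]`, expands to 1024 symbols, which shows why decompressing first can cost time exponential in the size of the compressed input.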