## Optimal Parallel Dictionary Matching and Compression (Extended Abstract) (1995)

Venue: 7th Annual ACM Symposium on Parallel Algorithms and Architectures

Citations: 6 (3 self)

### BibTeX

@INPROCEEDINGS{Farach95optimalparallel,
    author = {Martin Farach and S. Muthukrishnan},
    title = {Optimal Parallel Dictionary Matching and Compression (Extended Abstract)},
    booktitle = {7th Annual ACM Symposium on Parallel Algorithms and Architectures},
    year = {1995},
    pages = {244--253}
}

### Abstract

Martin Farach and S. Muthukrishnan. Rutgers University, DIMACS. April 26, 1995.

Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings, a task for which parallelism seems the only hope. In this paper, we consider the parallelism in some of the fundamental problems in compressing strings and in matching large dictionaries of patterns against texts. We present the first work-optimal algorithms for these well-studied problems, including the classical dictionary matching problem, optimal compression with a static dictionary, and the universal data compression with dynamic dictionary of Lempel and Ziv. All our algorithms are randomized and of the Las Vegas type. Furthermore, they are fast, working in time logarithmic in the input size. Additionally, our algorithms seem suitable for a distributed implementation.

1 Introduction. Large data bases of strings from multi-media applications and the Human G...

### Citations

1210 | A universal algorithm for sequential data compression
- Ziv, Lempel
- 1977
Citation Context ...lied on applying a general purpose shortest-paths routine [2]; this step is work-inefficient due to the well-known transitive closure bottleneck [16]. Dynamic Dictionary Compression. LZ1 [20] and LZ2 [30] are two well-known dynamic compression schemes. LZ1 is known to give better compressions in practice; for example, see Unix compress and gnuzip. Nonetheless, LZ2 is implemented in practice because of...

658 | Fast pattern matching in strings
- Knuth, Morris, et al.
- 1977
Citation Context ...ompression [2] lies in computing shortest paths in graphs. Dictionary Matching. Within two years of the discovery of the classical linear time string matching algorithm due to Knuth, Morris and Pratt [19], Aho and Corasick [3] designed a linear time (hence, optimal) algorithm for dictionary matching by generalizing the finite automaton construction in [19] to a set of strings. In the mid-eighties, Gal...

395 | Some complexity questions related to distributive computing
- Yao
- 1979
Citation Context ... implementation on a network of workstations [24]. In fact, in this setting, we can conclude from Communication Complexity that even checking equality of strings requires randomization for efficiency [29]. Thus the randomization in our algorithms seems well justified. Section 2 contains a review of results we use. Our algorithms for dictionary matching, dynamic compression and static compression are d...

311 | Efficient randomized pattern-matching algorithms
- Karp, Rabin
- 1987
Citation Context ...ree of D. Then we trace down from the root starting from each of the desired text locations independently. The key is that string comparison along the edges and separators are done using fingerprints [17]. Step 1B. We now compute S[i] for all positions which are not a multiple of L. We process each window T[kL − (L − 1), ..., kL] independently, for integer k ∈ [1, n/L]. Given S[i], we sh...
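The fingerprint comparison mentioned in this context can be illustrated with a small sequential sketch. It is not the paper's parallel procedure: it uses a polynomial rolling hash in the spirit of Karp and Rabin [17], and the class name `Fingerprinter`, the helper `probably_equal`, the modulus, and the base are all assumptions made for illustration.

```python
import random

# Assumption for this sketch: arithmetic modulo the Mersenne prime 2^61 - 1.
MOD = (1 << 61) - 1


class Fingerprinter:
    """Prefix fingerprints of a text, allowing O(1) substring fingerprints."""

    def __init__(self, text, base=None):
        # A random base gives the usual probabilistic guarantee;
        # a fixed base can be passed for reproducible runs.
        self.base = base if base is not None else random.randrange(256, MOD - 1)
        n = len(text)
        self.prefix = [0] * (n + 1)  # prefix[i] = fingerprint of text[:i]
        self.power = [1] * (n + 1)   # power[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(ch)) % MOD
            self.power[i + 1] = (self.power[i] * self.base) % MOD

    def fingerprint(self, lo, hi):
        """Fingerprint of text[lo:hi]."""
        return (self.prefix[hi] - self.prefix[lo] * self.power[hi - lo]) % MOD


def probably_equal(fp, a, b, length):
    """Compare text[a:a+length] and text[b:b+length] by fingerprint only.

    Equal substrings always compare equal; unequal substrings collide
    with small probability over the choice of base (Monte Carlo behaviour).
    """
    return fp.fingerprint(a, a + length) == fp.fingerprint(b, b + length)
```

For example, with `fp = Fingerprinter("abracadabra")`, the call `probably_equal(fp, 0, 7, 4)` compares the two occurrences of `abra` without scanning their characters. A Las Vegas algorithm, as described in the abstract, would additionally verify such answers.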

288 | A survey of parallel algorithms for shared-memory machines
- Karp, Ramachandran
- 1988
Citation Context ...n contrast, previous approaches to this problem have relied on applying a general purpose shortest-paths routine [2]; this step is work-inefficient due to the well-known transitive closure bottleneck [16]. Dynamic Dictionary Compression. LZ1 [20] and LZ2 [30] are two well-known dynamic compression schemes. LZ1 is known to give better compressions in practice; for example, see Unix compress and gnuzip....

239 | On the complexity of finite sequences
- Lempel, Ziv
- 1976
Citation Context ...se: what is the dictionary, and how do we pick the best word from this dictionary so as to minimize the number of references in such a parsing. The well-known LZ1 compression scheme of Lempel and Ziv [20] makes the following choices. The dictionary is all substrings S[x, y] of S such that xsi. In this case, since the dictionary is always changing, that is, new strings are added to the dictionary as lo...
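A minimal sequential sketch of a greedy LZ1-style parse may help make the parsing idea concrete. The exact condition on x is garbled in the extracted context, so this sketch assumes that a phrase at position i may copy from any earlier start x < i, with overlap allowed. The function names `lz1_parse` and `lz1_unparse` and the phrase encoding are hypothetical, and the quadratic search is for clarity only; it is unrelated to the work-optimal parallel algorithm of the paper.

```python
def lz1_parse(s):
    """Greedy parse of s into literals and (start, length) copies.

    Assumption: a copy at position i may start at any x < i and may
    overlap position i (the usual LZ77-style convention).
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_start = 0, -1
        for x in range(i):  # naive search over all earlier start positions
            length = 0
            while i + length < n and s[x + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_start = length, x
        if best_len == 0:
            phrases.append(("lit", s[i]))  # character not seen before
            i += 1
        else:
            phrases.append(("copy", best_start, best_len))
            i += best_len
    return phrases


def lz1_unparse(phrases):
    """Rebuild the string from the phrases produced by lz1_parse."""
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, start, length = phrase
            for k in range(length):  # character by character, so overlaps work
                out.append(out[start + k])
    return "".join(out)
```

Under these assumptions, `lz1_parse("aaaa")` yields one literal followed by a single overlapping copy of length 3, and `lz1_unparse` inverts the parse.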

200 | Design and implementation of an efficient priority queue
- van Emde Boas, Kaas, et al.
- 1977
Citation Context ... and work. Lemma 2.4 ([6]) Given an array, A, of n numbers, we can compute, for each location i, the nearest position j such that j < i and A[j] < A[i] in O(log log n) time with O(n) work. Lemma 2.5 ([26]) A subset of numbers from the universe 1, ..., N can be maintained under insert, delete, extract maximum or minimum and find predecessor or successor queries in O(log log N) time using O(s) space wh...

160 | Data Compression: Methods and Theory
- Storer
- 1988
Citation Context ...dictionaries of patterns against texts. These two areas of study have an intimately linked history and are amongst the most intensively studied problems in Computer Science. (For compression see e.g. [25] and for dictionary matching see e.g. [3, 18, 22, 5, 4]). In this paper, we present the first work-optimal algorithms for these problems in a parallel setting. Furthermore, all of our algorithms are f...

45 | An optimal randomized parallel algorithm for finding connected components in a graph
- Gazit
- 1991
Citation Context ...log n) expected time and O(n) expected work if the string has constant-sized alphabet, and in O(log n) expected time and O(n log log n) expected work if it has a polynomial-sized alphabet. Lemma 2.2 ([14]) Given a graph on n vertices and m edges, its connected components can be determined in O(log n) time and O(m) work. Lemma 2.3 ([7]) Given an array A of n numbers, it can be pre-processed in O(log n)...

39 | Optimal parallel algorithms for string matching
- Galil
- 1985

38 | Efficient string matching
- Aho, Corasick
- 1975
Citation Context ...ese two areas of study have an intimately linked history and are amongst the most intensively studied problems in Computer Science. (For compression see e.g. [25] and for dictionary matching see e.g. [3, 18, 22, 5, 4]). In this paper, we present the first work-optimal algorithms for these problems in a parallel setting. Furthermore, all of our algorithms are fast, working in time logarithmic in the input size. Com...

38 | Highly parallelizable problems
- Berkman, Breslauer, et al.
- 1989
Citation Context ...processed in O(log n) time and O(n) work so any range-maxima query (that is, given [i, j], return the maximum value in A[i], A[i + 1], ..., A[j]) can be processed in O(1) time and work. Lemma 2.4 ([6]) Given an array, A, of n numbers, we can compute, for each location i, the nearest position j such that j < i and A[j] < A[i] in O(log log n) time with O(n) work. Lemma 2.5 ([26]) A subset of numbers...
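The quantity computed in Lemma 2.4 (for each i, the nearest j < i with A[j] < A[i]) can be illustrated by the standard sequential stack-based method. This is only a sketch of what is being computed, not the O(log log n)-time parallel algorithm of [6]; the function name `nearest_smaller_to_left` and the use of -1 for "no such position" are assumptions for illustration.

```python
def nearest_smaller_to_left(a):
    """For each i, return the largest j < i with a[j] < a[i], or -1 if none.

    Sequential O(n) method: the stack holds indices whose values are
    strictly increasing from bottom to top.
    """
    result = [-1] * len(a)
    stack = []
    for i, value in enumerate(a):
        # Discard candidates that are not strictly smaller than a[i];
        # they can never be the answer for i or for any later position.
        while stack and a[stack[-1]] >= value:
            stack.pop()
        result[i] = stack[-1] if stack else -1
        stack.append(i)
    return result
```

Each index is pushed and popped at most once, which is why the total work is linear, matching the O(n) work bound stated in the lemma.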

29 | Optimal parallel pattern matching in strings
- Vishkin
- 1985
Citation Context ...3] designed a linear time (hence, optimal) algorithm for dictionary matching by generalizing the finite automaton construction in [19] to a set of strings. In the mid-eighties, Galil [12] and Vishkin [27] designed the first work-optimal string matching algorithms, which have since been extended significantly [28, 13, 9]. However, a work-optimal algorithm for dictionary matching has remained elusive. A...

25 | Improved deterministic parallel integer sorting
- Bhatt, Diks, et al.
- 1991
Citation Context ...lgorithm is the construction of a suffix tree -- this has the well-known bottleneck of parallel integer sorting for which the best known algorithm is suboptimal by an O(log log d) factor in this case [8]. The work bound of our algorithm improves on the previously best algorithm [22] unless the alphabet size is superexponential in m. In what follows, we assume that the alphabet size is constant, as is...

25 | Deterministic sampling - A new technique for fast pattern matching
- Vishkin
- 1991
Citation Context ...ton construction in [19] to a set of strings. In the mid-eighties, Galil [12] and Vishkin [27] designed the first work-optimal string matching algorithms, which have since been extended significantly [28, 13, 9]. However, a work-optimal algorithm for dictionary matching has remained elusive. As in the case of [19], the finite automaton based approach of [3] is inherently sequential. Recent progress on parall...

24 | Adaptive dictionary matching
- Amir, Farach
- 1991
Citation Context ...ese two areas of study have an intimately linked history and are amongst the most intensively studied problems in Computer Science. (For compression see e.g. [25] and for dictionary matching see e.g. [3, 18, 22, 5, 4]). In this paper, we present the first work-optimal algorithms for these problems in a parallel setting. Furthermore, all of our algorithms are fast, working in time logarithmic in the input size. Com...

20 | Recursive *-tree parallel data-structure
- Berkman, Vishkin
- 1989
Citation Context ...) expected work if it has a polynomial-sized alphabet. Lemma 2.2 ([14]) Given a graph on n vertices and m edges, its connected components can be determined in O(log n) time and O(m) work. Lemma 2.3 ([7]) Given an array A of n numbers, it can be pre-processed in O(log n) time and O(n) work so any range-maxima query (that is, given [i, j], return the maximum value in A[i], A[i + 1], ..., A[j]) can ...

19 | Optimally fast parallel algorithms for preprocessing and pattern matching in one and two dimensions
- Cole, Crochemore, et al.
- 1993
Citation Context ...ton construction in [19] to a set of strings. In the mid-eighties, Galil [12] and Vishkin [27] designed the first work-optimal string matching algorithms, which have since been extended significantly [28, 13, 9]. However, a work-optimal algorithm for dictionary matching has remained elusive. As in the case of [19], the finite automaton based approach of [3] is inherently sequential. Recent progress on parall...

19 | A constant-time optimal parallel string-matching algorithm
- Galil
- 1992
Citation Context ...ton construction in [19] to a set of strings. In the mid-eighties, Galil [12] and Vishkin [27] designed the first work-optimal string matching algorithms, which have since been extended significantly [28, 13, 9]. However, a work-optimal algorithm for dictionary matching has remained elusive. As in the case of [19], the finite automaton based approach of [3] is inherently sequential. Recent progress on parall...

18 | Efficient randomized dictionary matching algorithms
- Amir, Farach, et al.
- 1992
Citation Context ...ese two areas of study have an intimately linked history and are amongst the most intensively studied problems in Computer Science. (For compression see e.g. [25] and for dictionary matching see e.g. [3, 18, 22, 5, 4]). In this paper, we present the first work-optimal algorithms for these problems in a parallel setting. Furthermore, all of our algorithms are fast, working in time logarithmic in the input size. Com...

15 | Optimal parallel suffix-prefix matching algorithm and applications. Manuscript
- Kedem, Landau, et al.
- 1988

12 | Parallel algorithms for optimal compression using dictionaries with the prefix property
- Agostino, Storer
- 1992
Citation Context ...ng in O(log d + log n) time and O(n) work. The previously known best algorithm for this problem takes time O(log² n) and O(n³ log² n) work, or alternately, takes time O(log n) and O(n⁴ log n) work [2] after Ω(d log d) work for preprocessing. Dynamic Dictionary Compression: We present the first known optimal algorithm for LZ1 compression as well as for its uncompression. Our algorithm constr...

12 | Efficient parallel algorithms to test square-freeness and factorize
- Crochemore, Rytter
- 1991
Citation Context ...lgorithm reconstructs the string in O(log n) time and O(n) work. Here we make the standard assumption [23] that n is known. The previously known best algorithms perform O(n log n) work for compression [23, 10] as well as for uncompression [23]. 1.2 Technical Contribution. The suffix tree of a string (defined in Section 2) is a versatile data structure in string processing. All our algorithms crucially rely ...

11 | P-Complete problems in data compression
- Agostino
- 1991
Citation Context ... Nonetheless, LZ2 is implemented in practice because of the simplicity of its sequential implementation. Curiously, while we provide optimal work RNC algorithm for LZ1 compression, LZ2 is P-Complete [1] (hence unlikely to have (R)NC algorithms). Versions of our algorithms seem suitable for distributed implementation on a network of workstations [24]. In fact, in this setting, we can conclude from Co...

9 | Parallel algorithms for data compression
- Gonzalez-Smith, Storer
- 1985

8 | String matching with preprocessing of text and pattern
- Naor
- 1991

5 | An optimal logarithmic time, randomized parallel string matching algorithm
- Farach, Muthukrishnan
- 1996
Citation Context ...f a string (defined in Section 2) is a versatile data structure in string processing. All our algorithms crucially rely on a recently discovered algorithm for constructing the suffix tree of a string [11]. While this work-optimal algorithm paved way for our work optimal algorithms for dictionary matching and compression, we stress that the suffix tree construction was not the inherent bottleneck in th...

2 | Highly efficient parallel dictionary matching
- Muthukrishnan, Palem
- 1993

1 | A time and space efficient algorithm for dynamic method look-up in object oriented programming languages
- Muthukrishnan
- 1995
Citation Context ...ll the nearest colored ancestors problem on trees. The sequential version of our solution for this problem has already been applied successfully in compilers for Object Oriented Programming Languages [21]. Also, interestingly, we have devised a very fast procedure that checks the output of the basic Monte Carlo dictionary matching algorithm: therefore, our dictionary matching algorithm is of the Las V...

1 | A distributed dictionary matching implementation
- Papadipoyli
- 1994
Citation Context ...NC algorithm for LZ1 compression, LZ2 is P-Complete [1] (hence unlikely to have (R)NC algorithms). Versions of our algorithms seem suitable for distributed implementation on a network of workstations [24]. In fact, in this setting, we can conclude from Communication Complexity that even checking equality of strings requires randomization for efficiency [29]. Thus the randomization in our algorithms se...