Parallel Algorithms with Optimal Speedup for Bounded Treewidth
 Proceedings 22nd International Colloquium on Automata, Languages and Programming
, 1995
Abstract

We describe the first parallel algorithm with optimal speedup for constructing minimum-width tree decompositions of graphs of bounded treewidth. On n-vertex input graphs, the algorithm works in O((log n)^2) time using O(n) operations on the EREW PRAM. We also give faster parallel algorithms with optimal speedup for the problem of deciding whether the treewidth of an input graph is bounded by a given constant and for a variety of problems on graphs of bounded treewidth, including all decision problems expressible in monadic second-order logic. On n-vertex input graphs, the algorithms use O(n) operations together with O(log n log* n) time on the EREW PRAM, or O(log n) time on the CRCW PRAM.
Shared Memory Simulations with Triple-Logarithmic Delay (Extended Abstract)
, 1995
Abstract

Artur Czumaj (1), Friedhelm Meyer auf der Heide (2), and Volker Stemann (1). (1) Heinz Nixdorf Institute, University of Paderborn, D-33095 Paderborn, Germany. (2) Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, D-33095 Paderborn, Germany.
Abstract. We consider the problem of simulating a PRAM on a distributed memory machine (DMM). Our main result is a randomized algorithm that simulates each step of an n-processor CRCW PRAM on an n-processor DMM with O(log log log n log* n) delay, with high probability. This is an exponential improvement on all previously known simulations. It can be extended to a simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with optimal delay O(log log log n log* n), with high probability. Finally, a lower bound of Ω(log log log n / log log log log n) expected time is proved for a large class of randomized simulations that includes all known simulations. 1 Introduction Para...
Optimally Fast Parallel Algorithms for Preprocessing and Pattern Matching in One and Two Dimensions
, 1993
Abstract

All algorithms below are optimal alphabet-independent parallel CRCW PRAM algorithms. In one dimension: Given a pattern string of length m for the string-matching problem, we design an algorithm that computes a deterministic sample of a sufficiently long substring in constant time. This problem used to be a bottleneck in the pattern preprocessing for one- and two-dimensional pattern matching. The best previous time bound was O(log^2 m / log log m). We use this algorithm to obtain the following results. 1. Improving the preprocessing of the constant-time text search algorithm [12] from O(log^2 m / log log m) to O(log log m), which is now best possible. 2. A constant-time deterministic string-matching algorithm in the case that the text length n satisfies n = Ω(m^(1+ε)) for a constant ε > 0. 3. A simple probabilistic string-matching algorithm that has constant time with high probability for random input. 4. A constant expected time Las Vegas algorithm for computing t...
On the Power of Arrays with Reconfigurable Optical Buses
, 1996
Abstract

This paper examines some computational aspects of different arrays enhanced with optical pipelined buses. The array processors with optical pipelined buses (APPB) are shown to be extremely flexible, as demonstrated by their ability to efficiently simulate different variants of PRAMs and bounded-degree networks. A model of computation is introduced, the array with reconfigurable optical buses (AROB), which combines some of the advantages and characteristics of the classical reconfigurable networks (RN) and the APPB. A number of applications of the APPB and AROB are presented, and their power is investigated. It is shown that, besides the AROB's capability of simulating classical reconfigurable networks, the enhanced communication mechanisms allow for an important system reduction when compared with the classical RNs. Keywords: optical interconnections, pipelined optical buses, reconfigurable networks, bounded-degree networks, PRAM models. 1 Introduction Interprocessor communication networks...
Optimal Deterministic Approximate Parallel Prefix Sums and Their Applications
 In Proc. Israel Symp. on Theory and Computing Systems (ISTCS'95)
, 1995
Abstract

We show that extremely accurate approximation to the prefix sums of a sequence of n integers can be computed deterministically in O(log log n) time using O(n / log log n) processors in the Common CRCW PRAM model. This complements randomized approximation methods obtained recently by Goodrich, Matias and Vishkin and improves previous deterministic results obtained by Hagerup and Raman. Furthermore, our results completely match a lower bound obtained recently by Chaudhuri. Our results have many applications. Using them we improve upon the best known time bounds for deterministic approximate selection and for deterministic padded sorting. 1 Introduction The computation of prefix sums is one of the most basic tools in the design of fast parallel algorithms (see Blelloch [9] and JáJá [33]). Prefix sums can be computed in O(log n) time and linear work in the EREW PRAM model (Ladner and Fischer [34]) and in O(log n / log log n) time and linear work in the Common CRCW PRAM model (Cole and Vishkin...
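For readers unfamiliar with the exact prefix-sums baseline mentioned above, the following is a minimal sketch of a linear-work scan with a logarithmic number of rounds. It is a sequential simulation of a standard two-phase (up-sweep, down-sweep) scan, not the approximate algorithm of the paper; the function name `inclusive_prefix_sums` and the round counter are illustrative assumptions.

```python
def inclusive_prefix_sums(values):
    """Sequentially simulate a work-efficient two-phase parallel scan.

    Each iteration of an outer loop stands for one parallel round; the
    inner loop touches disjoint cells, so its iterations could run
    concurrently on a PRAM. Returns (prefix_sums, rounds).
    """
    n = len(values)
    if n == 0:
        return [], 0

    # Pad to a power of two so the implicit tree is complete.
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)
    rounds = 0

    # Up-sweep: each internal node accumulates the sum of its subtree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] += tree[i - stride]
        stride *= 2
        rounds += 1

    # Down-sweep: push partial sums back down to get exclusive prefix sums.
    tree[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = tree[i - stride]
            tree[i - stride] = tree[i]
            tree[i] += left
        stride //= 2
        rounds += 1

    # Convert exclusive sums to inclusive sums for the original positions.
    inclusive = [tree[i] + values[i] for i in range(n)]
    return inclusive, rounds
```

On an input padded to length 2^k the sketch uses 2k rounds and a number of additions linear in the padded length, which is the shape of the O(log n) time, linear work bound quoted in the abstract.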
Optimal Logarithmic Time Randomized Suffix Tree Construction
 In Proc 23rd ICALP
, 1996
Abstract

The suffix tree of a string, the fundamental data structure in the area of combinatorial pattern matching, has many elegant applications. In this paper, we present a novel, simple sequential algorithm for the construction of suffix trees. We are also able to parallelize our algorithm so that we settle the main open problem in the construction of suffix trees: we give a Las Vegas CRCW PRAM algorithm that constructs the suffix tree of a binary string of length n in O(log n) time and O(n) work with high probability. In contrast, the previously known work-optimal algorithms, while deterministic, take Ω(log n) time.
Contention Resolution in Hashing Based Shared Memory Simulations
, 2000
Abstract

In this paper we study the problem of simulating shared memory on the distributed memory machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. The main aim is to design strategies that resolve contention at the memory modules. Extending results and methods from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n log* n), with high probability. Next we describe a general technique that can be used to turn these simulations into time-processor optimal ones, in the case of EREW PRAMs to be simulated. We obtain a time-processor optimal simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with delay O(log log log n log* n), with high probability. When an (n log log log n log* n)-processor CRCW PRAM is simulated, the delay is larger only by a log* n factor. We further demonstrate that the simulations presented cannot be significantly improved using our techniques. We show an Ω(log log log n / log log log log n) lower bound on the expected delay for a class of PRAM simulations, called topological simulations, that covers all previously known simulations as well as the simulations presented in the paper.
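The idea of storing several hashed copies of each shared memory cell can be illustrated with a small sketch. This is not the simulation protocol of the paper: it assumes a simple linear hash family modulo a prime and a sequential greedy rule that sends each request to the currently least-loaded module holding a copy. The names `make_hash_functions` and `route_requests`, the constant `PRIME`, and the `seed` parameter are all hypothetical.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1, assumed larger than any cell address


def make_hash_functions(copies, modules, seed=0):
    """Return locate(cell) -> list of module indices, one per copy.

    Each copy uses an independently drawn hash function of the form
    h(x) = ((a * x + b) mod PRIME) mod modules.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(copies)]

    def locate(cell):
        return [((a * cell + b) % PRIME) % modules for a, b in params]

    return locate


def route_requests(cells, locate, modules):
    """Greedily assign each request to the least-loaded module with a copy.

    Returns the per-module load; the maximum load is a rough stand-in
    for the contention a module would have to resolve in one step.
    """
    load = [0] * modules
    for cell in cells:
        target = min(locate(cell), key=lambda m: load[m])
        load[target] += 1
    return load
```

A usage example under the same assumptions: `locate = make_hash_functions(copies=3, modules=16)` followed by `max(route_requests(range(16), locate, 16))` gives the largest number of requests any single module receives in that batch.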
Efficient String Algorithmics
, 1992
Abstract

Problems involving strings arise in many areas of computer science and have numerous practical applications. We consider several problems from a theoretical perspective and provide efficient algorithms and lower bounds for these problems in sequential and parallel models of computation. In the sequential setting, we present new algorithms for the string matching problem improving the previous bounds on the number of comparisons performed by such algorithms. In parallel computation, we present tight algorithms and lower bounds for the string matching problem, for finding the periods of a string, for detecting squares and for finding initial palindromes.
Randomization helps to perform independent tasks reliably
 Random Structures and Algorithms
Abstract

This paper is about algorithms that schedule tasks to be performed in a distributed failure-prone environment, when processors communicate by message passing, and when tasks are independent and of unit length. The processors work under synchrony and may fail by crashing. Failure patterns are imposed by adversaries. The question of how the power of adversaries affects the optimality of randomized algorithmic solutions is among the problems studied. Linearly-bounded adversaries may fail up to a constant fraction of the processors. Weakly-adaptive adversaries have to select, prior to the start of an execution, a subset of processors to be failure-prone, and then may fail only the selected processors, at arbitrary steps, in the course of the execution. Strongly-adaptive adversaries have a total number of failures as the only restriction on failure patterns. The measures of complexity are work, measured as the available processor steps, and communication, measured as the number of point-to-point messages. A randomized algorithm is developed that attains both O(n log* n) expected work and O(n log* n) expected communication against weakly-adaptive linearly-bounded adversaries, in the case when the numbers of tasks and processors are both equal to n. This is in contrast with the performance of algorithms against strongly-adaptive linearly-bounded adversaries, which has to be Ω(n log n / log log n) in terms of work. Key words: distributed algorithm, randomized algorithm, message passing, crash failures, adaptive adversary, independent tasks, load balancing, lower bound.