Results 11 – 20 of 29
Retrieval of scattered information by EREW, CREW and CRCW PRAMs
, 1992
Abstract

Cited by 5 (1 self)
The k-compaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for k-compaction have obvious applications in processor allocation and load balancing; k-compaction is also an important subroutine in many recently developed parallel algorithms. We show that any EREW PRAM that solves the k-compaction problem requires Ω(√(log n)) time, even if the number of processors is arbitrarily large and k = 2. On the CREW PRAM, we show that every n-processor algorithm for the k-compaction problem requires Ω(log log n) time, even if k = 2. Finally, we show that O(log k) time can be achieved on the ROBUST PRAM, a very weak CRCW PRAM model.
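For reference, the k-compaction problem the abstract studies is easy to state sequentially. The sketch below is only a sequential baseline (the function name and `empty` sentinel are illustrative, not from the paper); the parallel algorithms under discussion achieve the same effect, e.g. via a prefix sum over occupancy flags.

```python
def k_compact(cells, empty=None):
    """Move the k nonempty cells to the first k positions, preserving order.

    Sequential reference for k-compaction; `empty` marks an empty cell.
    """
    out = [empty] * len(cells)
    k = 0
    for x in cells:
        if x is not empty:
            out[k] = x
            k += 1
    return out, k
```

The lower bounds in the abstract say that no EREW PRAM can match this constant-factor simplicity in parallel, even for k = 2.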
A Note on Reducing Parallel Model Simulations to Integer Sorting
, 1995
Abstract

Cited by 3 (2 self)
We show that simulating a step of a fetch&add PRAM model on an EREW PRAM model can be made as efficient as integer sorting. In particular, we present several efficient reductions of the simulation problem to various integer sorting problems. By using some recent algorithms for integer sorting, we get simulation algorithms on CREW and EREW that take o(n lg n) operations, where n is the number of processors in the simulated CRCW machine. Previous simulations used Θ(n lg n) operations. Some of the more interesting simulation results are obtained by using a bootstrapping technique with a CRCW PRAM algorithm for hashing. 1 Introduction. The concurrent-read concurrent-write (CRCW) PRAM programmer's model is commonly used for designing parallel algorithms. On the other hand, the weaker exclusive-write PRAM models are sometimes considered closer to realization. Therefore, while it is more convenient to design algorithms for the stronger CRCW model, an extra effort is sometimes neede...
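The core of the reduction can be illustrated sequentially: sort the concurrent requests by address (the integer sorting step), then a prefix sum of the increments within each address group tells every processor the value it would have fetched. This is an assumed simplification, not the paper's construction; processor ids are taken to be 0..len(requests)-1.

```python
def simulate_fetch_and_add(memory, requests):
    """Serialize one fetch&add step: each (pid, addr, inc) request receives
    the cell's value before its own increment, in processor-id order."""
    # Integer-sort requests by address; Python's sort is stable, so
    # same-address requests keep processor order.
    order = sorted(range(len(requests)), key=lambda i: requests[i][1])
    results = [None] * len(requests)
    i = 0
    while i < len(order):
        addr = requests[order[i]][1]
        running = memory.get(addr, 0)
        j = i
        while j < len(order) and requests[order[j]][1] == addr:
            pid, _, inc = requests[order[j]]
            results[pid] = running  # value fetched by this processor
            running += inc          # running prefix sum of increments
            j += 1
        memory[addr] = running
        i = j
    return results
```

The parallel versions in the paper replace the sort with an EREW/CREW integer sorting algorithm and the inner loop with a parallel prefix sum.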
Simple Fast Parallel Hashing by Oblivious Execution
 AT&T Bell Laboratories
, 1994
Abstract

Cited by 3 (1 self)
A hash table is a representation of a set in a linear-size data structure that supports constant-time membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a CRCW PRAM. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the near-logarithmic time level. The algorithm is simple and is sketched by the following:

1. Partition the input set into buckets by a random polynomial of constant degree.
2. For t := 1 to O(lg lg n) do:
   (a) Allocate M_t memory blocks, each of size K_t.
   (b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration.

The crux of the algorithm is a careful a priori selection of the parameters M_t and K_t. The algorithm uses only O(lg lg...
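The two-phase sketch can be rendered as a toy sequential program. This is not the paper's algorithm: the prime p, the seed, and the FKS-style choice of a quadratic block size per bucket (in place of the carefully tuned M_t, K_t schedule, which is the paper's whole point) are all assumptions for illustration; failed buckets simply retry, mirroring "carry on to the next iteration".

```python
import random

def perfect_hash(keys, p=1000003):
    """Toy rendition of the bucket-then-block hashing sketch above."""
    rng = random.Random(0)
    n = len(keys)
    # Phase 1: partition keys into ~n buckets with a random degree-2 polynomial mod p.
    a, b, c = rng.randrange(1, p), rng.randrange(p), rng.randrange(p)
    buckets = {}
    for x in keys:
        buckets.setdefault((a * x * x + b * x + c) % p % n, []).append(x)
    # Phase 2: per bucket, retry random linear maps until all keys land injectively.
    table = {}
    for h, bucket in buckets.items():
        K = len(bucket) ** 2  # quadratic block size makes injectivity likely
        while True:
            u, v = rng.randrange(1, p), rng.randrange(p)
            pos = [(u * x + v) % p % K for x in bucket]
            if len(set(pos)) == len(bucket):
                table[h] = (u, v, K, dict(zip(pos, bucket)))
                break
    return table
```

Membership of x is then answered in constant time by recomputing the bucket and block positions.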
A Simple and Practical Linear-Work Parallel Algorithm for Connectivity
Abstract

Cited by 2 (2 self)
Graph connectivity is a fundamental problem in computer science with many important applications. Sequentially, connectivity can be solved in linear work easily using breadth-first search or depth-first search. There have been many parallel algorithms for connectivity; however, the simpler parallel algorithms require superlinear work, and the linear-work polylogarithmic-depth parallel algorithms are very complicated and not amenable to implementation. In this work, we address this gap by describing a simple and practical expected linear-work, polylogarithmic-depth parallel algorithm for graph connectivity. Our algorithm is based on a recent parallel algorithm for generating low-diameter graph decompositions by Miller et al. [44], which uses parallel breadth-first searches. We discuss a (modest) variant of their decomposition algorithm which preserves the theoretical complexity while leading to simpler and faster implementations. We experimentally compare the connectivity algorithms using both the original decomposition algorithm and our modified decomposition algorithm. We also experimentally compare against the fastest existing parallel connectivity implementations (which are not theoretically linear-work and polylogarithmic-depth) and show that our implementations are competitive for various input graphs. In addition, we compare our implementations to sequential connectivity algorithms and show that on 40 cores we achieve good speedup relative to the sequential implementations for many input graphs. We discuss the various optimizations used in our implementations and present an extensive experimental analysis of the performance. Our algorithm is the first parallel connectivity algorithm that is both theoretically and practically efficient.
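The sequential linear-work baseline the abstract mentions is just BFS per component; a minimal sketch follows (the adjacency-dict format is an assumption, and this is the baseline, not the paper's decomposition-based parallel algorithm).

```python
from collections import deque

def connected_components(adj):
    """Label each vertex of an undirected graph with a component id via BFS.

    adj: dict mapping vertex -> iterable of neighbours (assumed symmetric).
    """
    label = {}
    cid = 0
    for s in adj:
        if s in label:
            continue
        label[s] = cid
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = cid
                    q.append(v)
        cid += 1
    return label
```

The parallel algorithm replaces these sequential searches with low-diameter decompositions built from parallel breadth-first searches.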
Multicore Triangle Computations Without Tuning
Abstract

Cited by 1 (1 self)
Triangle counting and enumeration has emerged as a basic tool in large-scale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributed-memory setting or the external-memory setting, and cannot take full advantage of a multicore machine, whose capacity has grown to accommodate even the largest of real-world graphs. This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges. Our algorithms are provably cache-friendly, easy to implement in a language that supports dynamic parallelism, such as Cilk Plus or OpenMP, and do not require parameter tuning. On a 40-core machine with two-way hyper-threading, our parallel exact global and local triangle counting algorithms obtain speedups of 17–50x on a set of real-world and synthetic graphs, and are faster than previous parallel exact triangle counting algorithms. We can compute the exact triangle count of the Yahoo Web graph (over 6 billion edges) in under 1.5 minutes. In addition, for approximate triangle counting, we are able to approximate the count for the Yahoo graph to within 99.6% accuracy in under 10 seconds, and for a given accuracy we are much faster than existing parallel approximate triangle counting implementations.
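For orientation, the standard exact-counting approach that such algorithms parallelise can be sketched sequentially: rank the vertices, keep only edges toward higher-ranked neighbours, and intersect neighbour lists. This is a generic sketch, not the paper's cache-friendly algorithm; the adjacency-dict format is an assumption.

```python
def triangle_count(adj):
    """Exact global triangle count by intersecting directed neighbour sets.

    adj: dict mapping vertex -> set of neighbours (assumed symmetric).
    Each triangle {u, v, w} is counted exactly once at its lowest-ranked vertex.
    """
    rank = {v: i for i, v in enumerate(sorted(adj))}
    # Orient each edge from lower to higher rank to avoid double counting.
    higher = {v: [w for w in adj[v] if rank[w] > rank[v]] for v in adj}
    count = 0
    for v in adj:
        hv = set(higher[v])
        for w in higher[v]:
            count += len(hv.intersection(higher[w]))
    return count
```

Ranking by degree instead of by name is the usual practical refinement, since it bounds the work per intersection.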
On a ParallelAlgorithms Method for String Matching Problems
, 1994
Abstract
Suffix trees are the main data structure in string matching algorithmics. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial algorithm, with the same operations bound, by Karp, Miller and Rosenberg. We show how to break symmetries that occur in the process of assigning labels using the Deterministic Coin Tossing (DCT) technique, and thereby reduce the number of labeled substrings to linear.
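The Karp–Miller–Rosenberg labeling the abstract refers to can be sketched as the familiar doubling scheme: substrings of length 2^k receive integer labels built from the labels of their two halves. The sketch below (written in the style of suffix-array doubling, with a -1 sentinel past the string end, both assumptions of this rendition) labels all n positions at every level, which is exactly the n log n label count the paper's DCT technique reduces.

```python
def kmr_labels(s):
    """KMR-style doubling labels: level k assigns each position an integer
    label identifying the substring of length 2^k starting there.
    Returns the list of label arrays, one per level."""
    n = len(s)
    level = [ord(c) for c in s]
    out = [level[:]]
    k = 1
    while k < n:
        # Label of a 2k-substring = rank of the pair of its two k-labels.
        pairs = [(level[i], level[i + k] if i + k < n else -1) for i in range(n)]
        ranks = {p: r for r, p in enumerate(sorted(set(pairs)))}
        level = [ranks[p] for p in pairs]
        out.append(level[:])
        k *= 2
    return out
```

The final level ranks all suffixes, which is why the same scheme underlies suffix-array construction.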
RNAi Inexact Match Gene Family Knockdown
, 2007
Abstract
appreciated for their kindness, expert guidance, and meaningful technical discussions. Professor Darko Stefanovic generously donated unlimited computing time on a high quality platform, and additionally gave personal time devoted to many thoughtful discussions.
Comments on Integer Sorting on Sum-CRCW
Abstract
Given an array X of n elements from a restricted domain of integers [1, n], the integer sorting problem is the rearrangement of the n integers in ascending order. We study the first optimal deterministic sublogarithmic algorithm for integer sorting on the CRCW PRAM. We make two comments on the algorithm. The first is that the algorithm does not run in sublogarithmic time for every distribution of input data. The second is that the cost of the algorithm is not linear. We then modify the algorithm to be optimal in the sense of cost, with a restriction on the input data. Our modified algorithm has time complexity O(log n / log log n) using n log log n / log n Sum-CRCW processors. The algorithm also uses linear space.
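The restricted domain [1, n] is what makes linear cost possible at all: sequentially, a stable counting sort already achieves it. The sketch below is only this sequential baseline, not the paper's parallel Sum-CRCW algorithm.

```python
def integer_sort(X, n=None):
    """Counting sort for integers drawn from the restricted domain [1, n].

    Runs in O(n + len(X)) time and space; n defaults to len(X) as in
    the problem statement above.
    """
    if n is None:
        n = len(X)
    count = [0] * (n + 1)
    for x in X:
        count[x] += 1  # histogram of values
    out = []
    for v in range(1, n + 1):
        out.extend([v] * count[v])
    return out
```

The parallel question is how to compute the histogram and its prefix sums fast; Sum-CRCW's concurrent-write-by-summation resolves the histogram step in a single concurrent write.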
Converting High Probability into Nearly-Constant Time, with Applications to Parallel Hashing
, 1991
Abstract
Yossi Matias, Uzi Vishkin (University of Maryland & Tel-Aviv University). We present a new paradigm for efficient randomized parallel algorithms that needs Ô(log* n) time, where Ô(x) means "O(x) expected". It leads to: (1) constructing a perfect hash function for n elements in Ô(log* n log(log* n)) time and O(n) operations; (2) an algorithm for generating a random permutation in Ô(log* n) time using n processors, or in Ô(log* n log(log* n)) time and O(n) operations; and (3) an efficient optimizer: consider a parallel algorithm that runs in t time using p processors; since at each time unit some of the processors may be idle, we let x, the total number of actual operations, be the sum over all non-idle processors at every time unit; assuming the algorithm belongs to a certain kind, it can be adapted to run in Ô(t + log* n log(log* n)) time (additive overhead!) using x/(t + log* n log(log* n)) processors. We also get an optimal integer sorting algorithm. Given...
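A common way to generate a random permutation in this nearly-constant-time setting is dart throwing: each item repeatedly picks a random cell in an oversized array until it lands in an empty one, and compacting the array yields the permutation. The sketch below is a sequential rendition of that generic idea, under assumptions of this sketch (array of size 2n, a fixed seed), and is not claimed to be the paper's exact algorithm.

```python
import random

def random_permutation(n, seed=0):
    """Dart-throwing sketch: place items 0..n-1 into random cells of a
    2n-cell array (retrying on collisions), then compact the array."""
    rng = random.Random(seed)
    cells = [None] * (2 * n)
    for item in range(n):
        while True:
            slot = rng.randrange(2 * n)
            if cells[slot] is None:  # in parallel, colliding items retry
                cells[slot] = item
                break
    return [x for x in cells if x is not None]  # the compaction step
```

In the parallel setting the retries happen concurrently, each round succeeds for a constant fraction of items, and the final compaction is itself a nearly-constant-time primitive.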