Results 11  20
of
32
Retrieval of scattered information by EREW, CREW and CRCW PRAMs
, 1992
"... The kcompaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for kcompaction have obvious applications in processor allocation and load balancing; kcompaction is also an im ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
(Show Context)
The kcompaction problem arises when k out of n cells in an array are nonempty and the contents of these cells must be moved to the first k locations in the array. Parallel algorithms for kcompaction have obvious applications in processor allocation and load balancing; kcompaction is also an important subroutine in many recently developed parallel algorithms. We show that any EREW PRAM that solves the kcompaction problem requires \Omega\Gamma p log n) time, even if the number of processors is arbitrarily large and k = 2. On the CREW PRAM, we show that every nprocessor algorithm for kcompaction problem requires\Omega\Gammaqui log n) time, even if k = 2. Finally, we show that O(log k) time can be achieved on the ROBUST PRAM, a very weak CRCW PRAM model.
A Parallel DFA Minimization Algorithm
"... Abstract. In this paper,we have considered the state minimization problem for Deterministic Finite Automata (DFA). An efficient parallel algorithm for solving the problem on an arbitrary CRCW PRAM has been proposed. For n number of states and k number of inputs in Σ of the DFA to be minimized,the al ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
(Show Context)
Abstract. In this paper,we have considered the state minimization problem for Deterministic Finite Automata (DFA). An efficient parallel algorithm for solving the problem on an arbitrary CRCW PRAM has been proposed. For n number of states and k number of inputs in Σ of the DFA to be minimized,the algorithm runs in O(kn log n) timeand) processors. uses O ( n log n 1
A Note on Reducing Parallel Model Simulations to Integer Sorting
, 1995
"... We show that simulating a step of a fetch&add pram model on an erew pram model can be made as efficient as integer sorting. In particular, we present several efficient reductions of the simulation problem to various integer sorting problems. By using some recent algorithms for integer sorting, w ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
We show that simulating a step of a fetch&add pram model on an erew pram model can be made as efficient as integer sorting. In particular, we present several efficient reductions of the simulation problem to various integer sorting problems. By using some recent algorithms for integer sorting, we get simulation algorithms on crew and erew that take o(n lg n) operations where n is the number of processors in the simulated crcw machine. Previous simulations were using \Theta(n lg n) operations. Some of the more interesting simulation results are obtained by using a bootstrapping technique with a crcw pram algorithm for hashing. 1 Introduction The concurrentread concurrentwrite (crcw) pram programmer's model is commonly used for designing parallel algorithms. On the other hand, the weaker exclusivewrite pram models are sometimes considered closer to realization. Therefore, while it is more convenient to design algorithms for the stronger crcw model, an extra effort is sometimes neede...
Simple Fast Parallel Hashing by Oblivious Execution
 AT&T Bell Laboratories
, 1994
"... A hash table is a representation of a set in a linear size data structure that supports constanttime membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a crcw pram. Our algo ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
A hash table is a representation of a set in a linear size data structure that supports constanttime membership queries. We show how to construct a hash table for any given set of n keys in O(lg lg n) parallel time with high probability, using n processors on a weak version of a crcw pram. Our algorithm uses a novel approach of hashing by "oblivious execution" based on probabilistic analysis to circumvent the parity lower bound barrier at the nearlogarithmic time level. The algorithm is simple and is sketched by the following: 1. Partition the input set into buckets by a random polynomial of constant degree. 2. For t := 1 to O(lg lg n) do (a) Allocate M t memory blocks, each of size K t . (b) Let each bucket select a block at random, and try to injectively map its keys into the block using a random linear function. Buckets that fail carry on to the next iteration. The crux of the algorithm is a careful a priori selection of the parameters M t and K t . The algorithm uses only O(lg lg...
A Simple and Practical LinearWork Parallel Algorithm for Connectivity
"... Graph connectivity is a fundamental problem in computer science with many important applications. Sequentially, connectivity can be done in linear work easily using breadthfirst search or depthfirst search. There have been many parallel algorithms for connectivity, however the simpler parallel al ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
Graph connectivity is a fundamental problem in computer science with many important applications. Sequentially, connectivity can be done in linear work easily using breadthfirst search or depthfirst search. There have been many parallel algorithms for connectivity, however the simpler parallel algorithms require superlinear work, and the linearwork polylogarithmicdepth parallel algorithms are very complicated and not amenable to implementation. In this work, we address this gap by describing a simple and practical expected linearwork, polylogarithmic depth parallel algorithm for graph connectivity. Our algorithm is based on a recent parallel algorithm for generating lowdiameter graph decompositions by Miller et al. [44], which uses parallel breadthfirst searches. We discuss a (modest) variant of their decomposition algorithm which preserves the theoretical complexity while leading to simpler and faster implementations. We experimentally compare the connectivity algorithms using both the original decomposition algorithm and our modified decomposition algorithm. We also experimentally compare against the fastest existing parallel connectivity implementations (which are not theoretically linearwork and polylogarithmicdepth) and show that our implementations are competitive for various input graphs. In addition, we compare our implementations to sequential connectivity algorithms and show that on 40 cores we achieve good speedup relative to the sequential implementations for many input graphs. We discuss the various optimizations used in our implementations and present an extensive experimental analysis of the performance. Our algorithm is the first parallel connectivity algorithm that is both theoretically and practically efficient.
Multicore Triangle Computations Without Tuning
"... Abstract—Triangle counting and enumeration has emerged as a basic tool in largescale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributedmemory setting or the externalmemory setting, and ca ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Abstract—Triangle counting and enumeration has emerged as a basic tool in largescale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributedmemory setting or the externalmemory setting, and cannot take full advantage of a multicore machine, whose capacity has grown to accommodate even the largest of realworld graphs. This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges. Our algorithms are provably cachefriendly, easy to implement in a language that supports dynamic parallelism, such as Cilk Plus or OpenMP, and do not require parameter tuning. On a 40core machine with twoway hyperthreading, our parallel exact global and local triangle counting algorithms obtain speedups of 17–50x on a set of realworld and synthetic graphs, and are faster than previous parallel exact triangle counting algorithms. We can compute the exact triangle count of the Yahoo Web graph (over 6 billion edges) in under 1.5 minutes. In addition, for approximate triangle counting, we are able to approximate the count for the Yahoo graph to within 99.6 % accuracy in under 10 seconds, and for a given accuracy we are much faster than existing parallel approximate triangle counting implementations. I.
Using Learning and Difficulty of Prediction to Decrease Computation: A Fast Sort and Priority Queue on Entropy Bounded Inputs
"... There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in theoretical computer science community recently, (e.g. see [Vitter, Krishnan, FOCS91], [Karlin, Philips, Raghavan, FOCS92] [Raghavan92]) for use of Markov models for online algor ..."
Abstract
 Add to MetaCart
There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in theoretical computer science community recently, (e.g. see [Vitter, Krishnan, FOCS91], [Karlin, Philips, Raghavan, FOCS92] [Raghavan92]) for use of Markov models for online algorithms e.g., cashing and prefetching). Their results used the fact that compressible sources are predictable (and vise versa), and show that online algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we will approximately learn the input distribution, and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge, this is first case of a computational problem where we do not assume any particular fixed input distribution and yet computation is decreased when the input is less predictable, rather than the reverse. We concentrate our investigation on a basic computational problem: sorting and a basic data structure problem: maintaining a priority queue. We present the first known case of sorting and priority queue algorithms whose complexity depends on the binary entropy H ≤ 1 of input keys where assume that input keys are generated from an unknown but arbitrary stationary ergodic source. This is, we assume that each of the input keys can be each arbitrarily long, but have entropy H. Note that H can be
On a ParallelAlgorithms Method for String Matching Problems
, 1994
"... Suffix trees are the main datastructure in string matching algorithmics. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is ..."
Abstract
 Add to MetaCart
Suffix trees are the main datastructure in string matching algorithmics. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial algorithm, with the same operations bound, by Karp, Miller and Rosenberg. We show how to break symmetries that occur in the process of assigning labels using the Deterministic Coin Tossing (DCT) technique, and thereby reduce the number of labeled substrings to linear.