Computing with Very Weak Random Sources
, 1994
Cited by 73 (7 self)
For any fixed $\delta > 0$, we show how to simulate RP algorithms in time $n^{O(\log n)}$ using the output of a $\delta$-source with min-entropy $R^{\delta}$. Such a weak random source is asked once for $R$ bits; it outputs an $R$-bit string such that any string has probability at most $2^{-R^{\delta}}$. If $\delta > 1 - 1/(k+1)$, our BPP simulations take time $n^{O(\log^{(k)} n)}$ ($\log^{(k)}$ is the logarithm iterated $k$ times). We also give a polynomial-time BPP simulation using Chor-Goldreich sources of min-entropy $R^{\Omega(1)}$, which is optimal. We present applications to time-space tradeoffs, expander constructions, and the hardness of approximation. Also of interest is our randomness-efficient Leftover Hash Lemma, found independently by Goldreich & Wigderson.
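A δ-source, as defined in the abstract, is a distribution on R-bit strings in which no string has probability above $2^{-R^{\delta}}$, i.e. its min-entropy is at least $R^{\delta}$. The Python sketch below only checks that condition for an explicitly listed distribution; the function names are hypothetical, and it does not implement the RP or BPP simulations described in the paper.

```python
import math


def min_entropy(dist):
    """Min-entropy -log2(max_x Pr[x]) of a finite distribution.

    `dist` maps outcomes (e.g. R-bit strings) to their probabilities.
    """
    return -math.log2(max(dist.values()))


def is_delta_source(dist, R, delta):
    """True if every R-bit string has probability at most 2^(-R^delta),
    i.e. the min-entropy is at least R^delta."""
    bound = 2.0 ** (-(R ** delta))
    return all(p <= bound for p in dist.values())


# Uniform over 4 of the 16 strings of length 4: min-entropy 2 = 4^0.5.
flat = {format(i, "04b"): 0.25 for i in range(4)}
print(min_entropy(flat), is_delta_source(flat, R=4, delta=0.5))  # 2.0 True
```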
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
Cited by 42 (11 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP and a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios ...
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
Cited by 31 (11 self)
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem because it is the building block for many other parallel graph algorithms and because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that, for the first time, achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number $p$ of processors for suitably large inputs ($n > p^2$). Because it is notoriously hard for any parallel spanning tree implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are:
1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and
2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.
The source code for these algorithms is freely available from our web site hpc.ece.unm.edu.
Improved Algorithms via Approximations of Probability Distributions
 Journal of Computer and System Sciences
, 1997
Cited by 24 (2 self)
We present two techniques for approximating probability distributions. The first is a simple method for constructing the small-bias probability spaces introduced by Naor & Naor. We show how to efficiently combine this construction with the method of conditional probabilities to yield improved NC algorithms for many problems, such as set discrepancy, finding large cuts in graphs, and finding large acyclic subgraphs. The second is a construction of small probability spaces approximating general independent distributions, which are smaller than the constructions of Even, Goldreich, Luby, Nisan & Veličković. Such approximations are useful, e.g., for the derandomization of certain randomized algorithms.
Keywords: derandomization, parallel algorithms, discrepancy, graph coloring, small sample spaces, explicit constructions.
1 Introduction
Derandomization, the development of general tools to derive efficient deterministic algorithms from their randomized counterparts, has blossomed ...
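Small-bias spaces are defined by a parity condition: a multiset of n-bit vectors is ε-biased if, for every nonempty set S of coordinates, the XOR of the bits in S equals 0 and 1 with probabilities that differ by at most ε. The brute-force Python checker below measures that ε for a toy sample space. It is only a sanity check for the property (exponential in n) with a hypothetical function name, not the Naor & Naor construction or the one in the paper.

```python
from itertools import combinations


def max_bias(sample_space, n):
    """Largest bias over all nonempty parity tests on n-bit vectors.

    The bias of a test S is |Pr[XOR of x_i over S = 0] - Pr[... = 1]|
    under the uniform distribution on the multiset `sample_space`.
    """
    size = len(sample_space)
    worst = 0.0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            ones = sum(1 for x in sample_space if sum(x[i] for i in S) % 2 == 1)
            # (#zeros - #ones) / size, with #zeros = size - ones
            worst = max(worst, abs(size - 2 * ones) / size)
    return worst


cube = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(max_bias(cube, 3))                    # 0.0: the full cube is unbiased
print(max_bias([(0, 0, 0), (1, 1, 1)], 3))  # 1.0: x0 XOR x1 is always 0
```

A genuine small-bias construction aims for a sample space far smaller than the full cube while keeping this value below a target ε.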
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
, 2006
CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters
 International Journal of High Performance Computing Applications
, 2003
Cited by 17 (3 self)
In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on CGM algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our library also includes CGMlib, a library of basic CGM tools such as sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. Both libraries are available for download at http://cgm.dehne.net.
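Prefix sum is one of the CGMlib primitives listed above. On a coarse-grained multicomputer it is commonly done in a constant number of rounds: each processor scans its own block, the per-processor totals are scanned, and each processor then adds its offset. The Python sketch below simulates those rounds sequentially on a list of blocks, one block per hypothetical processor; the function name is illustrative and this does not use or reproduce CGMlib's actual interface.

```python
def cgm_prefix_sum(blocks):
    """Inclusive prefix sums over data split into per-processor blocks."""
    # Round 1: every processor scans its own block locally.
    local = []
    for block in blocks:
        acc, scanned = 0, []
        for v in block:
            acc += v
            scanned.append(acc)
        local.append(scanned)

    # Round 2: an exclusive scan of the block totals gives each processor's
    # offset (in a real CGM this is the one communication-heavy step).
    offsets, running = [], 0
    for scanned in local:
        offsets.append(running)
        running += scanned[-1] if scanned else 0

    # Round 3: each processor adds its offset locally.
    return [[x + off for x in scanned] for scanned, off in zip(local, offsets)]


print(cgm_prefix_sum([[1, 2], [3], [4, 5, 6]]))  # [[1, 3], [6], [10, 15, 21]]
```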
On the architectural requirements for efficient execution of graph algorithms
 In Proc. 34th Int’l Conf. on Parallel Processing (ICPP)
, 2005
Cited by 15 (7 self)
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures ...
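List ranking, one of the two kernels studied here, is classically parallelized by pointer jumping (Wyllie's algorithm): in each synchronous round every node adds its successor's rank to its own and then skips over that successor, so about log2(n) rounds suffice. The Python sketch below simulates those rounds sequentially with a hypothetical function name; it is a textbook illustration, not the implementation measured in the paper.

```python
def list_rank(succ):
    """Distance from each node to the tail of a linked list via pointer jumping.

    succ[i] is the successor of node i; the tail points to itself.
    Returns (ranks, number_of_synchronous_rounds).
    """
    n = len(succ)
    nxt = list(succ)
    # Each node starts one step from its successor; the tail is at distance 0.
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every node points directly at the tail.
    while any(nxt[nxt[i]] != nxt[i] for i in range(n)):
        # New arrays are built from the old ones to mimic a synchronous step.
        # Adding rank[tail] (always 0) is harmless for nodes already at the tail.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


# List 2 -> 0 -> 3 -> 1 (tail 1), stored in scrambled order.
print(list_rank([3, 1, 0, 1]))  # ([2, 0, 3, 1], 2)
```

The scattered successor indices in the example are exactly the non-contiguous accesses the abstract identifies as hard for cache-oriented memory hierarchies.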
Explicit OR-dispersers with polylogarithmic degree
 J. ACM
, 1998
Cited by 13 (1 self)
An $(N,M,T)$-OR-disperser is a bipartite multigraph $G = (V,W,E)$ with $|V| = N$ and $|W| = M$, having the following expansion property: any subset of $V$ having at least $T$ vertices has a neighbor set of size at least $M/2$. For any pair of constants $\xi, \lambda$ with $1 \ge \xi > \lambda \ge 0$, any sufficiently large $N$, and any $T \ge 2^{(\log N)^{\xi}}$ and $M \le 2^{(\log N)^{\lambda}}$, we give an explicit elementary construction of an $(N,M,T)$-OR-disperser such that the out-degree of any vertex in $V$ is at most polylogarithmic in $N$. Using this with known applications of OR-dispersers yields several results. First, our construction implies that the complexity class Strong-RP, defined by Sipser, equals RP. Second, for any fixed $\eta > 0$, we give the first polynomial-time simulation of RP algorithms using the output of any “$\eta$-minimally random” source. For any integral $R > 0$, such a source accepts a single request for an $R$-bit string and generates the string according to a distribution that assigns probability at most $2^{-R^{\eta}}$ to any string. It is minimally random in the sense that any weaker source is ...
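The OR-disperser expansion property can be checked directly on tiny graphs. This Python sketch (hypothetical function name, exponential running time) tests whether every set of T left vertices reaches at least M/2 right vertices; checking sets of size exactly T suffices because a neighbor set can only grow as the vertex set grows. It is a checker for the definition, not the explicit construction of the paper.

```python
from itertools import combinations


def is_or_disperser(neighbors, M, T):
    """Brute-force check of the (N, M, T)-OR-disperser expansion property.

    neighbors[v] lists the right vertices (0..M-1) adjacent to left vertex v,
    so N = len(neighbors). Only practical for toy graphs.
    """
    N = len(neighbors)
    for subset in combinations(range(N), T):
        reached = set()
        for v in subset:
            reached.update(neighbors[v])
        if 2 * len(reached) < M:  # fewer than M/2 neighbors
            return False
    return True


print(is_or_disperser([[0, 1], [2, 3], [0, 2], [1, 3]], M=4, T=1))  # True
print(is_or_disperser([[0], [0], [1], [1]], M=4, T=2))              # False
```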
Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines
 In Proceedings of Supercomputing '95
, 1996
Cited by 12 (3 self)
We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed-memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate perform...
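The local-then-global hybrid described in this abstract can be sketched as follows, with processors simulated sequentially and `owner` a hypothetical vertex-to-processor map. The local phase runs DFS over edges whose endpoints share a processor. For the global phase, a plain union-find with min-root hooking is swapped in for the Shiloach-Vishkin variant, so this shows the shape of the hybrid rather than the paper's implementation.

```python
from collections import defaultdict


def hybrid_components(n, edges, owner):
    """Label each vertex with the smallest vertex id in its component."""
    # Split edges into processor-local edges and cross-processor edges.
    local_adj = defaultdict(list)
    cross = []
    for u, v in edges:
        if owner[u] == owner[v]:
            local_adj[u].append(v)
            local_adj[v].append(u)
        else:
            cross.append((u, v))

    # Local phase: iterative DFS over local edges only. Scanning s in
    # increasing order makes each local component's label its smallest vertex.
    label = [-1] * n
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = s
        stack = [s]
        while stack:
            x = stack.pop()
            for y in local_adj[x]:
                if label[y] == -1:
                    label[y] = s
                    stack.append(y)

    # Global phase: union-find over local labels (a stand-in for the
    # Shiloach-Vishkin variant). Hooking the larger root under the smaller
    # keeps every root equal to the minimum vertex of its tree.
    parent = list(range(n))

    def root(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in cross:
        ru, rv = root(label[u]), root(label[v])
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)

    return [root(label[v]) for v in range(n)]


# Vertices 0-1 on processor 0, vertices 2-5 on processor 1.
print(hybrid_components(6, [(0, 1), (1, 2), (3, 4), (2, 5)], [0, 0, 1, 1, 1, 1]))
# [0, 0, 0, 3, 3, 0]
```

Only the cross edge (1, 2) reaches the global phase here, which illustrates why the balance between the two phases depends on graph parameters such as surface-to-volume ratio.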
Connected-Components Algorithms For Mesh-Connected Parallel Computers
 Presented at the 3rd DIMACS Implementation Challenge Workshop
, 1995
Cited by 6 (0 self)
We present efficient parallel algorithms for finding the connected components of sparse and dense graphs using a mesh-connected parallel computer. We start with a PRAM algorithm with work complexity O(n² log n). The algorithm performs O(log n) reduction and broadcast operations within the rows and columns of a mesh-connected computer. Next, a representation of the adjacency matrix for a sparse graph with m edges is chosen that preserves the communication structure of the algorithm but improves the work bound to O((n + m) log n). This work bound can be improved to the optimal O(n + m) bound through the use of graph contraction. In architectures like the MasPar MP-1 and MP-2, parallel row and column operations of the form described achieve high performance relative to the unrestricted concurrent accesses typically found in parallel connected component algorithms for sparse graphs, and exhibit no locality dependence. We present MasPar MP-1 performance figures for implementations of the a...
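The row-oriented style described in this abstract can be imitated on a dense adjacency matrix: each round takes, for every row v, the minimum label over v and its neighbors (a row reduction), then shortcuts labels by pointer jumping until every label is a root. The Python sketch below, with a hypothetical function name, is only an illustration under those assumptions; it is not the paper's algorithm and makes no claim to the O(log n) round bound or the stated work bounds.

```python
def matrix_components(adj):
    """Label each vertex with the smallest vertex id in its component.

    `adj` is a symmetric 0/1 adjacency matrix given as a list of rows.
    Invariants: D[v] <= v, and D[v] is a vertex in v's component.
    """
    n = len(adj)
    D = list(range(n))
    while True:
        # Row reduction: smallest label among v and its neighbors.
        new = [min([D[v]] + [D[u] for u in range(n) if adj[v][u]]) for v in range(n)]
        # Pointer jumping until every label points at itself (a root).
        while any(new[new[v]] != new[v] for v in range(n)):
            new = [new[new[v]] for v in range(n)]
        if new == D:  # no label changed: labels are constant on components
            return D
        D = new


n = 5
adj = [[0] * n for _ in range(n)]
for u, v in [(0, 3), (3, 4), (1, 2)]:
    adj[u][v] = adj[v][u] = 1
print(matrix_components(adj))  # [0, 1, 1, 0, 0]
```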