Results 1 
8 of
8
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract

Cited by 31 (11 self)
 Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the firsttime achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n> p 2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for sharedmemory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freelyavailable from our web site hpc.ece.unm.edu.
Fast SharedMemory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
, 2006
"... ..."
Practical Parallel Algorithms for Minimum Spanning Trees
 In Workshop on Advances in Parallel and Distributed Systems
, 1998
"... We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns are efficient, both in theory and in practice. The algorithms are evaluated theoretically using Valiant's BSP model of parallel computation and empirically through implementation results.
On the architectural requirements for efficient execution of graph algorithms
 In Proc. 34th Int’l Conf. on Parallel Processing (ICPP
, 2005
"... Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, conti ..."
Abstract

Cited by 15 (7 self)
 Add to MetaCart
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of sharedmemory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures
The Euler tour technique and parallel rooted spanning tree
 In Proc. Int’l Conf. on Parallel Processing (ICPP
, 2004
"... Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find an unrooted spanning tree and then root the spanni ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find an unrooted spanning tree and then root the spanning tree using the Euler tour technique. With a randomized worktime optimal unrooted spanning tree algorithm and worktime optimal list ranking, finding rooted spanning trees can be done worktime optimally on EREW PRAM w.h.p. Yet the Euler tour technique assumes as “given ” a circular adjacency list, it is not without implications though to construct the circular adjacency list for the spanning tree found on the fly by a spanning tree algorithm. In fact our experiments show that this “hidden ” step of constructing a circular adjacency list could take as much time as both spanning tree and list ranking combined. In this paper we present new efficient algorithms that find rooted spanning trees without using the Euler tour technique and incur little or no overhead over the underlying spanning tree algorithms. We also present two new approaches that construct Euler tours efficiently when the circular adjacency list is not given. One is a deterministic PRAM algorithm and the other is a randomized algorithm in the symmetric multiprocessor (SMP) model. The randomized algorithm takes a novel approach for the problems of constructing the Euler tour and rooting a tree. It computes a rooted spanning tree first, then constructs an Euler tour directly for the tree using depthfirst traversal. The tour constructed is cachefriendly with adjacent edges in the
Parallel programming with object assemblies
 In OOPSLA, 2009
"... We present Chorus, a highlevel parallel programming model suitable for irregular, heapmanipulating applications like mesh refinement and epidemic simulations, and JChorus, an implementation of the model on top of Java. One goal of Chorus is to express the dynamic and instancedependent patterns of ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
We present Chorus, a highlevel parallel programming model suitable for irregular, heapmanipulating applications like mesh refinement and epidemic simulations, and JChorus, an implementation of the model on top of Java. One goal of Chorus is to express the dynamic and instancedependent patterns of memory access that are common in typical irregular applications. Its other focus is locality of effects: the property that in many of the same applications, typical imperative commands only affect small, local regions in the shared heap. Chorus addresses dynamism and locality through the unifying abstraction of an object assembly: a local region in a shared data structure equipped with a shortlived, speculative thread of control. The thread of control in an assembly can only access objects within the assembly. While objects can migrate from assembly to assembly, such migration is local—i.e., objects only move from one assembly to a neighboring one—and does not lead to aliasing. Programming primitives include a merge operation, by which an assembly merges with an adjacent assembly, and a split operation, which splits an assembly into smaller ones. Our abstractions are race and deadlockfree, and inherently datacentric. We demonstrate that Chorus and JChorus allow natural programming of several important applications exhibiting irregular dataparallelism. We also present an implementation of JChorus based on a manytoone mapping of assemblies to lowerlevel threads, and report on preliminary performance numbers.
Lockfree parallel algorithms: An experimental study
 In Proceedings of the 11th International Conference High Performance Computing
, 2004
"... Abstract. Lockfree shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lockfree data structures include increasing fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Abstract. Lockfree shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lockfree data structures include increasing fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical sections such as priority inversion and deadlock. For parallel computers with closelycoupled processors and shared memory, these issues are no longer major concerns. While many of the results are applicable especially when the model used is shared memory multiprocessors, no prior studies have considered improving the performance of a parallel implementation by way of lockfree programming. As a matter of fact, often times in practice lock free data structures in a distributed setting do not perform as well as those that use locks. As the data structures and algorithms for parallel computing are often drastically different from those in distributed computing, it is possible that lockfree programs perform better. In this paper we compare the similarity and difference of lockfree programming in both distributed and parallel computing environments and explore the possibility of adapting lockfree programming to parallel computing to improve performances. Lockfree programming also provides a new way of simulating PRAM and asynchronous PRAM algorithms on current parallel machines.
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
"... Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretic parallel algorithms but no known efficient parallel implementations that achieve speedup without serious restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving largescale graph problems.