Results 1 -
8 of
8
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract
-
Cited by 27 (11 self)
- Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the first-time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n> p 2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freely-available from our web site hpc.ece.unm.edu.
Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
, 2006
"... ..."
On the architectural requirements for efficient execution of graph algorithms
- In Proc. 34th Int’l Conf. on Parallel Processing (ICPP
, 2005
"... Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, conti ..."
Abstract
-
Cited by 15 (7 self)
- Add to MetaCart
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of sharedmemory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures
Practical Parallel Algorithms for Minimum Spanning Trees
- In Workshop on Advances in Parallel and Distributed Systems
, 1998
"... We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns are efficient, both in theory and in practice. The algorithms are evaluated theoretically using Valiant's BSP model of parallel computation and empirically through implementation results.
Lock-free parallel algorithms: An experimental study
- In Proceedings of the 11th International Conference High Performance Computing
, 2004
"... Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include increasing fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include increasing fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical sections such as priority inversion and deadlock. For parallel computers with closely-coupled processors and shared memory, these issues are no longer major concerns. While many of the results are applicable especially when the model used is shared memory multiprocessors, no prior studies have considered improving the performance of a parallel implementation by way of lock-free programming. As a matter of fact, often times in practice lock free data structures in a distributed setting do not perform as well as those that use locks. As the data structures and algorithms for parallel computing are often drastically different from those in distributed computing, it is possible that lock-free programs perform better. In this paper we compare the similarity and difference of lock-free programming in both distributed and parallel computing environments and explore the possibility of adapting lock-free programming to parallel computing to improve performances. Lock-free programming also provides a new way of simulating PRAM and asynchronous PRAM algorithms on current parallel machines.
The Euler tour technique and parallel rooted spanning tree
- In Proc. Int’l Conf. on Parallel Processing (ICPP
, 2004
"... Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find an unrooted spanning tree and then root the spanni ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
Many parallel algorithms for graph problems start with finding a spanning tree and rooting the tree to define some structural relationship on the vertices which can be used by following problem specific computations. The generic procedure is to find an unrooted spanning tree and then root the spanning tree using the Euler tour technique. With a randomized work-time optimal unrooted spanning tree algorithm and work-time optimal list ranking, finding rooted spanning trees can be done work-time optimally on EREW PRAM w.h.p. Yet the Euler tour technique assumes as “given ” a circular adjacency list, it is not without implications though to construct the circular adjacency list for the spanning tree found on the fly by a spanning tree algorithm. In fact our experiments show that this “hidden ” step of constructing a circular adjacency list could take as much time as both spanning tree and list ranking combined. In this paper we present new efficient algorithms that find rooted spanning trees without using the Euler tour technique and incur little or no overhead over the underlying spanning tree algorithms. We also present two new approaches that construct Euler tours efficiently when the circular adjacency list is not given. One is a deterministic PRAM algorithm and the other is a randomized algorithm in the symmetric multiprocessor (SMP) model. The randomized algorithm takes a novel approach for the problems of constructing the Euler tour and rooting a tree. It computes a rooted spanning tree first, then constructs an Euler tour directly for the tree using depth-first traversal. The tour constructed is cache-friendly with adjacent edges in the
Parallel programming with object assemblies
- In OOPSLA, 2009
"... We present Chorus, a high-level parallel programming model suitable for irregular, heap-manipulating applications like mesh refinement and epidemic simulations, and JChorus, an implementation of the model on top of Java. One goal of Chorus is to express the dynamic and instance-dependent patterns of ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We present Chorus, a high-level parallel programming model suitable for irregular, heap-manipulating applications like mesh refinement and epidemic simulations, and JChorus, an implementation of the model on top of Java. One goal of Chorus is to express the dynamic and instance-dependent patterns of memory access that are common in typical irregular applications. Its other focus is locality of effects: the property that in many of the same applications, typical imperative commands only affect small, local regions in the shared heap. Chorus addresses dynamism and locality through the unifying abstraction of an object assembly: a local region in a shared data structure equipped with a short-lived, speculative thread of control. The thread of control in an assembly can only access objects within the assembly. While objects can migrate from assembly to assembly, such migration is local—i.e., objects only move from one assembly to a neighboring one—and does not lead to aliasing. Programming primitives include a merge operation, by which an assembly merges with an adjacent assembly, and a split operation, which splits an assembly into smaller ones. Our abstractions are race and deadlock-free, and inherently data-centric. We demonstrate that Chorus and JChorus allow natural programming of several important applications exhibiting irregular data-parallelism. We also present an implementation of JChorus based on a many-to-one mapping of assemblies to lower-level threads, and report on preliminary performance numbers.
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
"... Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretic parallel algorithms but no known efficient parallel implementations that achieve speedup without serious restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving largescale graph problems.

