Results 1 - 10
of
11
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract
-
Cited by 27 (11 self)
- Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the first-time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n> p 2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freely-available from our web site hpc.ece.unm.edu.
Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)
- Proc. 9th Int’l Conf. on High Performance Computing (HiPC 2002), volume 2552 of Lecture Notes in Computer Science
, 2002
"... The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the res ..."
Abstract
-
Cited by 23 (8 self)
- Add to MetaCart
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors and across the entire range of instance sizes tested. This linear speedup with the number of processors is one of the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is for evaluating arithmetic expression trees using the algorithmic techniques of list ranking and tree contraction; this problem is not only of interest in its own right, but is representativeof a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors
- The 12th International Conference on High Performance Computing (HiPC 2005)
, 2005
"... Graph theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and implementation of a graph theory application that su ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
Graph theoretic problems are representative of fundamental computations in traditional and emerging scientific disciplines like scientific computing and computational biology, as well as applications in national security. We present our design and implementation of a graph theory application that supports the kernels from the Scalable Synthetic Compact Applications (SSCA) benchmark suite, developed under the DARPA High Productivity Computing Systems (HPCS) program. This synthetic benchmark consists of four kernels that require irregular access to a large, directed, weighted multi-graph. We have developed a parallel implementation of this benchmark in C using the POSIX thread library for commodity symmetric multiprocessors (SMPs). In this paper, we primarily discuss the data layout choices and algorithmic design issues for each kernel, and also present execution time and benchmark validation results.
On the architectural requirements for efficient execution of graph algorithms
- In Proc. 34th Int’l Conf. on Parallel Processing (ICPP
, 2005
"... Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, conti ..."
Abstract
-
Cited by 15 (7 self)
- Add to MetaCart
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of sharedmemory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures
Parallel Shortest Path Algorithms for Solving . . .
, 2006
"... We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the ∆-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory s ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the ∆-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, ∆-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.
An experimental study of a parallel shortest path algorithm for solving large-scale graph instances
- Ninth Workshop on Algorithm Engineering and Experiments (ALENEX 2007)
, 2007
"... We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the $\Delta$-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared m ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the $\Delta$-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, $\Delta$-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.
SWARM: A Parallel Programming Framework for Multicore Processors
, 2007
"... Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued performance on multicore processors now requires the exploitation of concurrency at the algorithmic level. In this paper, we ide ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued performance on multicore processors now requires the exploitation of concurrency at the algorithmic level. In this paper, we identify key issues in algorithm design for multicore processors and propose a computational model for these systems. We introduce SWARM (SoftWare and Algorithms for Running on Multi-core), a portable open-source parallel library of basic primitives that fully exploit multicore processors. Using this framework, we have implemented efficient parallel algorithms for important primitive operations such as prefixsums, pointer-jumping, symmetry breaking, and list ranking; for combinatorial problems such as sorting and selection; for parallel graph theoretic algorithms such as spanning tree, minimum spanning tree, graph decomposition, and tree contraction; and for computational genomics applications such as maximum parsimony. The main contributions of this paper are the design of the SWARM multicore framework, the presentation of a multicore algorithmic model, and validation results for this model. SWARM is freely available as open-source from
Computational grand challenges in assembling the tree of life: Problems and solutions
- The IEEE and ACM Supercomputing Conference 2005 (SC2005) Tutorial
, 2005
"... Abstract. The computation of ever larger as well as more accurate phylogenetic (evolutionary) trees with the ultimate goal to compute the tree of life represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the size of trees which can be computed in ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Abstract. The computation of ever larger as well as more accurate phylogenetic (evolutionary) trees with the ultimate goal to compute the tree of life represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the size of trees which can be computed in reasonable time based on elaborate evolutionary models is limited by the severe computational cost inherent to these methods. There exist two orthogonal research directions to overcome this challenging computational burden: First, the development of novel, faster, and more accurate heuristic algorithms and second, the application of high performance computing techniques. The goal of this chapter is to provide a comprehensive introduction to the field of computational evolutionary biology to an audience with computing background, interested in participating in research and/or commercial applications of this field. Moreover, we will cover leading-edge technical and algorithmic developments in the field and discuss open problems and potential solutions.
High-performance algorithm engineering for large-scale graph problems and computational biology
- In 4th International Workshop on Efficient and Experimental Algorithms
, 2005
"... Abstract. Many large-scale optimization problems rely on graph theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degrees of locality. We describe our novel methodology for designing and implementing irregular parallel algorithms that atta ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. Many large-scale optimization problems rely on graph theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degrees of locality. We describe our novel methodology for designing and implementing irregular parallel algorithms that attain significant performance on high-end computer systems. Our results for several fundamental graph theory problems are the first ever to achieve parallel speedups. Specifically, we have demonstrated for the first time that significant parallel speedups are attainable for arbitrary instances of a variety of graph problems and are developing a library of fundamental routines for discrete optimization (especially in computational biology) on shared-memory systems. Phylogenies derived from gene order data may prove crucial in answering some fundamental questions in biomolecular evolution. Highperformance algorithm engineering offers a battery of tools that can reduce, sometimes spectacularly, the running time of existing approaches.

