Results 1  10
of
22
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract

Cited by 31 (11 self)
 Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the firsttime achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n> p 2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for sharedmemory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freelyavailable from our web site hpc.ece.unm.edu.
A Parallel Algorithm for Computing Minimum Spanning Trees
, 1992
"... We present a simple and implementable algorithm that computes a minimum spanning tree of an undirected weighted graph G = (V, E) of n = V vertices and m = E edges on an EREW PRAM in O(log 3=2 n) time using n+m processors. This represents a substantial improvement in the running time over the ..."
Abstract

Cited by 31 (3 self)
 Add to MetaCart
We present a simple and implementable algorithm that computes a minimum spanning tree of an undirected weighted graph G = (V, E) of n = V vertices and m = E edges on an EREW PRAM in O(log 3=2 n) time using n+m processors. This represents a substantial improvement in the running time over the previous results for this problem using at the same time the weakest of the PRAM models. It also implies the existence of algorithms having the same complexity bounds for the EREW PRAM, for connectivity, ear decomposition, biconnectivity, strong orientation, stnumbering and Euler tours problems.
Parallel RealTime Optimization: Beyond Speedup
 PARALLEL PROCESSING LETTERS
, 1999
"... Traditionally, interest in parallel computation centered around the speedup provided by parallel algorithms over their sequential counterparts. In this paper, we ask a different type of question: Can parallel computers, due to their speed, do more than simply speed up the solution to a problem? ..."
Abstract

Cited by 27 (25 self)
 Add to MetaCart
Traditionally, interest in parallel computation centered around the speedup provided by parallel algorithms over their sequential counterparts. In this paper, we ask a different type of question: Can parallel computers, due to their speed, do more than simply speed up the solution to a problem? We show that for realtime optimization problems, a parallel computer can obtain a solution that is better than that obtained by a sequential one. Specifically, a sequential and a parallel algorithm are exhibited for the problem of computing the bestpossible approximation to the minimumweight spanning tree of a connected, undirected and weighted graph whose vertices and edges are not all available at the outset, but instead arrive in real time. While the parallel algorithm succeeds in computing the exact minimumweight spanning tree, the sequential algorithm can only manage to obtain an approximate solution. In the worst case, the ratio of the weight of the solution obtained seque...
Parallel Implementation of Algorithms for Finding Connected Components in Graphs
, 1997
"... In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384processor MasPar MP1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384processor MasPar MP1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and finetuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and finetuning techniques that we developed for the problem of finding connected components in parallel; many of the finetuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
Practical Parallel Algorithms for Minimum Spanning Trees
 In Workshop on Advances in Parallel and Distributed Systems
, 1998
"... We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
We study parallel algorithms for computing the minimum spanning tree of a weighted undirected graph G with n vertices and m edges. We consider an input graph G with m=n p, where p is the number of processors. For this case, we show that simple algorithms with dataindependent communication patterns are efficient, both in theory and in practice. The algorithms are evaluated theoretically using Valiant's BSP model of parallel computation and empirically through implementation results.
Connected Components in O(log 3/2 n) Parallel Time for the CREW PRAM
"... Finding the connected components of an undirected graph G = (V; E) on n = jV j vertices and m = jEj edges is a fundamental computational problem. The best known parallel algorithm for the CREW PRAM model runs in O(log 2 n) time using n 2 = log 2 n processors [6, 15]. For the CRCW PRAM model, ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
Finding the connected components of an undirected graph G = (V; E) on n = jV j vertices and m = jEj edges is a fundamental computational problem. The best known parallel algorithm for the CREW PRAM model runs in O(log 2 n) time using n 2 = log 2 n processors [6, 15]. For the CRCW PRAM model, in which concurrent writing is permitted, the best known algorithm runs in O(log n) time using slightly more than (n +m)= log n processors [26, 9, 5]. Simulating this algorithm on the weaker CREW model increases its running time to O(log 2 n) [10, 19, 29]. We present here a simple algorithm that runs in O(log 3=2 n) time using n +m CREW processors. Finding an o(log 2 n) parallel connectivity algorithm for this model was an open problem for many years. 1 Introduction Let G = (V; E) be an undirected graph on n = jV j vertices and m = jEj edges. A path p of length k is a sequence of edges (e 1 ; \Delta \Delta \Delta ; e i ; \Delta \Delta \Delta ; e k ) such that e i 2 E for i = 1; \...
On the architectural requirements for efficient execution of graph algorithms
 In Proc. 34th Int’l Conf. on Parallel Processing (ICPP
, 2005
"... Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, conti ..."
Abstract

Cited by 15 (7 self)
 Add to MetaCart
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of sharedmemory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures