Results 1  10
of
114
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
NCApproximation Schemes for NP and PSPACEHard Problems for Geometric Graphs
, 1997
"... We present NC approximation schemes for a number of graph problems when restricted to geometric graphs including unit disk graphs and graphs drawn in a civilized manner. Our approximation schemes exhibit the same time versus performance tradeoff as the best known approximation schemes for planar gr ..."
Abstract

Cited by 93 (1 self)
 Add to MetaCart
We present NC approximation schemes for a number of graph problems when restricted to geometric graphs including unit disk graphs and graphs drawn in a civilized manner. Our approximation schemes exhibit the same time versus performance tradeoff as the best known approximation schemes for planar graphs. We also define the concept of precision unit disk graphs and show that for such graphs the approximation schemes have a better time versus performance tradeoff than the approximation schemes for arbitrary unit disk graphs. Moreover, compared to unit disk graphs, we show that for precision unit disk graphs, many more graph problems have efficient approximation schemes. Our NC approximation schemes can also be extended to obtain efficient NC approximation schemes for several PSPACEhard problems on unit disk graphs specified using a restricted version of the hierarchical specification language of Bentley, Ottmann and Widmayer. The approximation schemes for hierarchically specified un...
Provably efficient scheduling for languages with finegrained parallelism
 IN PROC. SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1995
"... Many highlevel parallel programming languages allow for finegrained parallelism. As in the popular worktime framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A ..."
Abstract

Cited by 82 (25 self)
 Add to MetaCart
Many highlevel parallel programming languages allow for finegrained parallelism. As in the popular worktime framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
CommunicationEfficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparisonbased sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
"... We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refe ..."
Abstract

Cited by 44 (4 self)
 Add to MetaCart
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
Can a SharedMemory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of generalpurpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract

Cited by 42 (11 self)
 Add to MetaCart
There has been a great deal of interest recently in the development of generalpurpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the sharedmemory abstraction as an easytouse platform, the bandwidth limitations of current machines have diverted much attention to messagepassing and distributedmemory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a sharedmemory model can serve as an effective bridging model for parallel computation. In particular, can a sharedmemory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing SharedMemory (QSM) model, which accounts for limited communication bandwidth while still providing a simple sharedmemory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple workpreserving emulation of the QSM on both the BSP, and on a related model, the (d, x)BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
On External Memory MST, SSSP and Multiway Planar Graph Separation (Extended Abstract)
, 2000
"... Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain ..."
Abstract

Cited by 33 (11 self)
 Add to MetaCart
Recently external memory graph algorithms have received considerable attention because massive graphs arise naturally in many applications involving massive data sets. Even though a large number of I/Oefficient graph algorithms have been developed, a number of fundamental problems still remain open. In this paper we develop improved algorithms for the problem of computing a minimum spanning tree of a general graph G = (V; E), as well as new algorithms for the single source shortest paths and the multiway graph separation problems on planar graphs.
A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs) (Extended Abstract)
, 2004
"... Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial probl ..."
Abstract

Cited by 31 (11 self)
 Add to MetaCart
Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. In this paper we present a new randomized algorithm and implementation with superior performance that for the firsttime achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n> p 2). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for sharedmemory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and 2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms. The source code for these algorithms is freelyavailable from our web site hpc.ece.unm.edu.
Efficient LowContention Parallel Algorithms
 the 1994 ACM Symp. on Parallel Algorithms and Architectures
, 1994
"... The queueread, queuewrite (qrqw) parallel random access machine (pram) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The qrqw pram model reflects the contention prope ..."
Abstract

Cited by 30 (11 self)
 Add to MetaCart
The queueread, queuewrite (qrqw) parallel random access machine (pram) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The qrqw pram model reflects the contention properties of most commercially available parallel machines more accurately than either the wellstudied crcw pram or erew pram models, and can be efficiently emulated with only logarithmic slowdown on hypercubetype noncombining networks. This paper describes fast, lowcontention, workoptimal, randomized qrqw pram algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known erew pram algorithms for these problems, while avoiding the highcontention steps typical of crcw pram algorithms. An illustrative expe...
Parallel RealTime Optimization: Beyond Speedup
 PARALLEL PROCESSING LETTERS
, 1999
"... Traditionally, interest in parallel computation centered around the speedup provided by parallel algorithms over their sequential counterparts. In this paper, we ask a different type of question: Can parallel computers, due to their speed, do more than simply speed up the solution to a problem? ..."
Abstract

Cited by 27 (25 self)
 Add to MetaCart
Traditionally, interest in parallel computation centered around the speedup provided by parallel algorithms over their sequential counterparts. In this paper, we ask a different type of question: Can parallel computers, due to their speed, do more than simply speed up the solution to a problem? We show that for realtime optimization problems, a parallel computer can obtain a solution that is better than that obtained by a sequential one. Specifically, a sequential and a parallel algorithm are exhibited for the problem of computing the bestpossible approximation to the minimumweight spanning tree of a connected, undirected and weighted graph whose vertices and edges are not all available at the outset, but instead arrive in real time. While the parallel algorithm succeeds in computing the exact minimumweight spanning tree, the sequential algorithm can only manage to obtain an approximate solution. In the worst case, the ratio of the weight of the solution obtained seque...