Results 1 - 3 of 3
SIMPLE: A methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)
 Journal of Parallel and Distributed Computing
, 1999
Abstract

Cited by 53 (13 self)
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch.
A Practical Parallel Algorithm for Cycle Detection in Partitioned Digraphs
, 1999
Abstract

Cited by 10 (1 self)
Graph theoretic techniques are used in a variety of important computational problems in the areas of computational physics, mechanics, and fluid flow. We present a new parallel algorithm for detecting cycles in partitioned, directed graphs that is scalable in both the graph and machine size, and performs well in practice. As an example, on a p = 64 processor cluster, we have solved an extremely large and difficult input problem with n = 2^28 vertices in less than five minutes. Our parallel algorithm uses a new graph representation, called PackedIntervals, has a theoretical running time for this input of log p+O +O , and achieves good speedup for any n >> p in practice. Our study includes both an efficient parallel algorithm and an experimental study.
An Improved Randomized Selection Algorithm With an Experimental Study
 In Proc. The 2nd Workshop on Algorithm Engineering and Experiments (ALENEX00)
, 2000
Abstract

Cited by 3 (3 self)
This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k. Our general...