SIMPLE: A methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)
Journal of Parallel and Distributed Computing, 1999
Cited by 52 (13 self)
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch.
A Practical Parallel Algorithm for Cycle Detection in Partitioned Digraphs
1999
Cited by 8 (1 self)
Graph theoretic techniques are used in a variety of important computational problems in the areas of computational physics, mechanics, and fluid flow. We present a new, parallel algorithm for detecting cycles in partitioned, directed graphs that is both scalable in the graph and machine size, and performs well in practice. As an example, on a p = 64 processor cluster, we have solved an extremely large and difficult input problem with n = 2^28 vertices in less than five minutes. Our parallel algorithm uses a new graph representation, called Packed-Intervals, has a theoretical running time for this input of log p+O +O , and achieves good speedup for any n >> p in practice. Our study includes both an efficient parallel algorithm and an experimental study.
An Improved Randomized Selection Algorithm With an Experimental Study
In Proc. The 2nd Workshop on Algorithm Engineering and Experiments (ALENEX00), 2000
Cited by 3 (3 self)
This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k. Our general...
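For readers unfamiliar with the selection problem named in this abstract, the following is a minimal sequential sketch of randomized selection (quickselect). It is not the paper's parallel algorithm, which the truncated abstract does not describe. The function names `select_rank` and `median`, the 1-based rank convention, and the choice of the lower median for even-length inputs are all assumptions made for illustration.

```python
import random


def select_rank(items, k, rng=None):
    """Return the element of rank k in items (assumed 1-based: k=1 is the minimum).

    Sequential randomized selection; an illustrative sketch only,
    not the parallel algorithm from the paper.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    rng = rng or random.Random()
    pool = list(items)
    while True:
        # Pick a pivot uniformly at random and partition around it.
        pivot = rng.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k <= len(smaller):
            # The target lies among the elements below the pivot.
            pool = smaller
        elif k <= len(smaller) + equal_count:
            # The target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # Discard the pivot and everything below it, then adjust the rank.
            k -= len(smaller) + equal_count
            pool = [x for x in pool if x > pivot]


def median(items):
    """Return the lower median, i.e. the element of rank ceil(n / 2)."""
    return select_rank(items, (len(items) + 1) // 2)
```

Calling `select_rank(data, k)` with k = 1 returns the minimum and with k = len(data) the maximum; the median is the special case handled by `median`. The expected running time of this sequential version is linear in the input size.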

