Results 1-10 of 40
NESL: A nested data-parallel language (version 2.6)
, 1993
Cited by 95 (7 self)
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U.S. Government. Keywords: Data-parallel, parallel algorithms, supercomputers, nested parallelism. This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelism: the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divide-and-conquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on
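The irregular nested loops mentioned in this abstract can be illustrated with a small Python sketch (this is not Nesl syntax). It assumes one possible encoding of a nested sequence, a flat list of values plus a list of segment lengths, and computes every inner sum from that flat form. The helper names `flatten` and `segmented_sum` are hypothetical and do not come from the report.

```python
from itertools import accumulate

def flatten(nested):
    """Split a nested sequence into a flat value list and its segment lengths."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths

def segmented_sum(values, lengths):
    """Return one sum per segment, computed from the flat representation."""
    prefix = [0] + list(accumulate(values))  # prefix[i] == sum(values[:i])
    ends = list(accumulate(lengths))         # end offset of each segment
    starts = [0] + ends[:-1]                 # start offset of each segment
    return [prefix[e] - prefix[s] for s, e in zip(starts, ends)]

# Inner lengths differ per outer element (an irregular nested loop).
nested = [[1, 2, 3], [], [4, 5], [6]]
values, lengths = flatten(nested)
row_sums = segmented_sum(values, lengths)
print(row_sums)  # [6, 0, 9, 6]
```

The prefix sums and the per-segment subtractions are each a single flat operation, so the work does not depend on how unevenly the inner lengths are distributed.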
Deterministic Sorting and Randomized Median Finding on the BSP model
, 1996
Cited by 48 (23 self)
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitivity against data skew (the accuracy of the splitting keys depends solely on the step distance, which can be adapted to meet the worst-case requirements of our application). Although we employ sampling in order to realize efficiency, we can give a precise worst-case estimation of the maximum imbalance which might occur. We also investigate optimal randomized BSP algorithms for the problem of finding the median of n elements that require, with high probability, 3n/(2p) + o(n/p) comparisons, for a wide range of values of n and p. Experimental results for the two algorithms are also presented.
Efficient Low-Contention Parallel Algorithms
 the 1994 ACM Symp. on Parallel Algorithms and Architectures
, 1994
Cited by 31 (12 self)
The queue-read, queue-write (qrqw) parallel random access machine (pram) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The qrqw pram model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied crcw pram or erew pram models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized qrqw pram algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known erew pram algorithms for these problems, while avoiding the high-contention steps typical of crcw pram algorithms. An illustrative expe...
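As a rough illustration of the contention rule stated in this abstract, the Python sketch below charges one step by the largest number of accesses aimed at any single memory location. It models only the contention term; the function name `max_contention` and the list-of-addresses input are assumptions made for the example, not definitions from the paper.

```python
from collections import Counter

def max_contention(addresses):
    """Largest number of accesses aimed at any one location in a single step.

    `addresses` holds one entry per processor access; 0 means the step is empty.
    """
    counts = Counter(addresses)
    return max(counts.values(), default=0)

# Five processors access memory in one step; three of them hit location 1.
step = [0, 1, 1, 2, 1]
print(max_contention(step))       # 3: the step is charged in proportion to this
print(max_contention([5, 6, 7]))  # 1: no two accesses collide
```

A step with contention above 1 would not be permitted under an exclusive-access (erew) rule, and would be charged as a unit-cost step under a concurrent-access (crcw) rule; the queue rule sits between the two.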
Implementations of Randomized Sorting on Large Parallel Machines
Cited by 28 (1 self)
Flashsort [RV83,86] and Samplesort [HC83] are related parallel sorting algorithms proposed in the literature. Both utilize a sophisticated randomized sampling technique to form a splitter set, but Samplesort distributes the splitter set to each processor while Flashsort uses splitter-directed routing. In this
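The splitter idea described in this abstract can be sketched sequentially in Python: draw a random sample, pick evenly spaced splitters from it, send every key to the bucket its splitters select, and sort each bucket on its own. This is a simplified single-process illustration rather than the implementation studied in the paper; `sample_sort`, `num_buckets`, `oversample` and `seed` are hypothetical names and parameters.

```python
import bisect
import random

def sample_sort(keys, num_buckets=4, oversample=8, seed=0):
    """Sort a list of keys by sampling splitters and bucketing (sequential sketch)."""
    sample_size = num_buckets * oversample
    if len(keys) <= sample_size:
        return sorted(keys)  # too small to be worth sampling

    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, sample_size))
    # Every `oversample`-th sample element becomes a splitter.
    splitters = [sample[i * oversample] for i in range(1, num_buckets)]

    # Each bucket stands in for one processor's share of the keys.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Buckets are ordered relative to each other, so sorting each one
    # independently and concatenating yields a fully sorted result.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

data = [(i * 7919) % 1009 for i in range(1000)]
assert sample_sort(data) == sorted(data)
```

In a parallel setting the per-bucket sorts would run on different processors; a larger `oversample` value tends to even out bucket sizes at the cost of a larger sample to sort.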
Efficient Parallel Algorithms for Computing All Pair Shortest Paths in Directed Graphs
, 1997
Cited by 25 (0 self)
We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexity O(f(n)/p + I(n) log n) on the PRAM using p processors, where I(n) is log n on the EREW PRAM, log log n on the CCRW PRAM, f(n) is o(n^3). On the randomized CRCW PRAM we are able to achieve time complexity O(n^3/p + log n) using p processors. Key Words. Analysis of algorithms, Design of algorithms, Parallel algorithms, Graph algorithms, Shortest path. 1. Introduction. A number of known algorithms compute the all pair shortest paths in graphs and digraphs with n vertices by using O(n^3) operations [D], [Fl], [J]. All these algorithms, however, use at least n-1 recursive steps in the worst case and thus require at least the order of n time in their parallel implementation, even if the number of available processors is not bounded. O(n) time and n^2 processor bounds can indeed be achieved, for instance, in the straightforward parallelization of th...
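To make the "order of n sequential steps" point concrete, here is a Python sketch of the Floyd-Warshall algorithm, one well-known O(n^3) all-pairs method. Whether it is among the algorithms cited as [D], [Fl], [J] is an assumption. Each of the n rounds depends on the previous one, while the n^2 updates inside a round are independent of each other; the function name and the adjacency-matrix input format are choices made for this example.

```python
import math

def floyd_warshall(weights):
    """All-pairs shortest path lengths from an n x n matrix of edge weights.

    weights[i][j] is the weight of edge i -> j, math.inf if absent, 0 on the diagonal.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):  # n rounds that must run one after another
        prev = dist
        # Every (i, j) entry reads only `prev`, so the n^2 updates of a round
        # could be assigned to n^2 processors and done in constant time.
        dist = [
            [min(prev[i][j], prev[i][k] + prev[k][j]) for j in range(n)]
            for i in range(n)
        ]
    return dist

inf = math.inf
graph = [
    [0, 3, inf],
    [inf, 0, 1],
    [2, inf, 0],
]
shortest = floyd_warshall(graph)
print(shortest)  # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```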
The Queue-Read Queue-Write Asynchronous PRAM Model
 Euro-Par'96 Parallel Processing, Lecture Notes in Computer Science
, 1998
Cited by 23 (8 self)
This paper presents results for the queue-read, queue-write asynchronous parallel random access machine (qrqw asynchronous pram) model, which is the asynchronous variant of the qrqw pram model. The qrqw pram family of models, which was introduced earlier by the authors, permits concurrent reading and writing to shared memory locations, but each memory location is viewed as having a queue which can service at most one request at a time. In the basic qrqw pram model each processor executes a series of reads to shared memory locations, a series of local computation steps, and a series of writes to shared memory locations, and then synchronizes with all other processors; thus this can be viewed as a bulk-synchronous model. In contrast, in the qrqw asynchronous pram model discussed in this paper, there is no imposed bulk-synchronization between processors, and each processor proceeds at its own pace. Thus, the qrqw asynchronous pram serves as a better model for designing and analyz...
Derivation of Randomized Sorting and Selection Algorithms, in Parallel Algorithm Derivation And Program Transformation, edited by
, 1993
Cited by 22 (18 self)
In this paper we systematically derive randomized algorithms (both sequential and parallel) for sorting and selection from basic principles and fundamental techniques like random sampling. We prove several sampling lemmas which will find independent applications. The new algorithms derived here are the most efficient known. Among other results, we have an efficient algorithm for sequential sorting. The problem of sorting has attracted so much attention because of its vital importance. Sorting with as few comparisons as possible while keeping the storage size minimal is a long-standing open problem. This problem is referred to as 'the minimum storage sorting' [10] in the literature. The previously best known minimum storage sorting algorithm is due to Frazer and McKellar [10]. The expected number of comparisons made by this algorithm is n log n + O(n log log n). The algorithm we derive in this paper makes only an expected n log n + O(n ω(n)) comparisons, for any function ω(n) that tends to infinity. A variant of this algorithm makes no more than n log n + O(n log log n) comparisons on any input of size n with overwhelming probability. We also prove high probability bounds for several randomized algorithms for which only expected bounds have been proven so far.
Ultrafast expected time parallel algorithms
 Proc. of the 2nd SODA
, 1991
Cited by 20 (3 self)
It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of sorted items, a Padded Sort is often just as useful as an unpadded sort. Our algorithm for Padded Sort runs on the Tolerant CRCW PRAM and takes Θ(log log n/log log log n) expected time using n log log log n/log log n processors, assuming the items are taken from a uniform distribution. Using similar techniques we solve some computational geometry problems, including Voronoi Diagram, with the same processor and time bounds, assuming points are taken from a uniform distribution in the unit square. Further, we present an Arbitrary CRCW PRAM algorithm to solve the Closest Pair problem in constant expected time with n processors regardless of the distribution of points. All of these algorithms achieve linear speedup in expected time over their optimal serial counterparts. (Footnote 1: Research done while at the University of Michigan and supported by an AT&T Fellowship.)
Communication Efficient Data Structures on the BSP model with Applications
 In Proceedings of Euro-Par'96
, 1996
Cited by 18 (8 self)
The implementation of data structures on distributed memory models such as the Bulk-Synchronous Parallel (BSP) model, rather than shared memory ones such as the Parallel Random Access Machine (PRAM), offers a serious challenge. In this work we undertake the architecture-independent study of the computation and communication requirements of searching ordered h-level graphs, which include many of the standard data structures. We propose multi-way search as a general tool for the design, analysis and implementation of BSP algorithms. This technique allows elegant high-level design and analysis of algorithms, using data structures similar to those of sequential models. Applications to computational geometry and sorting are also presented. In particular, our new randomized sorting algorithm improves previously known BSP randomized sorting algorithms upon the amount of parallel slackness required to achieve optimality. Moreover, our methods are within a 1 + o(1) multiplicative factor of the ...
Desnakification of Mesh Sorting Algorithms
 Proc. 2nd European Symp. on Algorithms, LNCS 855
, 1994
Cited by 15 (11 self)
In all recent near-optimal sorting algorithms for meshes, the packets are sorted with respect to some snake-like indexing. In this paper we present deterministic algorithms for sorting with respect to the more natural row-major indexing. For 1-1 sorting on an n × n mesh, we give an algorithm that runs in 2·n + o(n) steps, matching the distance bound, with maximal queue size five. It is considerably simpler than earlier algorithms. Another algorithm performs k-k sorting in k·n/2 + o(k·n) steps, matching the bisection bound. Furthermore, we present uniaxial algorithms for row-major sorting. Uniaxial algorithms have clear practical and theoretical advantages over biaxial algorithms. We show that 1-1 sorting can be performed in 2½·n + o(n) steps. Alternatively, this problem is solved with maximal queue size five in 4⅓·n steps, without any additional terms. For practically important values of n, this algorithm is much faster than any alg...
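The two indexing schemes contrasted in this abstract can be written down in a few lines of Python. The sketch assumes the common row-wise snake order, in which every second row runs right to left; the paper may use other snake-like variants. The function names are hypothetical.

```python
def row_major_index(row, col, n):
    """Position of mesh cell (row, col) when rows are read left to right."""
    return row * n + col

def snake_index(row, col, n):
    """Position of mesh cell (row, col) when odd-numbered rows run right to left."""
    return row * n + (col if row % 2 == 0 else n - 1 - col)

n = 3
row_major_grid = [[row_major_index(r, c, n) for c in range(n)] for r in range(n)]
snake_grid = [[snake_index(r, c, n) for c in range(n)] for r in range(n)]
print(row_major_grid)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(snake_grid)      # [[0, 1, 2], [5, 4, 3], [6, 7, 8]]
```

In the snake order, consecutive positions always sit in neighbouring cells, whereas row-major order jumps from the end of one row to the start of the next.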