Results 1-10 of 25
Implementation of a Portable Nested Data-Parallel Language
 Journal of Parallel and Distributed Computing
, 1994
Cited by 182 (27 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
NESL: A Nested Data-Parallel Language
 CARNEGIE MELLON UNIVERSITY
, 1992
Cited by 144 (4 self)
This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on vectors, including a mechanism for applying any function over the elements of a vector in parallel, and a broad set of parallel functions that manipulate vectors. NESL fully supports nested vectors and nested parallelism: the ability to take a parallel function and then apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph or sparse matrix algorithms. NESL also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM).
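As an illustration of the nested parallelism described in this abstract, the following Python sketch writes a sparse matrix-vector product as one apply-to-each nested inside another. The row representation (a list of (column, value) pairs per row) and the name `sparse_matvec` are assumptions made for this example, not taken from the report. Python evaluates the comprehensions sequentially, whereas a nested data-parallel language could evaluate both levels in parallel.

```python
def sparse_matvec(rows, x):
    """Multiply a sparse matrix by a dense vector.

    rows: one list per matrix row, holding (column, value) pairs for the
          nonzero entries; rows may have different lengths (irregular data).
    x:    dense vector as a list of numbers.
    """
    # Outer comprehension: one independent task per row.
    # Inner generator: one independent product per nonzero, followed by a sum.
    return [sum(value * x[col] for col, value in row) for row in rows]


# Hypothetical 3x3 example matrix:
#   [2 0 1]
#   [0 3 0]
#   [4 0 5]
matrix = [[(0, 2), (2, 1)], [(1, 3)], [(0, 4), (2, 5)]]
vector = [1, 2, 3]
result = sparse_matvec(matrix, vector)
```

The inner loop length depends on the row, which is the kind of irregular nesting that flat data-parallel constructs handle poorly.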
Geometric Mesh Partitioning: Implementation and Experiments
Cited by 105 (19 self)
We investigate a method of dividing an irregular mesh into equal-sized pieces with few interconnecting edges. The method’s novel feature is that it exploits the geometric coordinates of the mesh vertices. It is based on theoretical work of Miller, Teng, Thurston, and Vavasis, who showed that certain classes of “well-shaped” finite element meshes have good separators. The geometric method is quite simple to implement: we describe a Matlab code for it in some detail. The method is also quite efficient and effective: we compare it with some other methods, including spectral bisection.
NESL: A nested data-parallel language (version 2.6)
, 1993
Cited by 97 (7 self)
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U.S. Government. Keywords: data-parallel, parallel algorithms, supercomputers, nested parallelism. This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelism: the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divide-and-conquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on
Tardiness bounds under global EDF scheduling on a multiprocessor
 In RTSS ’05: Proceedings of the 26th IEEE International Real-Time Systems Symposium
, 2005
Cited by 46 (33 self)
This paper considers the scheduling of soft real-time sporadic task systems under global EDF on an identical multiprocessor. Though Pfair scheduling is theoretically optimal for hard real-time task systems on multiprocessors, it can incur significant runtime overhead. Hence, other scheduling algorithms that are not optimal, including EDF, have continued to receive considerable attention. However, prior research on such algorithms has focussed mostly on hard real-time systems, where, to ensure that all deadlines are met, approximately 50% of the available processing capacity will have to be sacrificed in the worst case. This may be overkill for soft real-time systems that can tolerate deadline misses by bounded amounts (i.e., bounded tardiness). In this paper, we derive tardiness bounds under preemptive and non-preemptive global EDF on multiprocessors when the total utilization of a task system is not restricted and may equal the number of processors. Our tardiness bounds depend on per-task utilizations and execution costs: the lower these values, the lower the tardiness bounds. As a final remark, we note that global EDF may be superior to partitioned EDF for multiprocessor-based soft real-time systems in that the latter does not offer any scope to improve system utilization even if bounded tardiness can be tolerated. Work supported by NSF grants CCR 0204312, CCR 0309825, and CCR 0408996. The first author was also supported by an IBM Ph.D. fellowship.
Introspective Sorting and Selection Algorithms
 Software Practice and Experience
, 1997
Cited by 39 (1 self)
Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is \Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is \Theta(N^2). Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too much: one might as well use heapsort, which has a \Theta(N log N) worst-case time bound but is on the average 2 to 5 times slower than quicksort. A similar dilemma exists with selection algorithms (for finding the i-th largest element) based on partitioning. This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit switch to another algorithm with a better worst-case bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has a \Theta(N log N) worst-case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the Blum-Floyd-Pratt-Rivest-Tarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worst-case time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
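The depth-limiting idea in this abstract can be sketched as follows. This is a minimal Python illustration, not the paper's C++ code: the depth limit of 2 * floor(log2 N), the middle-element pivot, and all helper names are assumptions made for this sketch, and the heapsort fallback copies the subrange into a temporary heap rather than sorting strictly in place.

```python
import heapq
import math


def _heapsort_range(a, lo, hi):
    """Sort a[lo:hi] via a binary heap (O(n log n) worst case).

    Assumption for brevity: uses a temporary copy instead of an in-place heap.
    """
    heap = a[lo:hi]
    heapq.heapify(heap)
    for i in range(lo, hi):
        a[i] = heapq.heappop(heap)


def _partition(a, lo, hi):
    """Lomuto partition of a[lo:hi] around the middle element.

    Returns the final index p of the pivot, with a[lo:p] < a[p] <= a[p+1:hi].
    """
    mid = (lo + hi - 1) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]  # park the pivot at the end
    pivot = a[hi - 1]
    store = lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]  # move the pivot into place
    return store


def _introsort(a, lo, hi, depth_limit):
    while hi - lo > 1:
        if depth_limit == 0:
            # Partitioning has gone too deep: switch to the "stopper".
            _heapsort_range(a, lo, hi)
            return
        depth_limit -= 1
        p = _partition(a, lo, hi)
        _introsort(a, p + 1, hi, depth_limit)  # recurse on the right part
        hi = p                                 # loop on the left part


def introsort(a):
    """Sort the list a in place using depth-limited quicksort."""
    if len(a) > 1:
        # Assumed limit: 2 * floor(log2 N) levels of partitioning.
        _introsort(a, 0, len(a), 2 * int(math.log2(len(a))))
```

Calling `introsort(values)` on a list sorts it in place. Inputs that would drive plain quicksort to quadratic behaviour hit the depth limit and finish with the heap-based fallback instead.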
Design and Implementation of a Practical Parallel Delaunay Algorithm
, 1999
Cited by 33 (3 self)
This paper describes the design and implementation of a practical parallel algorithm for Delaunay triangulation that works well on general distributions. Although there have been many theoretical parallel algorithms for the problem, and some implementations based on bucketing that work well for uniform distributions, there has been little work on implementations for general distributions. We use the well-known reduction of 2D Delaunay triangulation to finding the 3D convex hull of points on a paraboloid. Based on this reduction we developed a variant of the Edelsbrunner and Shi 3D convex hull algorithm, specialized for the case when the point set lies on a paraboloid. This simplification reduces the work required by the algorithm (number of operations) from O(n log^2 n) to O(n log n). The depth (parallel time) is O(log^3 n) on a CREW PRAM. The algorithm is simpler than previous O(n log n)-work parallel algorithms, leading to smaller constants. Initial experiments using a variety of distributions showed that our parallel algorithm was within a factor of 2 in work from the best sequential algorithm. Based on these promising results, the algorithm was implemented using C and an MPI-based toolkit. Compared with previous work, the resulting implementation achieves significantly better speedups over good sequential code, does not assume a uniform distribution of points, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for the IBM SP2, Cray T3D, SGI Power Challenge, and DEC AlphaCluster.
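The reduction mentioned in this abstract rests on a standard fact: lifting each point (x, y) to (x, y, x^2 + y^2) turns the "is d inside the circumcircle of a, b, c" question into "is the lifted d below the plane through the lifted a, b, c". The Python sketch below illustrates only that equivalence, not the paper's hull algorithm; the function names are assumptions made for this example, and the triangle a, b, c is assumed to be given in counter-clockwise order.

```python
def lift(p):
    """Map a 2D point onto the paraboloid z = x^2 + y^2."""
    x, y = p
    return (x, y, x * x + y * y)


def signed_volume(a, b, c, d):
    """Determinant of the 3x3 matrix with rows (a - d), (b - d), (c - d)."""
    r = [[pi - di for pi, di in zip(p, d)] for p in (a, b, c)]
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b, c.

    Assumes a, b, c are in counter-clockwise order. The test is positive
    exactly when the lifted d lies below the plane through the lifted a, b, c.
    """
    return signed_volume(lift(a), lift(b), lift(c), lift(d)) > 0


# Hypothetical triangle with circumcircle centred at (2, 2), radius sqrt(8).
a, b, c = (0, 0), (4, 0), (0, 4)
inside = in_circumcircle(a, b, c, (1, 1))
outside = in_circumcircle(a, b, c, (5, 5))
on_circle = in_circumcircle(a, b, c, (4, 4))
```

A triangle whose circumcircle contains no other input point therefore corresponds to a face of the lower convex hull of the lifted points, which is why a 3D hull algorithm yields the 2D Delaunay triangulation.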
Space-efficient geometric divide-and-conquer algorithms
 Comput. Geom
Cited by 16 (4 self)
We develop a number of space-efficient tools including an approach to simulate divide-and-conquer space-efficiently, stably selecting and unselecting a subset from a sorted set, and computing the k-th smallest element in one dimension from a multi-dimensional set that is sorted in another dimension. We then apply these tools to solve several geometric problems that have solutions using some form of divide-and-conquer. Specifically, we present solutions running in O(n log n) time using O(1) extra memory given inputs of size n for the closest pair problem and the bichromatic closest pair problem. For the orthogonal line segment intersection problem, we solve the problem in O(n log n + k) time using O(1) extra space where n is the number of horizontal and vertical line segments and k is the number of intersections.
Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm
 in Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1997
Cited by 12 (2 self)
This paper describes the derivation of an empirically efficient parallel two-dimensional Delaunay triangulation program from a theoretically efficient CREW PRAM algorithm. Compared to previous work, the resulting implementation is not limited to datasets with a uniform distribution of points, achieves significantly better speedups over good serial code, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for a loosely-coupled cluster of workstations, a distributed-memory multicomputer, and a shared-memory multiprocessor. The Machiavelli toolkit used to transform the nested data parallelism inherent in the divide-and-conquer algorithm into achievable task and data parallelism is also described and compared to previous techniques.
An efficient algorithm for the approximate median selection problem
 in Proceedings of the 4th Italian Conference on Algorithms and Complexity, ser. Lecture Notes in Computer Science
, 2000
Cited by 8 (1 self)
We present an efficient algorithm for the approximate median selection problem. The algorithm works in-place; it is fast and easy to implement. For a large array it returns, with high probability, a very good estimate of the true median. The running time is linear in the length n of the input. The algorithm performs fewer than (4/3)n comparisons and (1/3)n exchanges on the average. We present analytical results of the performance of the algorithm, as well as experimental illustrations of its precision.