Programming Parallel Algorithms
, 1996
Cited by 193 (9 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages ...
Implementation of a Portable Nested Data-Parallel Language
 Journal of Parallel and Distributed Computing
, 1994
Cited by 177 (26 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
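The sparse-matrix vector product above is a typical nested data-parallel computation: a matrix is a sequence of rows, and each row is a variable-length sequence of (column, value) pairs. The sketch below is plain Python, not Nesl, Vcode, or Cvl; it assumes the flattened encoding often used to implement nested sequences (one flat data vector plus a vector of segment lengths), and the helper names `flatten`, `segmented_sum` and `sparse_mxv` are hypothetical.

```python
from itertools import accumulate


def flatten(nested):
    """Encode a nested sequence as (segment lengths, flat data vector)."""
    lengths = [len(seg) for seg in nested]
    data = [x for seg in nested for x in seg]
    return lengths, data


def segmented_sum(lengths, values):
    """Sum each segment of a flat vector, giving one result per segment."""
    ends = list(accumulate(lengths))
    starts = [e - n for e, n in zip(ends, lengths)]
    return [sum(values[s:e], 0.0) for s, e in zip(starts, ends)]


def sparse_mxv(rows, x):
    """rows: list of rows, each a list of (column, value) pairs."""
    lengths, pairs = flatten(rows)
    # Elementwise step over the flat vector: every product is independent,
    # so a data-parallel backend could compute them all at once.
    products = [v * x[c] for c, v in pairs]
    # A segmented reduction turns the flat products back into one sum per row.
    return segmented_sum(lengths, products)


# 3x3 matrix [[2,0,1],[0,0,0],[0,3,0]] in sparse form; row 1 is empty.
rows = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]
print(sparse_mxv(rows, [1.0, 2.0, 3.0]))  # [5.0, 0.0, 6.0]
```

The empty middle row is simply a zero-length segment, so irregular rows need no padding; that is the kind of irregular data the nested model is meant to handle directly.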
Provably efficient scheduling for languages with fine-grained parallelism
 In Proc. Symposium on Parallel Algorithms and Architectures
, 1995
Cited by 81 (23 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any ...
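The work-time view mentioned above can be made concrete by modeling a fine-grained program as a tree of sequential and parallel compositions. The following Python sketch, with hypothetical class names `Op`, `Seq` and `Par`, computes work W (total operations) and depth D (critical path) and evaluates the standard greedy-scheduling time bound W/p + D. It models time only and does not capture the space bounds that are the paper's subject.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Op:
    """A single unit-cost operation."""


@dataclass
class Seq:
    """Children run one after another."""
    children: List["Node"]


@dataclass
class Par:
    """Children may run in parallel (fork-join)."""
    children: List["Node"]


Node = Union[Op, Seq, Par]


def work(n: Node) -> int:
    """Total number of operations in the computation."""
    if isinstance(n, Op):
        return 1
    return sum(work(c) for c in n.children)


def depth(n: Node) -> int:
    """Length of the longest chain of dependent operations."""
    if isinstance(n, Op):
        return 1
    ds = [depth(c) for c in n.children]
    if isinstance(n, Seq):
        return sum(ds)
    return max(ds, default=0)


def greedy_time_bound(n: Node, p: int) -> float:
    """Upper bound W/p + D on the steps of any greedy p-processor schedule."""
    return work(n) / p + depth(n)


# Eight independent operations followed by one combining operation.
comp = Seq([Par([Op() for _ in range(8)]), Op()])
print(work(comp), depth(comp), greedy_time_bound(comp, 4))  # 9 2 4.25
```

The program text expresses all the parallelism (the `Par` node) without saying which processor runs what; the bound depends only on W, D and p, which is why the mapping can be left to a scheduler.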
The Design, Implementation, and Evaluation of Jade
 ACM Transactions on Programming Languages and Systems
, 1998
Cited by 62 (4 self)
In this article we discuss the design goals and decisions that determined the final form of Jade and present an overview of the Jade implementation. We also present our experience using Jade to implement several complete scientific and engineering applications. We use this experience to evaluate how the different Jade language features were used in practice and how well Jade as a whole supports the process of developing parallel applications. We find that the basic idea of preserving the serial semantics simplifies the program development process, and that the concept of using data access specifications to guide the parallelization offers significant advantages over more traditional control-based approaches. We also find that the Jade data model can interact poorly with concurrency patterns that write disjoint pieces of a single aggregate data structure, although this problem arises in only one of the applications. Categories and Subject Descriptors: D.1.3 [Programming Techniques] ...
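The data access specifications described above let an implementation keep serial semantics while running non-conflicting tasks concurrently. As a rough illustration only, the Python sketch below takes tasks in serial program order, each with hypothetical read and write sets, and places every task one level after the latest earlier task it conflicts with. It does not use Jade's actual constructs, and the function names are invented for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical task record: (unique name, objects read, objects written).
Task = Tuple[str, Set[str], Set[str]]


def conflicts(a: Task, b: Task) -> bool:
    """True if either task writes an object that the other reads or writes."""
    _, ra, wa = a
    _, rb, wb = b
    return bool(wa & (rb | wb) or wb & ra)


def schedule_levels(tasks: List[Task]) -> List[List[str]]:
    """Group tasks, given in serial program order, into concurrent levels.

    Each task goes one level after the latest earlier task it conflicts
    with, so conflicting tasks keep their serial order (preserving serial
    semantics) while independent tasks can share a level.
    """
    level: Dict[str, int] = {}
    for j, t in enumerate(tasks):
        earlier = [level[tasks[i][0]] for i in range(j) if conflicts(tasks[i], t)]
        level[t[0]] = max(earlier, default=-1) + 1
    levels: List[List[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for name, _, _ in tasks:
        levels[level[name]].append(name)
    return levels


tasks: List[Task] = [
    ("init_a", set(), {"a"}),
    ("init_b", set(), {"b"}),
    ("sum", {"a", "b"}, {"s"}),
    ("scale_a", {"a"}, {"a"}),  # must wait for "sum", which reads a
]
print(schedule_levels(tasks))
# [['init_a', 'init_b'], ['sum'], ['scale_a']]
```

Parallelism here is derived from what each task touches rather than from explicit synchronization in the control flow, which is the contrast with control-based approaches drawn in the abstract.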
A Semantics for Shape
 Science of Computer Programming
, 1995
Cited by 60 (18 self)
Shapely types separate data, represented by lists, from shape, or structure. This separation supports shape polymorphism, where operations are defined for arbitrary shapes, and shapely operations, for which the shape of the result is determined by that of the input, permitting static shape checking. The shapely types are closed under the formation of fixpoints, and hence include the usual algebraic types of lists, trees, etc. They also include other standard data structures such as arrays, graphs and records.
1 Introduction
The values of a shapely type are uniquely determined by their shape and their data. The shape can be thought of as a structure with holes or positions, into which data elements (stored in a list) can be inserted. The use of shape in computing is widespread, but till now it has not, apparently, been the subject of independent study. The body of the paper presents a semantics for shape, based on elementary ideas from category theory. First, let us consider some examp...
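To make the shape/data split concrete, the sketch below treats nested Python lists as the structure and every non-list value as a data element. `split` returns a shape with holes plus the data list, `fill` inserts data back into the holes, and `shape_map` is a shapely operation whose result has the input's shape. The names and the nested-list encoding are assumptions for illustration; Python checks shapes only at run time, whereas the point of shapely types is that shapes can be checked statically.

```python
from typing import Any, Callable, List, Tuple

HOLE = object()  # placeholder marking one position in a shape


def split(x: Any) -> Tuple[Any, List[Any]]:
    """Separate a nested-list structure into its shape and its data list.

    Assumes data elements are never lists themselves.
    """
    data: List[Any] = []

    def go(v: Any) -> Any:
        if isinstance(v, list):
            return [go(c) for c in v]
        data.append(v)
        return HOLE

    return go(x), data


def fill(shape: Any, data: List[Any]) -> Any:
    """Insert data elements into the holes of a shape, left to right."""
    it = iter(data)

    def go(s: Any) -> Any:
        if isinstance(s, list):
            return [go(c) for c in s]
        try:
            return next(it)
        except StopIteration:
            raise ValueError("fewer data elements than positions") from None

    result = go(shape)
    if next(it, HOLE) is not HOLE:
        raise ValueError("more data elements than positions")
    return result


def shape_map(f: Callable[[Any], Any], x: Any) -> Any:
    """A shapely operation: the result's shape equals the input's shape."""
    shape, data = split(x)
    return fill(shape, [f(d) for d in data])


tree = [1, [2, 3], [[4]]]
print(split(tree)[1])                 # [1, 2, 3, 4]
print(shape_map(lambda n: n * 10, tree))  # [10, [20, 30], [[40]]]
```

Because `shape_map` touches only the data list, the same definition works for any shape, which is the shape polymorphism the abstract describes.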
Synchronous Kahn Networks
, 1996
Cited by 58 (9 self)
Synchronous dataflow is a programming paradigm that has been successfully applied in reactive systems. In this context, it can be characterized as a class of static, bounded-memory dataflow networks. In particular, these networks are not recursively defined, and obey some kind of "synchronous" constraints (clock calculus). Based on Kahn's relationship between dataflow and stream functions, the synchronous constraints can be related to Wadler's listlessness, and can be seen as sufficient conditions ensuring listless evaluation. As a by-product, those networks enjoy efficient compiling techniques. In this paper, we show that it is possible to extend the class of static synchronous dataflow to higher-order and dynamic networks, thus giving a meaning to a larger class of synchronous dataflow networks.
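A minimal way to see the bounded-memory, listless behavior described above is to model streams as Python generators in which every node consumes one value and produces one value per tick. `lift` is a pointwise node, `fby` is a one-register unit delay in the style of the synchronous "followed by" operator, and `running_sum` shows a feedback equation s = x + (0 fby s) compiled into a single state variable. These names are hypothetical, and the sketch does not model clocks or the higher-order, dynamic networks the paper introduces.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def lift(f: Callable[[A], B], xs: Iterable[A]) -> Iterator[B]:
    """Pointwise node: one output per input tick, no buffering."""
    for x in xs:
        yield f(x)


def fby(init: A, xs: Iterable[A]) -> Iterator[A]:
    """Unit delay: emit init, then each input one tick late.

    The output has exactly as many ticks as the input, and the node
    holds a single value between ticks, so memory stays bounded.
    """
    prev = init
    for x in xs:
        yield prev
        prev = x


def running_sum(xs: Iterable[int]) -> Iterator[int]:
    """s = xs + (0 fby s): the feedback loop becomes one state variable."""
    s = 0
    for x in xs:
        s = x + s
        yield s


# Nodes compose lazily, so even an infinite input stream is processed
# tick by tick without materializing intermediate lists.
squares = lift(lambda v: v * v, count())
print(list(islice(running_sum(squares), 4)))  # [0, 1, 5, 14]
print(list(fby(0, [1, 2, 3])))                 # [0, 1, 2]
```

No node ever needs to look ahead or keep a history longer than one value, which is the sense in which such networks admit listless evaluation.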
The Design, Implementation and Evaluation of Jade, a Portable, Implicitly Parallel Programming Language
 Dept. of Computer Science, Stanford Univ
, 1994
Space-Efficient Scheduling of Nested Parallelism
 ACM Transactions on Programming Languages and Systems
, 1999
Cited by 28 (4 self)
This article presents an online scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth $D$ and serial space requirement $S_1$, the algorithm generates a schedule that requires at most $S_1 + O(K \cdot D \cdot p)$ space (including scheduler space) on $p$ processors. Here, $K$ is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of $K$ provides a tradeoff between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
A Component-based Architecture for Parallel Multi-Physics PDE Simulation
 in Proceedings of the International Conference on Computational Science, Springer-Verlag LNCS 2331
, 2002
Cited by 27 (1 self)
We describe the Uintah Computational Framework (UCF), a set of software components and libraries that facilitate the simulation of partial differential equations on structured adaptive mesh refinement grids using hundreds to thousands of processors. The UCF uses a non-traditional approach to achieving parallelism, employing an abstract task-graph representation to describe computation and communication. This representation has a number of advantages that affect the performance of the resulting simulation. We demonstrate performance of the system on a solid mechanics algorithm, two different computational fluid-dynamics (CFD) algorithms, as well as coupled CFD/mechanics algorithms. We show performance of the UCF using up to 2000 processors. © 2005 Published by Elsevier B.V.
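The abstract task-graph idea can be sketched as follows: each task declares the variables it requires and the variables it computes, and dependencies (and hence communication) are derived from those declarations rather than written by hand. This Python sketch is not the UCF API; the task and variable names are hypothetical, variables that no task computes are treated as initial data, and `graphlib.TopologicalSorter` (Python 3.9+) groups tasks into phases whose members are independent of each other.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical task specs: name -> (variables required, variables computed).
TaskSpec = Dict[str, Tuple[Set[str], Set[str]]]


def build_graph(tasks: TaskSpec) -> Dict[str, Set[str]]:
    """Map each task to the set of tasks whose outputs it requires."""
    producer: Dict[str, str] = {}
    for name, (_, computes) in tasks.items():
        for var in computes:
            if var in producer:
                raise ValueError(f"{var!r} computed by both {producer[var]} and {name}")
            producer[var] = name
    # Variables with no producer are treated as initial data.
    return {name: {producer[v] for v in requires if v in producer}
            for name, (requires, _) in tasks.items()}


def parallel_phases(tasks: TaskSpec) -> List[List[str]]:
    """Group tasks into phases; tasks within a phase are independent."""
    sorter = TopologicalSorter(build_graph(tasks))
    sorter.prepare()  # raises graphlib.CycleError on cyclic specs
    phases: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        phases.append(ready)
        sorter.done(*ready)
    return phases


tasks: TaskSpec = {
    "interpolate": ({"p_mass", "p_vel"}, {"g_mass", "g_vel"}),
    "stress": ({"p_stress_old"}, {"p_stress"}),
    "internal_force": ({"p_stress", "g_mass"}, {"g_force"}),
    "momentum": ({"g_force", "g_vel"}, {"g_accel"}),
}
print(parallel_phases(tasks))
# [['interpolate', 'stress'], ['internal_force'], ['momentum']]
```

In a distributed setting, each edge of this graph whose endpoints land on different processors marks a message that the framework, rather than the application code, would have to schedule.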
Space-Efficient Scheduling of Parallelism with Synchronization Variables
Cited by 27 (9 self)
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as Id. The main result is an online scheduling algorithm which, given a computation with $w$ work (total operations), $\sigma$ synchronizations, $d$ depth (critical path) and $s_1$ sequential space, will run in $O(w/p + \sigma \log(pd)/p + d \log(pd))$ time and $s_1 + O(pd \log(pd))$ space, on a $p$-processor CRCW PRAM with a fetch-and-add primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is non-preemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with left-to-right synchronization edges, the scheduling algorithm can be implemented in $O(w/p + d \log p)$ time and $s_1 + O(pd \log p)$ space. These are the first non-trivial space bounds described for such languages.
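Synchronization variables let a thread suspend until another thread writes a value, which is how futures in languages such as Multilisp behave and why such computations fall outside the ancestor/sibling-only class handled by earlier results. The Python sketch below illustrates only that primitive, built on `threading.Condition`; the class name `SyncVar` is hypothetical, and the code does not implement the paper's scheduler or its space bounds.

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class SyncVar(Generic[T]):
    """Write-once synchronization variable: reads block until it is written."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._full = False
        self._value: Optional[T] = None

    def write(self, value: T) -> None:
        with self._cond:
            if self._full:
                raise RuntimeError("synchronization variable already written")
            self._value = value
            self._full = True
            self._cond.notify_all()  # wake every thread suspended in read()

    def read(self) -> T:
        with self._cond:
            # A reader that arrives before the write suspends here until notified.
            self._cond.wait_for(lambda: self._full)
            return self._value  # type: ignore[return-value]


# A future-like use: a forked thread produces a value the parent consumes.
result: SyncVar[int] = SyncVar()
producer = threading.Thread(target=lambda: result.write(sum(range(10))))
producer.start()
print(result.read())  # 45
producer.join()
```

Unlike a plain fork-join, the reader and writer here need not be parent and child, so a scheduler cannot assume synchronization only happens at joins.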