## A Framework for Generating Task Parallel Programs (1999)

Venue: 7th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '99)

Citations: 2 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Fissgus99aframework,
  author    = {Ursula Fissgus and Thomas Rauber and Gudula Rünger},
  title     = {A Framework for Generating Task Parallel Programs},
  booktitle = {7th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '99)},
  year      = {1999},
  pages     = {72--80}
}
```

### Abstract

We consider the generation of mixed task and data parallel programs and discuss how a clear separation into a task and a data parallel level can support the development of efficient programs. The program development starts with a specification of the maximum degree of task and data parallelism and proceeds by performing several derivation steps in which the degree of parallelism is adapted to a specific parallel machine. The separation between the task and data parallel level is preserved during the design and translation phases by clearly defined interfaces. We show how the final message-passing programs are generated from the data parallel and the task parallel specification and how the interaction between the two levels can be established. We demonstrate the usefulness of the approach by examples from numerical analysis which offer the potential of a mixed task and data parallel execution but for which it is not a priori clear how this potential should be used for an implementation o...

### Citations

1130
A Bridging Model for Parallel Computation
- Valiant
- 1990
Citation context: ... (left) and Intel Paragon (right), dense input system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizing compilers. Several research g...

946
High Performance Fortran Language Specification, Version 1.0
- High Performance Fortran Forum
- 1993
Citation context: ...ed by separate runtime tests for each application. A model that is similar to the task parallelism model of Fx has recently been added to High Performance Fortran [16] as an approved extension. An exploitation of task and data parallelism in the context of a parallelizing compiler can be found in the Paradigm compiler [3, 19, 21]. The Paradigm compiler provides a f...

738
Parallel Program Design: A Foundation
- Chandy, Misra
- 1988
Citation context: ...havior and exactly specifies the operations to be performed. These operations determine the semantics of the BM. In the following, we use a UNITY-like notation for the high-level specification of BMs [6, 32]. A BM specification contains a header specifying the input/output parameters, a declaration section introducing local variables, and an assignment section consisting of a sequence of assignment state...
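The three-part structure described in this context (header with input/output parameters, declaration section, assignment section in UNITY-like notation) might look roughly like the following sketch; the module name `vv_add`, the parameter names, and the quantified assignment are invented for illustration and not taken from the paper:

```
module vv_add (in  a[1:n], b[1:n];      -- header: input/output parameters
               out c[1:n])
declare
  i : index                             -- declaration section: local variables
assign
  < || i : 1 <= i <= n ::               -- assignment section: UNITY-like
    c[i] := a[i] + b[i] >               -- data parallel assignment
end
```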

653
High Performance Compilers for Parallel Computing
- Wolfe
- 1995
Citation context: ...pplied to different data. The granularity depends on the number of data elements per processor. Data parallelism can often be detected by parallelizing compilers using loop parallelization techniques [34]. Task parallelism occurs when independent program parts can be executed on different processors or disjoint groups of processors where processors of the same group collaborate in a data parallel fash...

624
MPI: The Complete Reference
- Snir, Otto, et al.
- 1996
Citation context: ...t variables as the composed module and an additional parameter for the communication context provided by the calling module. The communication context is realized by the communicator mechanism of MPI [27]. The communication context passed as parameter will be used for all message transmissions within the corresponding functions. For each BM, we assume that the data parallel part provides a PBM and a c...

603
Introduction to Numerical Analysis
- Stoer, Bulirsch
- 1980
Citation context: ...wing. (In practice, additional tests on the size of the residual r_k and a preconditioning will be used. Furthermore, due to rounding errors, a higher number of iterations, usually 2M-5M, is performed [28].) A module specification is a non-executable program which only indicates the degree of parallelism without giving an exact execution order of tasks or specifying a data distribution for variables. A...

497
LogP: Towards a realistic model of parallel computation
- Culler, Karp, et al.
- 1993
Citation context: ... (left) and Intel Paragon (right), dense input system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizing compilers. Several research g...

351
ScaLAPACK Users' Guide
- Blackford, Choi, et al.
- 1997
Citation context: ...o the data parallel level can be established in different ways. If a library of data parallel realizations of basic computations is available, e.g., in the form of a scientific library like ScaLAPACK [5], correct function calls are created by the translation step of the task parallel part, i.e., function calls with the data distribution required by the library interface. The specification of the proc...

235
LogGP: Incorporating long messages into the LogP model for parallel computation
- Alexandrov, Ionescu, et al.
- 1997
Citation context: ... MPI_Comm_dup (newcomm1, &newcomm2); { /* lines (23-27) */ MPI_Comm newcomm3; int g[2]; int sum[3]; int color; int i; MPI_Comm_rank (newcomm2, &myrank); g[0] = max (amfg ("sv prod"), amfg ("vv add")); g[1] = max (amfg ("sv prod"), amfg ("vv add")); for (i = 0, sum[0] = 0; i < 2; i++) sum[i + 1] = sum[i] + g[i]; for (i = 0; i < 2; i++) if ((sum[i] <= myrank) && (myrank < sum[i + 1])) color = i; MPI_Comm...

208
Mathematical Analysis and Numerical Methods for Science and Technology, Volume 5
- Dautray, Lions
- 1992
Citation context: ...onstant computational effort [15]. A function f with evaluation costs that depend on the system size arises, e.g., when solving nonlinear partial differential equations with Fourier-Galerkin methods [8]. In the experiments we used the Brusselator equation as example for a partial differential equation that results in a sparse system of ODEs. The Brusselator equation is a reaction-diffusion system fro...

134
Fortran M: a language for modular parallel programming
- Foster, Chandy
Citation context: ...approach to similar approaches in the area of parallelizing compilers. Several research groups working on parallelizing compilers have included support to combine task and data parallelism. Fortran M [11, 12] allows the creation of processes which can communicate with each other by predefined channels and which can be combined with HPF for a mixed task and data parallel execution. In contrast to the Fortr...

101
The Paradigm Compiler for Distributed-Memory Multicomputers
- Banerjee, Chandy, et al.
Citation context: ...e fragment shown corresponds to lines (22-28) of the frame program of Figure 5. ... /* lines (22-28) */ MPI_Comm_dup (newcomm1, &newcomm2); { /* lines (23-27) */ MPI_Comm newcomm3; int g[2]; int sum[3]; int color; int i; MPI_Comm_rank (newcomm2, &myrank); g[0] = max (amfg ("sv prod"), amfg ("vv add")); g[1] = max (amfg ("sv prod"), amfg ("vv add")); for (i = 0, sum[0] = 0; i < 2; i++) sum[i + 1] = ...

79
Models of machines and computation for mapping in multicomputers
- Norman, Thanisch
- 1993
Citation context: ...IRK method on the IBM SP2 (left) and Intel Paragon (right), dense input system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizin...

71
Automatic Extraction of Functional Parallelism from Ordinary Programs
- Girkar, Polychronopoulos
- 1992

70
Solving Ordinary Differential Equations II
- Hairer, Wanner
- 1996
Citation context: ...solving systems of differential equations. The discretization of the spatial derivatives of a time-dependent partial differential equation results in a function f with a constant computational effort [15]. A function f with evaluation costs that depend on the system size arises, e.g., when solving nonlinear partial differential equations with Fourier-Galerkin methods [8]. In the experiments we used t...

64
Automatic Parallelization for Distributed-Memory Multiprocessor Systems
- Gerndt
- 1989
Citation context: ...rting the UNITY-like expressions to loops and by realizing the communication operations by a specific communication library like MPI. This transformation is similar to loop parallelization techniques [34, 13] and will not be considered further in this article. 5 Examples and Experiments As examples, we consider the conjugate gradient method and solution methods for ordinary differential equations (ODEs). ...

55
A Compilation System that Integrates High Performance Fortran and Fortran M
- Foster, Xu, et al.
- 1994
Citation context: ...approach to similar approaches in the area of parallelizing compilers. Several research groups working on parallelizing compilers have included support to combine task and data parallelism. Fortran M [11, 12] allows the creation of processes which can communicate with each other by predefined channels and which can be combined with HPF for a mixed task and data parallel execution. In contrast to the Fortr...

53
Automatic generation of efficient array redistribution routines for distributed memory multicomputers
- Ramaswamy, Banerjee
- 1994
Citation context: ...e for each node and the scheduling algorithm decides on a scheme of execution for the allocated nodes. The goal is to select a strategy that minimizes the execution time of the macro data-flow graph. [20] considers the generation of array redistributions between tasks. There are two main differences between the Paradigm and the TwoL approach. First, Paradigm expects as input a sequential program where...

40
Approaches for Integrating Task and Data Parallelism
- Bal, Haines
- 1998
Citation context: ...lude examples from numerical analysis [23, 24], signal processing [29], and multidisciplinary codes like Global Climate Modeling [2]. The integration of task and data parallelism is an active area of research because of its potential benefits and several approaches have been proposed recently. These include language approaches lik...

31
A framework for exploiting task and data parallelism on distributed memory multicomputers
- Ramaswamy, Sapatnekar, et al.
- 1997
Citation context: ..., using the available task parallelism and combining it with data parallelism can increase the performance of parallel applications considerably since an additional degree of parallelism is exploited [21, 24]. This is especially important for parallel machines with a large number of processors like the ASCI teraflop machines or many of the T3E installations. Applications that benefit from a combination of...

28
Concurrent Scientific Computing
- Van de Velde
- 1994
Citation context: ... [Figure 2: The conjugate gradient method.] ... system of linear equations Ax = b with a symmetric, positive-definite coefficient matrix A ∈ ℝ^(M×M) [32]. In each step k, 0 ≤ k ≤ K-1, the method chooses a search direction p_k ∈ ℝ^n such that p_k is A-conjugate to p_0, ..., p_{k-1}, i.e., p_l^T A p_k = 0 for l < k. Figure 2 shows a descripti...

17
Microarchitecture support for dynamic scheduling of acyclic task graphs
- Beckmann, Polychronopoulos
- 1992
Citation context: ... 19, 21]. The Paradigm compiler provides a framework that expresses task parallelism by a macro data-flow graph which has been derived from the hierarchical task graphs used in the Parafrase compiler [4]. Nodes in the macro data-flow graph correspond to basic parallel tasks or loop constructs, edges correspond to precedence constraints that exist between tasks. The nodes and edges are weighted with p...

17
Simultaneous Exploitation of Task and Data Parallelism in Regular Scientific Applications
- Ramaswamy
Citation context: ... system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizing compilers. Several research groups working on parallelizing compilers have i...

14
Accurate predictions of parallel program execution time
- Driscoll, Daasch
- 1995
Citation context: ... system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizing compilers. Several research groups working on parallelizing compilers have i...

14
Automatic mapping of task and data parallel programs for efficient execution on multicomputers
- Subhlok
- 1993
Citation context: ...hat benefit from a combination of task and data parallelism include examples from numerical analysis [23, 24], signal processing [29], and multidisciplinary codes like Global Climate Modeling [2]. The integration of task and data parallelism is an active area of research because of its potential benefits and several approaches have...

14
Parallel iteration of high-order Runge-Kutta methods with stepsize control
- van der Houwen, Sommeijer
- 1990
Citation context: ...be used for ODEs with different characteristics. Here, we consider RK methods with a large potential of task and data parallelism. These methods have been especially designed for a parallel execution [33, 22]. We apply the solution methods to two classes of ODEs which differ in the amount of computational work of the right hand side f of the ODE system: • f has fixed evaluation costs that are independen...

12
Benchmark Evaluation of the IBM SP2 for Parallel Signal Processing
- Hwang, Xu, et al.
- 1996
Citation context: ... system. 6 Comparison with Related Work Work related to the TwoL approach includes studies on new parallel programming paradigms [18], computation models [1, 7, 31], performance prediction techniques [9, 17, 19], and parallelizing compilers. In the following, we compare our approach to similar approaches in the area of parallelizing compilers. Several research groups working on parallelizing compilers have i...

12
Deriving Structured Parallel Implementations for Numerical Methods, The Euromicro Journal 41:589–608
- Rauber, Rünger
- 1996
Citation context: ...tion leads to an efficient program for a given DMM and how design decisions like task scheduling and data distribution can be derived from a general specification expressing the potential parallelism [25, 26]. In this paper, we consider the relationship between task and data parallel executions and describe how hierarchically structured task parallelism can automatically be transformed into a correspondin...

6
Modeling the Communication Behavior of the Intel Paragon
- Foschia, Rauber, et al.
- 1997
Citation context: ...uch that the resulting communication is minimized. All decisions are based on the derivation of runtime formulas containing parameters that describe the relevant properties of the target machine, see [25, 10] for a detailed discussion. Fixing the implementation decisions leads to a parallel frame program exactly expressing the degree of parallelism that should be exploited for a given DMM. The frame progr...

6
Parallel Iterated Runge-Kutta Methods and Applications
- Rauber, Rünger
- 1996
Citation context: ...be used for ODEs with different characteristics. Here, we consider RK methods with a large potential of task and data parallelism. These methods have been especially designed for a parallel execution [33, 22]. We apply the solution methods to two classes of ODEs which differ in the amount of computational work of the right hand side f of the ODE system: • f has fixed evaluation costs that are independen...

6
Load Balancing Schemes for Extrapolation Methods, Concurrency: Practice and Experience
- Rauber, Rünger
- 1997
Citation context: ...nstallations. Applications that benefit from a combination of task and data parallelism include examples from numerical analysis [23, 24], signal processing [29], and multidisciplinary codes like Global Climate Modeling [2]. The integration of task and data parallelism is an active area of research because of its potential benefits and...

5
The Compiler TwoL for the Design of Parallel Implementations
- Rauber, Rünger
- 1996
Citation context: ...tion leads to an efficient program for a given DMM and how design decisions like task scheduling and data distribution can be derived from a general specification expressing the potential parallelism [25, 26]. In this paper, we consider the relationship between task and data parallel executions and describe how hierarchically structured task parallelism can automatically be transformed into a correspondin...

3
A New Model for Integrating Nested Task and Data
- Subhlok, Yang
- 1997
Citation context: ... data distributions by himself. The Fx approach allows task parallelism by providing directives to partition processors into subgroups and to assign computations to different subgroups (task regions) [30]. Computations of a specific subgroup are executed in a data parallel way. The Fx compiler provides a mapping tool for the grouping of subroutine calls to modules and the mapping of processors to modu...

2
Comparing Task and Data Parallel Execution Schemes for the DIIRK method
- Rauber, Rünger
- 1996
Citation context: ..., using the available task parallelism and combining it with data parallelism can increase the performance of parallel applications considerably since an additional degree of parallelism is exploited [21, 24]. This is especially important for parallel machines with a large number of processors like the ASCI teraflop machines or many of the T3E installations. Applications that benefit from a combination of...