Results 1–10 of 74
A cost calculus for parallel functional programming
Journal of Parallel and Distributed Computing, 1995
Cited by 58 (6 self)
Building a cost calculus for a parallel program development environment is difficult because of the many degrees of freedom available in parallel implementations, and because of difficulties with compositionality. We present a strategy for building cost calculi for skeleton-based programming languages which can be used for derivational software development and which deals in a pragmatic way with the difficulties of composition. The approach is illustrated for the Bird-Meertens theory of lists, a parallel functional language with an associated equational transformation system. Keywords: functional programming, parallel programming, program transformation, cost calculus, equational theories, architecture independence, Bird-Meertens formalism.
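The abstract's central idea can be sketched in a few lines: give each skeleton a symbolic cost in terms of problem size n and processor count p, and let composed skeletons add their costs. The model below is a hypothetical illustration (the concrete cost functions and the neglect of redistribution overhead are assumptions, not the paper's calculus).

```python
import math

# Hypothetical cost model for two skeletons on p processors, where c_op
# is the cost of one element-wise operation.

def cost_map(n, p, c_op):
    """Parallel map over n elements: ceil(n/p) local applications."""
    return -(-n // p) * c_op  # -(-n // p) is ceiling division

def cost_reduce(n, p, c_op):
    """Tree reduction: local reduction plus log2(p) combining steps."""
    return (-(-n // p) - 1) * c_op + math.ceil(math.log2(p)) * c_op

def cost_compose(*costs):
    """Sequential composition: costs simply add. Ignoring the cost of
    redistributing data between skeletons is exactly the kind of
    compositionality difficulty the abstract mentions."""
    return sum(costs)
```

For example, a map followed by a reduction over 1000 elements on 8 processors costs `cost_compose(cost_map(1000, 8, 1), cost_reduce(1000, 8, 1))` time units under this model.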
Polyhedral code generation in the real world
In Proceedings of the International Conference on Compiler Construction (ETAPS CC’06), LNCS, 2006
Cited by 27 (11 self)
The polyhedral model is known to be a powerful framework for reasoning about high-level loop transformations. Recent developments in optimizing compilers broke some generally accepted ideas about the limitations of this model. First, thanks to advances in dependence analysis for irregular access patterns, its applicability, which was supposed to be limited to very simple loop nests, has been extended to wide code regions. Then, new algorithms made it possible to compute the target code for hundreds of statements, while this code generation step was expected not to be scalable. Such theoretical advances and new software tools allowed actors from both academia and industry to study more complex and realistic cases. Unfortunately, despite the strong optimization potential of a given transformation, e.g., for parallelism or data locality, code generation may still be challenging or result in high control overhead. This paper presents scalable code generation methods that make the application of increasingly complex program transformations possible. By studying the transformations themselves, we show how it is possible to benefit from their properties to dramatically improve both code generation quality and space/time complexity, with respect to the best state-of-the-art code generation tool. In addition, we build on these improvements to present a new algorithm improving generated code performance for strided domains and reindexed schedules.
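A toy illustration of what polyhedral code generation does (not this paper's algorithm): given a polyhedron such as D = { (i, j) : 0 <= i <= n, 0 <= j <= i }, the generator must emit a loop nest whose affine bounds scan exactly the integer points of D, with no per-point guards and hence no control overhead.

```python
# Domain D = { (i, j) : 0 <= i <= n, 0 <= j <= i }.

def scan_by_definition(n):
    """Naive scan: test every point of a bounding box against the
    constraints (this is the control overhead a good generator avoids)."""
    return [(i, j) for i in range(n + 1) for j in range(n + 1) if j <= i]

def scan_generated(n):
    """What a generator such as CLooG aims to emit: a loop nest with
    affine bounds and no guards, visiting exactly the points of D."""
    points = []
    for i in range(0, n + 1):
        for j in range(0, i + 1):
            points.append((i, j))
    return points
```

Both functions enumerate the same points in the same lexicographic order; the second does so without any membership tests.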
Code Generation in the Polytope Model
In IEEE PACT, 1998
Cited by 17 (1 self)
Automatic parallelization of nested loops, based on a mathematical model, the polytope model, has been improved significantly over the last decade: state-of-the-art methods allow flexible distributions of computations in space and time, which lead to high-quality parallelism. However, these methods have not found their way into practical parallelizing compilers due to the lack of code generation schemes which are able to deal with the newfound flexibility. To close this gap is the purpose of this paper.

1. Introduction. In recent years, methods for automatic parallelization of nested loops based on a mathematical model, the polytope model [9, 12], have been improved significantly. The focus has been on identifying good schedules, i.e., distributions of computations in time, e.g., [6, 8], and allocations, i.e., distributions of computations in space, e.g., [5, 14]. Thus, the space-time mapping, i.e., the combination of schedule and allocation, derived by state-of-the-art techniques oft...
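The space-time mapping mentioned in the abstract can be sketched concretely. Assume a toy iteration domain {0..n-1}² where iteration (i, j) depends on (i-1, j) and (i, j-1); the schedule t = i + j then places every iteration after its predecessors, and all iterations sharing a time step form an independent wavefront (the allocation, e.g. p = i, would distribute each wavefront across processors).

```python
# Illustrative wavefront grouping under the schedule t = i + j
# (assumed toy dependences, not an example from the paper).

def wavefronts(n):
    """Group the domain {0..n-1}^2 by logical time step t = i + j.
    Iterations within one wavefront have no dependences among them."""
    steps = {}
    for i in range(n):
        for j in range(n):
            steps.setdefault(i + j, []).append((i, j))
    return [steps[t] for t in sorted(steps)]
```

For n = 3 this yields five wavefronts, from the single iteration (0, 0) at t = 0 up to (2, 2) at t = 4; the code generation problem the paper addresses is emitting loops that scan such transformed domains directly.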
Generation of Synchronous Code for Automatic Parallelization of while Loops
In Euro-Par '95, Lecture Notes in Computer Science 966, 1995
Cited by 17 (6 self)
Automatic parallelization of imperative programs has focused on nests of do loops with affine bounds and affine dependences, because in this case execution domains and dependences can be precisely known at compile time. When dynamic control structures, such as while loops, are used, existing methods for conversion to single-assignment form and domain scanning are inapplicable. This paper gives an algorithm to automatically generate parallel code, together with an algorithm to possibly convert the program to single-assignment form.

1 Introduction. Automatic parallelization of imperative programs has focused on nests of do loops with affine bounds and affine dependences [10], mainly because dependences can then be precisely known at compile time. Data or "memory-based" dependences are due to reuse of memory cells, and thus are language- and program-dependent, whereas data flows or "value-based dependences" denote transmissions of values and thus are algorithm-dependent. Memory-b...
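The single-assignment conversion the abstract refers to can be illustrated on a minimal while loop (a hypothetical example, not the paper's algorithm): the imperative loop repeatedly overwrites one cell, hiding the value-based data flow; the converted form gives every iteration its own cell, making that flow explicit.

```python
def original(x0):
    """Imperative form: the single cell x is overwritten each iteration,
    creating memory-based dependences between all iterations."""
    x = x0
    while x < 100:
        x = 2 * x + 1
    return x

def single_assignment(x0):
    """Single-assignment form: xs[k] holds the value after k iterations,
    so only the true value-based dependence xs[k] -> xs[k+1] remains."""
    xs = [x0]
    while xs[-1] < 100:
        xs.append(2 * xs[-1] + 1)
    return xs[-1]
```

The difficulty for while loops, unlike do loops, is that the number of cells needed (the execution domain) is not known at compile time, which is why the affine methods are inapplicable.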
Design Space Exploration for Massively Parallel Processor Arrays
In Parallel Computing Technologies, 6th International Conference, PaCT 2001, Proceedings, 2001
Cited by 16 (11 self)
In this paper, we describe an approach for the optimization of dedicated coprocessors that are implemented either in hardware (ASIC) or configware (FPGA). Such massively parallel coprocessors are typically part of a heterogeneous hardware/software system. Each coprocessor is a massively parallel system consisting of an array of processing elements (PEs). In order to decide whether to map a computationally intensive task into hardware, existing approaches either try to optimize for performance or for cost, with the other objective being a secondary goal.
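The limitation described — optimizing one objective with the other as a secondary goal — contrasts with keeping the full trade-off front. A minimal sketch (illustrative numbers, not from the paper) of computing the Pareto front over candidate PE-array designs:

```python
def pareto_front(designs):
    """designs: list of (cost, latency) pairs for candidate PE arrays.
    Keep the non-dominated points: a design is dominated if some other
    design is at least as good in both objectives (and is a different
    point)."""
    front = []
    for d in designs:
        dominated = any(
            o != d and o[0] <= d[0] and o[1] <= d[1] for o in designs
        )
        if not dominated:
            front.append(d)
    return sorted(front)
```

Given candidates [(1, 10), (2, 6), (3, 6), (4, 2), (5, 1)], the point (3, 6) is dominated by (2, 6) and is dropped, while the remaining four designs form the cost/performance trade-off a designer would choose from.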
HDC: A Higher-Order Language for Divide-and-Conquer
2000
Cited by 14 (2 self)
We propose the higher-order functional style for the parallel programming of algorithms. The functional language HDC, a subset of the language Haskell, facilitates the clean integration of skeletons into a functional program. Skeletons are predefined programming schemata with an efficient parallel implementation. We report on our compiler, which translates HDC programs into C+MPI, especially on the design decisions we made. Two small examples, the n-queens problem and Karatsuba's polynomial multiplication, are presented to demonstrate the programming comfort and the speedup one can obtain.
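The style of skeleton HDC integrates can be sketched as a higher-order divide-and-conquer function, shown here in Python for illustration (the skeleton shape and the n-queens encoding are assumptions for this sketch, not HDC's actual definitions):

```python
def dc(is_trivial, solve, divide, combine, problem):
    """Generic divide-and-conquer skeleton: algorithms are obtained by
    instantiating the four function parameters, not by hand-written
    recursion."""
    if is_trivial(problem):
        return solve(problem)
    return combine([dc(is_trivial, solve, divide, combine, s)
                    for s in divide(problem)])

def n_queens(n):
    """Count n-queens placements by instantiating the skeleton: a
    problem is a tuple of chosen columns, one per placed row."""
    def is_trivial(qs):
        return len(qs) == n
    def solve(qs):
        return 1  # one complete, conflict-free placement
    def safe(qs, col):
        row = len(qs)
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(qs))
    def divide(qs):
        return [qs + (c,) for c in range(n) if safe(qs, c)]
    def combine(counts):
        return sum(counts)  # sum([]) == 0 handles dead branches
    return dc(is_trivial, solve, divide, combine, ())
```

Because the skeleton is the only source of recursion, a compiler like the one described can give it a single efficient parallel implementation and reuse it for every instantiation.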
Complexity and Uniformity of Elimination in Presburger Arithmetic
Universität Passau, 1997
Cited by 13 (3 self)
The decision complexity of Presburger Arithmetic (PA) and its variants has received much attention in the literature. We investigate the complexity of quantifier elimination procedures for PA, a topic that is even more relevant for applications. First we show that the author's triply exponential upper bound is essentially tight. This fact seems to preclude practical applications. By weakening the concept of quantifier elimination slightly to bounded quantifier elimination, we show, however, that the upper and lower bounds for quantifier elimination in PA can be lowered by exactly one exponential. Moreover, we gain uniformity in the coefficients, a property that we prove to be impossible for complete quantifier elimination in PA. Thus we have tight upper and lower complexity bounds for elimination theory in PA and uniform PA. The results are inspired by experimental implementations of bounded quantifier elimination that have solved nontrivial application problems, e.g., in parametric i...
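A toy instance of quantifier elimination in Presburger arithmetic (illustrative only, not the paper's procedure): to eliminate x from ∃x. l ≤ x ≤ u ∧ x ≡ r (mod m), periodicity means it suffices to test one period of m candidates starting at l, so the quantifier-free equivalent is a finite disjunction.

```python
def exists_x(l, u, r, m):
    """Quantifier-free equivalent of ∃x. l <= x <= u ∧ x ≡ r (mod m):
    a disjunction over the m candidates l, l+1, ..., l+m-1."""
    return any(l + d <= u and (l + d - r) % m == 0 for d in range(m))

def brute_force(l, u, r, m):
    """Reference semantics: test every x in the interval directly."""
    return any(x % m == r % m for x in range(l, u + 1))
```

The blow-up the paper analyzes comes from iterating such eliminations: each eliminated quantifier can multiply the number of disjuncts and the moduli involved, which is where the exponentials accumulate.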
Parallelization of Divide-and-Conquer by Translation to Nested Loops
J. Functional Programming, 1997
Cited by 12 (6 self)
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
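The direction of the transformation can be sketched on the simplest possible instance (a reduction, chosen for illustration; the paper's skeleton is more general): a balanced, degree-2 divide-and-conquer over a list of length 2^k is equivalent to a loop nest over k levels, and the loop form is what admits a compile-time space-time mapping.

```python
def dc_sum(xs):
    """Recursive skeleton form: balanced binary division, degree 2."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return dc_sum(xs[:mid]) + dc_sum(xs[mid:])

def loop_sum(xs):
    """Equivalent loop-nest form: k levels for input length 2^k, each
    combining adjacent pairs. The inner comprehension's iterations are
    independent, i.e., parallel across one level of the call tree."""
    level = list(xs)
    while len(level) > 1:
        level = [level[2 * i] + level[2 * i + 1]
                 for i in range(len(level) // 2)]
    return level[0]
```

The equality of the two forms on all inputs of length 2^k is the kind of fact the paper establishes by equational reasoning and induction.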