Loop Parallelization in the Polytope Model
CONCUR '93, Lecture Notes in Computer Science 715, 1993
Cited by 87 (23 self)
During the course of the last decade, a mathematical model for the parallelization of FOR-loops has become increasingly popular. In this model, a (perfect) nest of r FOR-loops is represented by a convex polytope in Z^r. The boundaries of each loop specify the extent of the polytope in a distinct dimension. Various ways of slicing and segmenting the polytope yield a multitude of guaranteed correct mappings of the loops' operations in space-time. These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra. With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude. The polytope model for loop parallelization has its origin in systolic design, but it applies in more general settings and methods based on it will become a part of futur...
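As a rough illustration of the model described in this abstract, the sketch below enumerates the integer points of a rectangular iteration space for a nest of r = 2 loops and slices it into time steps with a linear schedule. The function names, the 3 x 3 bounds, the dependence vectors (1, 0) and (0, 1), and the schedule (1, 1) are all assumptions chosen for the example; none of them come from the paper.

```python
from itertools import product

def iteration_points(lower, upper):
    """Integer points x with lower[k] <= x[k] <= upper[k] (a box-shaped polytope)."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    return list(product(*ranges))

def respects(schedule, dependences):
    """A linear schedule is valid if every dependence vector advances time by >= 1."""
    return all(sum(s * d for s, d in zip(schedule, dep)) >= 1 for dep in dependences)

def wavefronts(points, schedule):
    """Group iteration points by time step t = schedule . x."""
    steps = {}
    for p in points:
        t = sum(s * x for s, x in zip(schedule, p))
        steps.setdefault(t, []).append(p)
    return [steps[t] for t in sorted(steps)]

# Hypothetical loop nest: for i in 0..2, for j in 0..2: a[i][j] = a[i-1][j] + a[i][j-1]
points = iteration_points((0, 0), (2, 2))
deps = [(1, 0), (0, 1)]          # assumed dependence vectors of that nest
fronts = wavefronts(points, (1, 1))

for t, front in enumerate(fronts):
    print(t, front)              # points in one front can run in parallel
```

Each printed front is one slice of the polytope; under the assumed dependences, all points in a slice are independent of one another.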
An Exact Method for Analysis of Value-based Array Data Dependences
In Sixth Annual Workshop on Programming Languages and Compilers for Parallel Computing, 1993
Cited by 79 (14 self)
Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques described this as a flow dependence, even if there was an intervening write. We call a dependence between two references to the same memory location a memory-based dependence. In contrast, if there are no intervening writes, the references touch the same value and we call the dependence a value-based dependence. There has been a surge of recent work on value-based array data dependence analysis (also referred to as computation of array data-flow dependence information). In this paper, we describe a technique that is exact over programs without control flow (other than loops) and non-linear references. We compare our proposal with the technique proposed by Paul Feautrier, which is the other technique that is complete over the same domain as ours. We also compare our work with that of Tu and Padua, a ...
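The distinction drawn in this abstract can be made concrete on a small execution trace. The sketch below is only an illustration over an explicit, hypothetical trace (statements S1 to S3 touching a[5]); it is not the symbolic analysis the paper proposes, and the function name and trace encoding are assumptions.

```python
def flow_dependences(trace):
    """trace: list of (statement, kind, location) in execution order, kind "W" or "R".

    Returns (memory_based, value_based) lists of (writer, reader) pairs.
    """
    memory_based, value_based = [], []
    for idx, (stmt, kind, loc) in enumerate(trace):
        if kind != "R":
            continue
        # Every earlier write to the same location is a memory-based flow dependence.
        writers = [w for (w, k, l) in trace[:idx] if k == "W" and l == loc]
        memory_based.extend((w, stmt) for w in writers)
        # Only the last such write supplies the value actually read.
        if writers:
            value_based.append((writers[-1], stmt))
    return memory_based, value_based

# S1 writes a[5], S2 overwrites a[5], S3 reads a[5].
trace = [("S1", "W", ("a", 5)), ("S2", "W", ("a", 5)), ("S3", "R", ("a", 5))]
mem, val = flow_dependences(trace)
print(mem)  # [('S1', 'S3'), ('S2', 'S3')]
print(val)  # [('S2', 'S3')]
```

The pair (S1, S3) appears only in the memory-based list because the intervening write in S2 kills the value written by S1.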
Interprocedural array regions analyses
1995
Cited by 64 (7 self)
In order to perform powerful program optimizations, an exact interprocedural analysis of array data flow is needed. For that purpose, two new types of array region are introduced. IN and OUT regions represent the sets of array elements, the values of which are imported to or exported from the current statement or procedure. Among the various applications are: compilation of communications for message-passing machines, array privatization, compile-time optimization of local memory or cache behavior in hierarchical memory machines.
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming, 2000
Cited by 62 (1 self)
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a trade-off between code size and control code simplification/optimization. Previous code generation methods are based on loop splitting; however, they behave non-optimally on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the trade-off between code size and control overhead.
A Practical Data Flow Framework for Array Reference Analysis and its Use in Optimizations
In ACM SIGPLAN'93 Conf. on Prog. Lang. Design and Implementation, 1993
Cited by 55 (2 self)
Data flow analysis techniques have traditionally been restricted to the analysis of scalar variables. This restriction, however, imposes a limitation on the kinds of optimizations that can be performed in loops containing array references. We present a data flow framework for array reference analysis that provides the information needed in various optimizations targeted at sequential or fine-grained parallel architectures. The framework extends the traditional scalar framework by incorporating iteration distance values into the analysis to qualify the computed data flow solution during the fixed point iteration. Analyses phrased in this framework are capable of discovering recurrent access patterns among array references that evolve during the execution of a loop. The framework is practical in that the fixed point solution requires at most three passes over the body of structured loops. Applications of our framework are discussed for register allocation, load/store optimizations, and controlled loop unrolling.
Automatic Storage Management for Parallel Programs
Parallel Computing, 1998
Cited by 44 (3 self)
This article deals with automatic parallelization of static control programs. During the parallelization process, memory-related dependences are usually removed by translating the original program into single assignment form. This total data expansion has a very high memory cost. We present a technique of partial data expansion that preserves the performance of the parallelization process, using algebraic techniques from the polytope model.
Keywords: Automatic Parallelization, Memory Management, Array Dataflow Analysis, Scheduling.
1 Introduction
This article deals with the automatic parallelization technique based on the polytope model. This method can be applied provided that source programs are static control programs, i.e., are limited to do loops and assignments to arrays with affine subscripts. The first step is the extraction of exact dependences by array data flow analysis. All memory related dependences, which are due to reuse of data, are...
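To make the memory cost of total data expansion concrete, the sketch below contrasts a loop that reuses one scalar with a single-assignment version in which every iteration writes its own cell. The loop body and the function names are hypothetical, and the example shows only total expansion; the partial expansion technique of the article is not reproduced here.

```python
def original(a):
    """Reuses the scalar t: iterations are linked by memory-related dependences on t."""
    b = [0] * len(a)
    t = 0
    for i in range(len(a)):
        t = a[i] * 2          # every iteration overwrites the same cell
        b[i] = t + 1
    return b

def single_assignment(a):
    """Total expansion: t becomes an array, so each cell is written exactly once."""
    n = len(a)
    t = [0] * n               # n cells instead of 1: the memory cost of expansion
    b = [0] * n
    for i in range(n):        # iterations are now independent of one another
        t[i] = a[i] * 2
    for i in range(n):
        b[i] = t[i] + 1
    return b

print(original([1, 2, 3]))           # [3, 5, 7]
print(single_assignment([1, 2, 3]))  # [3, 5, 7]
```

Both versions compute the same result; the expanded one removes the reuse of t at the price of storage proportional to the iteration count.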
Background Memory Area Estimation for Multi-dimensional Signal Processing Systems
IEEE Trans. on VLSI Systems, 1995
Cited by 40 (17 self)
Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing system realizations. In this paper, we present a novel technique, founded on data-flow analysis, that addresses the problem of background memory size evaluation for a given non-procedural algorithm specification operating on multi-dimensional signals with affine indices. Most of the target applications are characterized by a huge number of signals, so a new polyhedral data-flow model operating on groups of scalar signals is proposed. These groups are obtained by a novel analytical partitioning technique, which allows a desired granularity to be selected depending on the application complexity. The method incorporates a way to trade off memory size against computational and controller complexity.
1 Introduction
Speech, image and video processing applications involve a large number of multi-dimensional signals, which lead to large memory units. These result in significa...
Mapping Uniform Loop Nests onto Distributed Memory Architectures
Parallel Computing, 1993
Cited by 33 (9 self)
This paper deals with scheduling, mapping and partitioning techniques for uniform loop nests. It is shown how the different techniques of scheduling, of mapping and of partitioning are linked and how code generation can be derived according to these methods. Our approach is based upon extensions of systolic array design methodologies.
Accurate Analysis of Array References
1992
Cited by 30 (0 self)
I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy. John L. Hennessy (Principal Adviser)

