Results 1-10 of 37
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
 Intl J. of Parallel Programming
, 2006
Cited by 50 (18 self)
Modern compilers are responsible for translating the idealistic operational semantics of the source program into a form that makes efficient use of a highly complex heterogeneous machine. Since optimization problems are associated with huge and unstructured search spaces, this combinational task is poorly achieved in general, resulting in weak scalability and disappointing sustained performance. We address this challenge by working on the program representation itself, using a semi-automatic optimization approach to demonstrate that current compilers often suffer from unnecessary constraints and intricacies that can be avoided in a semantically richer transformation framework. Technically, the purpose of this paper is threefold: (1) to show that syntactic code representations close to the operational semantics lead to rigid phase ordering and cumbersome expression of architecture-aware loop transformations, (2) to illustrate how complex transformation sequences may be needed to achieve significant performance benefits, (3) to facilitate the automatic search for program transformation sequences, improving on classical polyhedral representations to better support operation research strategies in a simpler, structured search space. The proposed framework relies on a unified polyhedral representation of loops and statements, using normalization rules to allow flexible and expressive transformation sequencing. This representation allows us to extend the scalability of polyhedral dependence analysis, and to delay the (automatic) legality checks until the end of a transformation sequence. Our work leverages algorithmic advances in polyhedral code generation and has been implemented in a modern research compiler.
Lattice-Based Memory Allocation
, 2003
Cited by 45 (4 self)
We investigate the problem of memory reuse, for reducing the necessary memory size, in the context of compilation of dedicated processors. Memory reuse is a well-known concept when allocating registers (i.e., scalar variables). Its (recent) extension to arrays was studied mainly by Lefebvre and Feautrier (for loop parallelization) and by Quilleré and Rajopadhye (for circuit synthesis based on recurrence equations). Both consider affine mappings of indices to data, with modulo expressions in the first and (mainly) projections in the second. We develop a mathematical framework based on (integral) critical lattices that subsumes all previous approaches and gives new insights into the problem. Our technique consists first in building an abstract representation of conflicting indices (equivalent in a multidimensional space to the interference graph for register allocation), then in defining an integral lattice, admissible for the set of differences of conflicting indices, used to build a valid modular allocation. We also show the link with critical lattices, successive minima, and basis reduction, and we analyze various strategies for lattice-based memory allocation.
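The notion of a valid modular allocation in this abstract can be illustrated with a small brute-force sketch. It is not the paper's lattice-based method: the conflict set (`live_window_conflicts`), the function names, and the exhaustive search are all assumptions made for illustration. The only idea taken from the abstract is that a modular mapping is valid when no two conflicting indices land in the same memory cell.

```python
def live_window_conflicts(n, window):
    """Hypothetical conflict set: 1-D indices i and j in [0, n) conflict
    when they are distinct and closer than `window` (their values are
    assumed to be live at the same time)."""
    return {((i,), (j,))
            for i in range(n) for j in range(n)
            if i != j and abs(i - j) < window}


def is_valid_modular_mapping(conflicts, moduli):
    """A mapping index -> index mod moduli (componentwise) is valid when
    no pair of conflicting indices shares a memory cell."""
    def cell(index):
        return tuple(x % m for x, m in zip(index, moduli))
    return all(cell(i) != cell(j) for i, j in conflicts)


def smallest_1d_modulus(conflicts, upper):
    """Exhaustive search for the smallest valid modulus (1-D case only)."""
    for m in range(1, upper + 1):
        if is_valid_modular_mapping(conflicts, (m,)):
            return m
    return None


conflicts = live_window_conflicts(10, 3)
print(smallest_1d_modulus(conflicts, 10))
```

In this toy setting the search simply confirms that a buffer as large as the live window is enough; the paper's contribution is to replace such enumeration with reasoning on lattices.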
Violated dependence analysis
 In ACM ICS
, 2006
Cited by 20 (4 self)
The polyhedral model is a powerful framework to reason about high-level loop transformations. Yet the lack of scalable algorithms and tools has deterred actors from both academia and industry from putting this model to practical use. Indeed, for fundamental complexity reasons, its applicability has long been limited to simple kernels. Recent developments broke some generally accepted ideas about these limitations. In particular, new algorithms made it possible to compute the target code for full SPEC benchmarks while this code generation step was expected not to be scalable. Instance-wise array dependence analysis computes a finite, intensional representation of the (statically unbounded) set of all dynamic dependences. This problem has always been considered non-scalable and/or an overkill with respect to less expressive and faster dependence tests. On the contrary, this article presents experimental evidence of its applicability to full SPEC CPU2000 benchmarks. To make this possible, we revisit the characterization of data dependences, considering relations between time dimensions of the transformed space. Beyond algorithmic benefits, this naturally leads to a novel way of reasoning about violated dependences across arbitrary transformation sequences. Reasoning about violated dependences relieves the compiler designer from the cumbersome task of implementing specific legality checks for each single transformation. It also makes it possible, in the case of invalid transformations, to precisely determine the violated dependences that need to be corrected. Identifying these violations can in turn enable automatic correction schemes to fix an illegal transformation sequence with minimal changes.
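As a rough illustration of what a violated dependence is, the sketch below enumerates dependence instances of a small 2-D loop nest and reports those whose source no longer executes before its target under a given schedule. This is an enumeration over a tiny iteration space, not the symbolic analysis the abstract describes; the loop nest, the dependence distance (1, -1), and every function name are assumptions chosen for the example.

```python
def dependence_instances(n):
    """Hypothetical dependences with distance (1, -1): iteration (i, j)
    produces a value consumed by iteration (i + 1, j - 1)."""
    return [((i, j), (i + 1, j - 1))
            for i in range(n - 1) for j in range(1, n)]


def violated_dependences(deps, schedule):
    """A dependence is violated when its source is not scheduled strictly
    before its target. Schedules return time vectors (tuples), which
    Python compares lexicographically."""
    return [(src, dst) for src, dst in deps
            if not schedule(src) < schedule(dst)]


def original_order(point):
    i, j = point
    return (i, j)


def interchanged_order(point):
    i, j = point
    return (j, i)          # loop interchange swaps the time dimensions


deps = dependence_instances(3)
print(len(violated_dependences(deps, original_order)))      # original order is legal
print(len(violated_dependences(deps, interchanged_order)))  # interchange breaks every instance
```

Returning the violated instances, rather than a yes/no legality answer, mirrors the point made in the abstract: knowing which dependences break is what makes a correction step possible.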
A Unified Framework for Schedule and Storage Optimization
 In International Conference on Programming Language Design and Implementation (PLDI'01)
, 2001
Cited by 15 (3 self)
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal schedules. We consider storage mappings that collapse one dimension of a multidimensional array, and programs that are in a single assignment form with a one-dimensional schedule. Our technique combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to efficiently solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.
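A much simplified, one-dimensional sketch of the occupancy-vector idea follows. It assumes a hypothetical recurrence x[i] = x[i-1] + x[i-2] (dependence distances 1 and 2), a sequential schedule, and a scalar occupancy distance `v` meaning that iterations i and i + v share a storage cell. The paper solves linear inequalities over affine schedules; the brute-force check below only illustrates the validity condition on this toy case.

```python
def is_valid_occupancy(v, distances, schedule, n):
    """The value written at i is overwritten at i + v, so every consumer
    i + d must be scheduled no later than i + v (a statement is assumed to
    read its operands before writing its result)."""
    return all(schedule(i + d) <= schedule(i + v)
               for i in range(n) for d in distances)


def shortest_occupancy(distances, schedule, n, limit):
    """Smallest valid v up to `limit`, found by exhaustive search."""
    for v in range(1, limit + 1):
        if is_valid_occupancy(v, distances, schedule, n):
            return v
    return None


def run_collapsed(n, v):
    """Evaluate the recurrence with only v storage cells (v >= 2),
    indexing the buffer by i mod v."""
    buf = [0] * v
    buf[0 % v], buf[1 % v] = 0, 1
    for i in range(2, n):
        buf[i % v] = buf[(i - 1) % v] + buf[(i - 2) % v]
    return buf[(n - 1) % v]


v = shortest_occupancy([1, 2], lambda t: t, n=10, limit=5)
print(v, run_collapsed(10, v))
```

The same final value is obtained with any larger valid `v`, at the cost of more storage, which is the tradeoff the abstract refers to.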
Memory Reuse Analysis in the Polyhedral Model
 Parallel Processing Letters
, 1996
Cited by 14 (1 self)
In the context of developing a compiler for Alpha, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code to multiple-assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.
1. Introduction The methodology of automatic systolic array synthesis from Systems of Affine Recurrence Equations (SAREs) has a close bearing on parallelizing compilers and on efficient implementation of functional languages. To study this relationship, we are currently developing a compiler for Alpha [9], a functional, data-parallel language based on SAREs defined over polyhedral index domains. The language semantics directly lead to sequential code based on demand-driven evaluation. However, the resulting context switches can be avoided if the program is tra...
Optimizing Storage Size for Static Control Programs in Automatic Parallelizers
 In Proc. Euro-Par Conference
, 1997
Cited by 8 (2 self)
This article deals with automatic parallelization of static control programs. During the parallelization process the removal of artificial dependences is usually realized by translating the original program into a single assignment form. This total data expansion has a very high memory cost. We present a technique of partial data expansion which leaves untouched the performance of the parallelization process, with the help of algebra techniques given by the polytope model.
1 Introduction This article deals with the automatic parallelization technique based on the polytope model. This method can be applied provided that source programs are static control programs, i.e. are limited to do loops and assignment statements to arrays with affine subscripts. The first step is an array data flow analysis in order to extract exact dependences on memory cells. All artificial dependences, which are due to reuse of data, are deleted by a total data expansion. The transformed program has the sing...
Plugging anti and output dependence removal techniques into loop parallelization algorithm
, 1997
Cited by 6 (2 self)
In this paper we shortly survey some loop transformation techniques which break anti or output dependences, or artificial cycles involving such ‘false’ dependences. These false dependences are removed through the introduction of temporary buffer arrays. Next we show how to plug these techniques into loop parallelization algorithms (such as Allen and Kennedy’s algorithm). The goal is to extract as many parallel loops as the intrinsic degree of parallelism of the nest authorizes, while avoiding a full memory expansion. We try to reduce the number of temporary arrays that we introduce, as well as their dimension.
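To make the idea of removing a false dependence with a temporary buffer array concrete, here is a minimal sketch on a hypothetical loop `a[i] = a[i + 1] + 1`, which carries an anti dependence (iteration i reads the cell that iteration i + 1 later overwrites). The loop, the array contents, and the function names are assumptions for illustration, and running the iterations in reverse order merely stands in for an arbitrary parallel execution order.

```python
def sequential(a):
    """Reference semantics: iterations run in increasing order of i."""
    a = list(a)
    for i in range(len(a) - 1):
        a[i] = a[i + 1] + 1
    return a


def reordered_naive(a):
    """Same loop body with iterations reversed: the anti dependence is
    violated because a[i + 1] has already been overwritten when read."""
    a = list(a)
    for i in reversed(range(len(a) - 1)):
        a[i] = a[i + 1] + 1
    return a


def reordered_with_buffer(a):
    """Old values are first copied to a temporary buffer array, so the
    reads no longer depend on the writes and any order is safe."""
    a = list(a)
    tmp = list(a)                      # temporary buffer holding old values
    for i in reversed(range(len(a) - 1)):
        a[i] = tmp[i + 1] + 1
    return a


data = [1, 2, 3, 4, 5]
print(sequential(data))
print(reordered_naive(data))
print(reordered_with_buffer(data))
```

Here the buffer is a full copy of the array; the abstract's stated goal of limiting the number and dimension of temporaries is about doing better than this.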
Optimization of Storage Mappings for Parallel Programs
 In Euro-Par'99, number 1685 in LNCS
, 1998
"... this paper are the following: ..."
Parallelization via Constrained Storage Mapping Optimization
 Lecture Notes in Computer Science
, 1999
Cited by 6 (1 self)
When parallelizing an imperative program, a key problem is to find a good tradeoff between memory expansion and parallelism. Increasing performance of parallelizing compilers thus relies on a difficult multi-criteria optimization problem. This paper is a first step in solving this problem: An integrated framework for parallel execution order and storage mapping computation is designed, allowing simultaneous time and space optimization of parallel programs. The use of constrained expansion, providing a mathematical way to model expansion strategies, is shown to be very useful in this context.
1 Introduction Data dependences are known to hamper automatic parallelization of imperative programs and their efficient compilation on modern processors or supercomputers. A general method to reduce the number of memory-based dependences is to disambiguate memory accesses by assigning distinct memory locations to non-conflicting writes, i.e. to expand data structures. In parallel processing,...
Data Flow Analysis of Recursive Structures
, 1996
Cited by 4 (2 self)
Most imperative languages only offer arrays as "first-class" data structures. Other data structures, especially recursive data structures such as trees, have to be manipulated using explicit control of memory, i.e., through pointers to explicitly allocated portions of memory. We believe that this severe limitation is mainly due to historical reasons, and this paper will try and demonstrate that modern analysis techniques, such as data flow analysis, make it possible to cope with the compilation problems associated with recursive data structures. As a matter of fact, recursion in the flow of control is also a current open issue in automatic parallelization: to our knowledge, no theory allows the parallelization of, e.g., recursive Pascal programs. This paper uniformly handles both issues. We propose a kernel language that manipulates recursive data structures in an elegant, algebraic way. In this preliminary work, both data and control recursive structures are restricted, so that a data flow a...