Results 11–20 of 250
Efficient and precise array access analysis
 ACM Trans. Program. Lang. Syst.
, 2000
Abstract

Cited by 34 (1 self)
A number of existing compiler techniques hinge on the analysis of array accesses in a program. The most important task in array access analysis is to collect the information about array accesses of interest and summarize it in some standard form. Traditional forms used in array access analysis are sensitive to the complexity of array subscripts; that is, they are usually quite accurate and efficient for simple array subscripting expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. Our study has revealed that in many programs, particularly numerical applications, many access patterns are simple in nature even when the subscripting expressions are complex. Based on this analysis, we have developed a new, general array region representational form, called the linear memory access descriptor (LMAD). The key idea of the LMAD is to relate all memory accesses to the linear machine memory rather than to the shape of the logical data structures of a programming language. This form helps us expose the simplicity of the actual patterns of array accesses in memory, which may be hidden by complex array subscript expressions. Our recent experimental studies show that our new representation simplifies array
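The idea can be illustrated with a small sketch (hypothetical code, not the authors' implementation): an LMAD records, per loop index, a stride in linear memory and a span, so a subscript that looks complex at the source level can reduce to a simple linear access pattern.

```python
def lmad(base, dims):
    """A toy linear memory access descriptor: a base address plus,
    for each loop index, a (stride, span) pair in linear memory."""
    return {"base": base, "dims": dims}

def addresses(d):
    """Enumerate every linear address the descriptor touches."""
    addrs = {d["base"]}
    for stride, span in d["dims"]:
        addrs = {a + stride * k for a in addrs for k in range(span + 1)}
    return sorted(addrs)

# A 4x3 row-major array accessed as A[i][j]: stride 3 for i, stride 1 for j.
desc = lmad(0, [(3, 3), (1, 2)])

# The flattened subscript A[3*i + j] looks more complex at the source
# level, but it touches the same simple linear pattern the descriptor
# exposes when accesses are related to linear machine memory.
touched = sorted({3 * i + j for i in range(4) for j in range(3)})
assert addresses(desc) == touched == list(range(12))
```

The point of the sketch: both subscript forms yield one descriptor over linear memory, which is what makes the representation insensitive to subscript complexity.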
Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs
, 1996
Abstract

Cited by 31 (5 self)
This paper presents an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction vectors. This result generalizes, to the case of several statements, Wolf and Lam's algorithm, which is optimal for a single statement. Our algorithm relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations. It can also be viewed as a combination of both Allen and Kennedy's algorithm and Wolf and Lam's algorithm.
Accurate Analysis of Array References
, 1992
Abstract

Cited by 30 (0 self)
ii I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy. John L. Hennessy(Principal Adviser) I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy.
Going beyond Integer Programming with the Omega Test to Eliminate False Data Dependences
 IEEE Transactions on Parallel and Distributed Systems
, 1992
Abstract

Cited by 30 (11 self)
Array data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programming and require decision procedures for a subclass of Presburger formulas. In this paper, we describe how to extend the Omega test so that it can answer these queries and allow us to eliminate these false data dependences. We have implemented the techniques described here and believe they are suitable for use in production compilers.
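The gap between conservative and exact dependence tests can be seen in a small sketch (illustrative only; the Omega test itself is a far more general Presburger decision procedure). The classical GCD test answers "may a dependence exist?" while a bounds-aware exact check, of the kind an Omega-style test performs symbolically, can rule out dependences the conservative test keeps:

```python
from math import gcd

def gcd_test(a, b, c):
    """Conservative test for integer solutions of a*i - b*j == c:
    returns False only when no integer solution can exist at all."""
    return c % gcd(a, b) == 0

def exact_test(a, b, c, lo, hi):
    """Exhaustive bounds-aware check (a brute-force stand-in for a
    symbolic exact test; only feasible for tiny bounds)."""
    return any(a * i - b * j == c
               for i in range(lo, hi + 1) for j in range(lo, hi + 1))

# Write A[2*i], read A[2*j + 1]: 2i - 2j == 1 has no integer solution,
# so even the GCD test proves independence.
assert gcd_test(2, 2, 1) is False

# Write A[i], read A[j + 10] with 0 <= i, j <= 5: the GCD test
# conservatively reports a possible dependence, but the loop bounds rule
# it out -- the kind of false dependence an exact test eliminates.
assert gcd_test(1, 1, 10) is True
assert exact_test(1, 1, 10, 0, 5) is False
```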
Lazy Array Data-Flow Dependence Analysis
 In Proceedings of Annual ACM Symposium on Principles of Programming Languages
, 1994
Abstract

Cited by 29 (2 self)
Automatic parallelization of real FORTRAN programs does not yet live up to users' expectations, and dependence analysis algorithms that either produce too many false dependences or are too slow contribute significantly to this. In this paper we introduce a data-flow dependence analysis algorithm that exactly computes value-based dependence relations for program fragments in which all subscripts, loop bounds, and IF conditions are affine. Our algorithm also computes good affine approximations of dependence relations for non-affine program fragments. In fact, we do not know of any other algorithm that can compute better approximations. And our algorithm is efficient too, because it is lazy. When searching for write statements that supply values used by a given read statement, it starts with statements that are lexicographically close to the read statement in iteration space. Then, if some of the read statement instances are not "satisfied" by these close writes, the algorithm broade...
PLuTo: A practical and fully automatic polyhedral program optimization system
 In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI '08)
, 2008
Abstract

Cited by 27 (7 self)
We present the design and implementation of a fully automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model – far beyond what is possible with current production compilers. Unlike previous work, our approach is an end-to-end, fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. We also address generation of tiled code for multiple statement domains of arbitrary dimensionalities under (statement-wise) affine transformations – an issue that has not been addressed previously. Experimental results from the implemented system show very high speedups for local and parallel execution on multicores over state-of-the-art compiler frameworks from the research community as well as the best native compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
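A minimal sketch of the core transformation involved, rectangular loop tiling (PLuTo itself derives tiling hyperplanes via an ILP formulation; this toy only shows that tiling reorders, but fully covers, the iteration space):

```python
def untiled(n):
    """Original n x n loop nest: row-major iteration order."""
    return [(i, j) for i in range(n) for j in range(n)]

def tiled(n, T):
    """The same iteration space traversed tile by tile, with min()
    guards so partial tiles at the boundary are handled correctly."""
    out = []
    for it in range(0, n, T):        # tile row origin
        for jt in range(0, n, T):    # tile column origin
            for i in range(it, min(it + T, n)):
                for j in range(jt, min(jt + T, n)):
                    out.append((i, j))
    return out

# Tiling permutes the iterations (improving reuse within a tile) while
# covering exactly the same iteration space.
assert sorted(tiled(6, 4)) == sorted(untiled(6))
assert tiled(6, 4) != untiled(6)
```

The reordering is what creates temporal locality (a tile's data stays in cache) and coarse-grained parallelism (whole tiles as units of work), which is why finding good tiling transformations is the framework's central objective.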
Loop parallelization algorithms: from parallelism extraction to code generation
 Compiler Optimizations for Scalable Parallel Systems Languages
, 2001
A Unified Framework for Schedule and Storage Optimization
 In International Conference on Programming Language Design and Implementation (PLDI '01)
, 2001
Abstract

Cited by 25 (3 self)
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal schedules. We consider storage mappings that collapse one dimension of a multi-dimensional array, and programs that are in single assignment form with a one-dimensional schedule. Our technique combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to efficiently solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.
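A minimal instance of the storage-collapsing idea (a hypothetical example, not the paper's formulation): for a single-assignment stencil executed time step by time step, each value is dead once the following step completes, so the time dimension of the array can be collapsed modulo 2 without changing the result:

```python
def stencil_full(T, N):
    """Single-assignment form: one row of storage per time step."""
    a = [[0] * N for _ in range(T + 1)]
    a[0] = list(range(N))
    for t in range(1, T + 1):
        for i in range(N):
            a[t][i] = a[t - 1][max(i - 1, 0)] + a[t - 1][min(i + 1, N - 1)]
    return a[T]

def stencil_collapsed(T, N):
    """Time dimension collapsed to a[t % 2][i]: a storage mapping that
    is valid for the sequential time-step-by-time-step schedule."""
    a = [list(range(N)), [0] * N]
    for t in range(1, T + 1):
        src, dst = a[(t - 1) % 2], a[t % 2]
        for i in range(N):
            dst[i] = src[max(i - 1, 0)] + src[min(i + 1, N - 1)]
    return a[T % 2]

assert stencil_full(5, 8) == stencil_collapsed(5, 8)
```

The tradeoff the paper formalizes is visible even here: a more aggressive schedule (e.g. overlapping time steps) would invalidate this two-row mapping, which is why schedule and storage must be optimized jointly.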
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model
Abstract

Cited by 24 (10 self)
Many compute-intensive applications spend a significant fraction of their time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance through parallelization as well as locality enhancement. Although a significant amount of research has addressed affine scheduling and partitioning, the problem of automatically finding good affine transforms for communication-optimized coarse-grained parallelization together with locality optimization for the general case of arbitrarily nested loop sequences remains challenging. In this paper, we propose an automatic transformation framework to optimize arbitrarily nested loop sequences with affine dependences for parallelism and locality simultaneously. The approach finds good tiling hyperplanes by embedding a powerful and versatile cost function into an integer linear programming formulation. These tiling hyperplanes are used for communication-minimized coarse-grained parallelization as well as locality optimization. The framework enables minimization of inter-tile communication volume in the processor space and minimization of reuse distances for local execution at each node. Programs requiring one-dimensional versus multi-dimensional time schedules (with scheduling-based approaches) are all handled with the same algorithm. Synchronization-free parallelism, permutable loops, or pipelined parallelism at various levels can be detected. Preliminary results from the implemented framework show promising performance and scalability with input size.
Decoupling algorithms from schedules for easy ...
Abstract

Cited by 24 (8 self)
Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism. We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code. We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain-specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
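The decoupling can be sketched in a few lines (a toy in the spirit of the paper, not Halide's actual API): the algorithm is a pure function over an integer domain, and different "schedules" choose evaluation and storage strategies without affecting the output:

```python
from functools import lru_cache

def blur_x(inp, x):
    """Algorithm only: a pure function over an integer domain, with no
    storage or evaluation-order decisions baked in."""
    return (inp(x - 1) + inp(x) + inp(x + 1)) / 3

def realize_recompute(f, inp, n):
    """Schedule 1: recompute the input at every use (no storage)."""
    return [f(inp, x) for x in range(n)]

def realize_memoized(f, inp, n):
    """Schedule 2: cache input values in an intermediate buffer."""
    cached = lru_cache(maxsize=None)(inp)
    return [f(cached, x) for x in range(n)]

src = lambda x: float(x * x)

# Different storage/evaluation choices, identical results: the schedule
# affects only performance, never the meaning of the algorithm.
assert realize_recompute(blur_x, src, 5) == realize_memoized(blur_x, src, 5)
```

This is the property that lets programmers explore tiling, fusion, and recomputation-vs-storage tradeoffs freely: every legal schedule computes the same function.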