Results 1 - 10
of
78
A General Data Dependence Test for Dynamic, Pointer-Based Data Structures
- In Proc. ACM PLDI
, 1994
"... Optimizing compilers require accurate dependence testing to enable numerous, performance-enhancing transformations. However, data dependence testing is a difficult problem, particularly in the presence of pointers. Though existing approaches work well for pointers to named memory locations (i.e. oth ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
Optimizing compilers require accurate dependence testing to enable numerous, performance-enhancing transformations. However, data dependence testing is a difficult problem, particularly in the presence of pointers. Though existing approaches work well for pointers to named memory locations (i.e. other variables) , they are overly conservative in the case of pointers to unnamed memory locations. The latter occurs in the context of dynamic, pointer-based data structures, used in a variety of applications ranging from system software to computational geometry to N-body and circuit simulations. In this paper we present a new technique for performing more accurate data dependence testing in the presence of dynamic, pointer-based data structures. We will demonstrate its effectiveness by breaking false dependences that existing approaches cannot, and provide results which show that removing these dependences enables significant parallelization of a real application. 1 Introduction and Motiv...
Nonlinear Array Layouts for Hierarchical Memory Systems
, 1999
"... Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. ..."
Abstract
-
Cited by 67 (4 self)
- Add to MetaCart
Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by re-ordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2--5% of total running time) and high performance benefits (reducing execution time by factors of 1.1-2.5); that they have smooth performance curves, both across a wide range of problem sizes and over representative cache architectures; and that recursion-based control structures may be needed to fully exploit their potential.
Nonlinear Array Dependence Analysis
, 1991
"... Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time ..."
Abstract
-
Cited by 63 (5 self)
- Add to MetaCart
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many "unanalyzable" terms: subscript arrays, run-time tests, function calls. The standard approach to analyzing such programs has been to omit and ignore any constraints that cannot be reasoned about. This is unsound when reasoning about value-based dependences and whether privatization is legal. Also, this prevents us from determining the conditions that must be true to disprove the dependence. These conditions could be checked by a run-time test or verified by a programmer or aggressive, demand-driven interprocedural analysis. We describe a solution to these problems. Our solution makes our system sound and more accurate for analyzing value-based dependences and derives conditions that can be used to disprove dependences. We also give some p...
Generation of Efficient Nested Loops from Polyhedra
- International Journal of Parallel Programming
, 2000
"... Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a signi ..."
Abstract
-
Cited by 62 (1 self)
- Add to MetaCart
Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. It involves making a trade-off between code size and control code simplification/optimization. Previous methods of doing code generation are based on loop splitting, however they have non-optimal behavior when working on parameterized programs. We present a general parameterized method for code generation based on dual representation of polyhedra. Our algorithm uses a simple recursion on the dimensions of the domains, and enables fine control over the tradeoff between code size and control overhead.
Background Memory Area Estimation for Multi-dimensional Signal Processing Systems
- IEEE Trans. on VLSI Systems
, 1995
"... Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing system realizations. In this paper, we present a novel technique -- founded on data-flow analysis -- which allows to address the problem of background memory size evaluation for a giv ..."
Abstract
-
Cited by 40 (17 self)
- Add to MetaCart
Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing system realizations. In this paper, we present a novel technique -- founded on data-flow analysis -- which allows to address the problem of background memory size evaluation for a given non-procedural algorithm specification, operating on multi-dimensional signals with affine indices. Most of the target applications are characterized by a huge number of signals, so a new polyhedral data-flow model operating on groups of scalar signals is proposed. These groups are obtained by a novel analytical partitioning technique, allowing to select a desired granularity, depending on the application complexity. The method incorporates a way to trade-off memory size with computational and controller complexity. 1 Introduction Speech, image and video processing applications involve a large amount of multi-dimensional signals which lead to large memory units. These result in significa...
Efficient worst case timing analysis of data caching
- In IEEE Real-Time Technology and Applications Symposium
, 1996
"... Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC processor is used as the target processor of a real-time system. However, there has not been much progress ..."
Abstract
-
Cited by 39 (1 self)
- Add to MetaCart
Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC processor is used as the target processor of a real-time system. However, there has not been much progress in worst case timing analysis of data caching. This is mainly due to load/store instructions that reference multiple memory locations such as those used to implement array and pointer-based references. These load/store instructions are called dynamic load/store instructions and most current analysis techniques take a very conservative approach to their timing analysis. In many cases, it is assumed that each of the references from a dynamic load/store instruction will miss in the cache and replace a cache block that would otherwise lead to a cache hit. This conservative approach results in severe overestimation of the worst case execution time (WCET). This paper proposes two techniques to minimize the WCET overestimation due to such load/store instructions. The first technique uses a global data flow analysis technique to reduce the number of load/store instructions that are misclassified as dynamic load/store instructions. The second technique utilizes data dependence analysis to minimize the adverse impact of dynamic load/store instructions. This paper also compares the WCET bounds of simple benchmark programs that are predicted with and without applying the proposed techniques. The results show that they significantly (up to 20%) improve the accuracy of WCET estimation especially for programs with a large number of references from dynamic load/store instructions. 1.
Advanced Compilation Techniques in the PARADIGM Compiler for Distributed-Memory Multicomputers
, 1995
"... The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop ..."
Abstract
-
Cited by 34 (2 self)
- Add to MetaCart
The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and variable number of processors, provided that arrays were single- or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica(TM). This paper describes some of the Mathematica(TM) routines that performs various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
Symbolic Array Dataflow Analysis for Array Privatization and Program Parallelization
- PROCEEDINGS OF SUPERCOMPUTING '95
, 1995
"... Array dataflow information plays an important role for successful automatic parallelization of Fortran programs. This paper proposes a powerful symbolic array dataflow analysis to support arrayprivatization and loop parallelization for programs with arbitrary control flow graphs and acyclic call gra ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
Array dataflow information plays an important role for successful automatic parallelization of Fortran programs. This paper proposes a powerful symbolic array dataflow analysis to support arrayprivatization and loop parallelization for programs with arbitrary control flow graphs and acyclic call graphs. Our scheme summarizes array access information using guarded array regions and propagates such regions over a Hierarchical Supergraph (HSG). The use of guards allows us to use the information in IF conditions to sharpen the array dataflow analysis and thereby to handle difficult cases which elude other existing techniques. The guarded array regions retain the simplicity of set operations for regular array regions in common cases, and they enhance regular array regions in complicated cases by using guards to handle complex symbolic expressions and array shapes. Scalar values that appear in array subscripts and loop limits are substituted on the fly during the array information propagation, which disambiguates the symbolic values precisely for set operations. We present efficient algorithms that implement our scheme. Initial experiments of applying our analysis to Perfect Benchmarks show promising results of improved array privatization.
Dataflow-driven Memory Allocation for Multi-dimensional Signal Processing Systems
, 1994
"... Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing systems. Therefore, design support to determine the storage organization for the multi-dimensional (M-D) signals is crucial to reduce the system architecture design time, while mainta ..."
Abstract
-
Cited by 29 (12 self)
- Add to MetaCart
Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing systems. Therefore, design support to determine the storage organization for the multi-dimensional (M-D) signals is crucial to reduce the system architecture design time, while maintaining a sufficiently high design quality in terms of area and/or power. In this paper, a novel background memory allocation and assignment technique is presented. It is intended for a behavioural algorithm specification, where the procedural ordering (especially the loop ordering) of the memory related operations is not yet fully fixed. Therefore, we have introduced a optimisation method driven by data-flow analysis, instead of the more restricted scheduling-based exploration in current approaches -- which start from a procedurally interpreted specification in terms of loops. Our novel allocation approach can process non-procedural functional descriptions, containing conditions (data-depen...

