Results 1 - 10 of 167
Global Optimizations for Parallelism and Locality on Scalable Parallel Machines
- In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, 1993
"... Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key i ..."
Abstract
-
Cited by 256 (20 self)
- Add to MetaCart
Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures.
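To make the idea of a computation and data decomposition concrete, here is a minimal sketch (my illustration, not the paper's algorithm) of a one-dimensional block decomposition under the owner-computes rule; the processor count, block size, and the a[i-1] access pattern are all assumptions of the example.

    # Minimal sketch (not from the paper): block decomposition of an
    # N-element array across P processors under the owner-computes rule.
    N, P = 16, 4
    block = N // P              # assume P divides N for simplicity

    def owner(i):
        """Processor that owns array element i."""
        return i // block

    def local_iterations(p):
        """Iterations of 'for i in range(N)' assigned to processor p."""
        return range(p * block, (p + 1) * block)

    # A read of a[i-1] at the first iteration of each block crosses a
    # decomposition boundary and therefore becomes communication.
    for p in range(P):
        remote = [i for i in local_iterations(p)
                  if i > 0 and owner(i - 1) != p]
        print(f"proc {p}: iterations {list(local_iterations(p))}, "
              f"non-local reads at {remote}")

A poor decomposition maximizes exactly these boundary accesses; the paper's point is that the compiler should choose the mapping so that most accesses stay local.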
SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers
- ACM SIGPLAN Notices, 1994
"... Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and reso ..."
Abstract
-
Cited by 247 (22 self)
- Add to MetaCart
(Show Context)
Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts. SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The kernel defines the intermediate representation, provides functions to access and manipulate the intermediate representation, and structures the interface between compiler passes. The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional use. Although we do not expect SUIF to be suitable for everyone, we think it may be useful for many other researchers. We thus invite you to use SUIF and welcome your contributions to this infrastructure. Directions for obtaining the SUIF software are included at the end of this paper.
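The kernel-plus-passes organization can be pictured with a toy pipeline; the IR, pass, and driver below are purely illustrative stand-ins, not SUIF's actual intermediate representation or interfaces.

    # Toy sketch of a kernel/passes split in the spirit of SUIF; not
    # SUIF's real IR or API. The "kernel" defines the IR and the pass
    # driver; passes are plain functions from IR to IR.
    from dataclasses import dataclass

    @dataclass
    class Const:
        value: int

    @dataclass
    class Add:
        left: object
        right: object

    def fold_constants(node):
        """A toy optimization pass: collapse Add nodes over constants."""
        if isinstance(node, Add):
            l, r = fold_constants(node.left), fold_constants(node.right)
            if isinstance(l, Const) and isinstance(r, Const):
                return Const(l.value + r.value)
            return Add(l, r)
        return node

    def run_passes(ir, passes):
        """Kernel-style driver structuring the interface between passes."""
        for p in passes:
            ir = p(ir)
        return ir

    print(run_passes(Add(Const(2), Add(Const(3), Const(4))), [fold_constants]))
    # -> Const(value=9)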
Supporting Dynamic Data Structures on Distributed-Memory Machines
1995
"... this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy ..."
Abstract
-
Cited by 166 (8 self)
- Add to MetaCart
In this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data, and introduces parallelism using a technique based on futures and lazy task creation. We intend to exploit this execution model using compiler analyses and automatic parallelization techniques. We have implemented a prototype system, which we call Olden, that runs on the Intel iPSC/860 and the Thinking Machines CM-5. We discuss our implementation and report on experiments with five benchmarks.
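The futures half of this execution model can be suggested with a short sketch; Olden itself is a C-based system targeting the machines above, and thread migration driven by heap layout is not modeled here, so treat this purely as an illustration of future-based parallelism over a pointer structure.

    # Illustration only: spawn the left subtree of a tree traversal as a
    # future while the current thread descends the right subtree.
    from concurrent.futures import ThreadPoolExecutor
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        value: int
        left: "Optional[Node]" = None
        right: "Optional[Node]" = None

    pool = ThreadPoolExecutor()

    def tree_sum(n):
        if n is None:
            return 0
        left = pool.submit(tree_sum, n.left)   # future for one subtree
        right = tree_sum(n.right)              # current thread keeps going
        return n.value + right + left.result() # touching the future blocks

    print(tree_sum(Node(1, Node(2, Node(4)), Node(3))))   # 10

One caveat: a bounded thread pool like this can exhaust workers on deep recursion; keeping task creation cheap and bounded is one motivation for the lazy task creation the abstract mentions.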
The PARADIGM Compiler for Distributed-Memory Message Passing Multicomputers
- IEEE Computer, 1994
"... The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. In addition to performing traditional compiler optimizations, PARADIGM is unique in that it addresses many other is ..."
Abstract
-
Cited by 112 (9 self)
- Add to MetaCart
The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. In addition to performing traditional compiler optimizations, PARADIGM is unique in that it addresses many other issues within a unified platform: automatic data distribution, synthesis of high-level communication, communication optimizations, irregular computations, functional and data parallelism, and multithreaded execution. This paper describes the techniques used and provides experimental evidence of their effectiveness.

1 Introduction. Distributed-memory massively parallel multicomputers can provide the high levels of performance required to solve the Grand Challenge computational science problems [16]. Distributed-memory multicomputers such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1, and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms...
Compiler Optimizations for Eliminating Barrier Synchronization
- In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1995
"... This paper presents novel compiler optimizations for reducing synchronization overhead in compiler-parallelized scientific codes. A hybrid programming model is employed to combine the flexibility of the fork-join model with the precision and power of the singleprogram, multiple data (SPMD) model. By ..."
Abstract
-
Cited by 91 (13 self)
- Add to MetaCart
(Show Context)
This paper presents novel compiler optimizations for reducing synchronization overhead in compiler-parallelized scientific codes. A hybrid programming model is employed to combine the flexibility of the fork-join model with the precision and power of the single-program, multiple-data (SPMD) model. By exploiting compile-time computation partitions, communication analysis can eliminate barrier synchronization or replace it with less expensive forms of synchronization. We show that computation partitions and data communication can be represented as systems of symbolic linear inequalities for high flexibility and precision. These optimizations have been implemented in the Stanford SUIF compiler. We extensively evaluate their performance using standard benchmark suites. Experimental results show that barrier synchronization is reduced by 29% on average, and by several orders of magnitude for certain programs.

1 Introduction. Parallel machines with shared address spaces and coherent caches provide an attractive...
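The flavor of the optimization can be sketched with a hypothetical two-phase computation (my example, not the SUIF-generated code): when communication analysis proves that processor p's second phase reads only data produced by processor p-1, the global barrier between the phases can be replaced by a point-to-point flag.

    # Hypothetical illustration: nearest-neighbor flags instead of a
    # global barrier between two phases.
    import threading

    P = 4
    done = [threading.Event() for _ in range(P)]   # one flag per processor
    out = [0] * P

    def worker(p):
        out[p] = p * p              # phase 1: produce local data
        done[p].set()               # signal completion (no global barrier)
        if p > 0:
            done[p - 1].wait()      # wait only for the one producer we read
        left = out[p - 1] if p > 0 else 0
        print(f"proc {p}: phase 2 consumed {left}")

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
    for t in threads: t.start()
    for t in threads: t.join()

Each processor now waits for exactly one neighbor instead of for all the others, which is the kind of cheaper synchronization the abstract refers to.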
An Exact Method for Analysis of Value-based Array Data Dependences
- In Sixth Annual Workshop on Programming Languages and Compilers for Parallel Computing, 1993
"... Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques described this as a flow dependence, even if there was an intervening write. We call a dependence between two ..."
Abstract
-
Cited by 88 (14 self)
- Add to MetaCart
(Show Context)
Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques describe this as a flow dependence, even if there is an intervening write. We call a dependence between two references to the same memory location a memory-based dependence. In contrast, if there are no intervening writes, the references touch the same value and we call the dependence a value-based dependence. There has been a surge of recent work on value-based array data dependence analysis (also referred to as computation of array data-flow dependence information). In this paper, we describe a technique that is exact over programs with no control flow (other than loops) and no non-linear references. We compare our proposal with the technique proposed by Paul Feautrier, which is the other technique that is complete over the same domain as ours. We also compare our work with that of Tu and Padua, a ...
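The memory-based versus value-based distinction is easy to see in a few lines (my example, echoing the a[5] scenario in the abstract):

    a = [0] * 10
    a[5] = 1        # S1: write a[5]
    a[5] = 2        # S2: intervening write kills S1's value
    x = a[5]        # S3: read a[5]
    # Memory-based analysis reports a flow dependence S1 -> S3 (same
    # location); value-based analysis reports only S2 -> S3, since S3
    # reads the value S2 produced, never the one from S1.

Value-based (array data-flow) information is typically what transformations such as array privatization need, which is why exactness matters here.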
A Linear Algebra Framework for Static HPF Code Distribution
1995
"... High Performance Fortran (hpf) was developed to support data parallel programming for simd and mimd machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to ..."
Abstract
-
Cited by 81 (11 self)
- Add to MetaCart
(Show Context)
High Performance Fortran (HPF) was developed to support data-parallel programming for SIMD and MIMD machines with distributed memory. The programmer is provided with a familiar, uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, overlap analysis... The systematic use of an affine framework makes it possible to prove the compilation scheme correct. An early version of this paper was presented at the Fourth International Workshop on Comp...
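As a small worked instance of the affine encoding (my notation, not the paper's exact formulation): a one-dimensional HPF BLOCK distribution of an array a(1:N) over P processors assigns element i to processor p exactly when a pair of linear inequalities holds,

    \[
      \texttt{!HPF\$ DISTRIBUTE a(BLOCK)} \;\Longrightarrow\;
      \mathrm{owner}(i) = p \iff p\,b \,\le\, i - 1 \,<\, (p+1)\,b,
      \qquad b = \lceil N/P \rceil,\; 0 \le p < P.
    \]

Local array bounds, loop bounds, and communication sets can then be obtained by projecting and intersecting such systems of inequalities, which is also what makes a correctness proof of the compilation scheme tractable.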
Dynamic feedback: an effective technique for adaptive computing
- PLDI ’97: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 1997
"... This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environ-ments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a dif-ferent optimization policy. The generated ..."
Abstract
-
Cited by 71 (6 self)
- Add to MetaCart
(Show Context)
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
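A minimal sketch of the sampling/production alternation follows; the policies, timings, and interval lengths are hypothetical, and the paper's compiler generates this structure rather than hand-writing it.

    import time

    def make_version(cost):
        def run():
            time.sleep(cost)   # stand-in for one unit of work under a policy
        return run

    # Two code versions compiled under different (hypothetical) policies.
    versions = {"coarse-grained locking": make_version(0.002),
                "fine-grained locking":   make_version(0.001)}

    def sampling_phase(iters=10):
        """Measure each version briefly; return the cheapest."""
        timings = {}
        for name, run in versions.items():
            start = time.perf_counter()
            for _ in range(iters):
                run()
            timings[name] = time.perf_counter() - start
        return min(timings, key=timings.get)

    for period in range(3):          # periodically resample ...
        best = sampling_phase()
        for _ in range(100):         # ... then run a production phase
            versions[best]()
        print(f"period {period}: production used '{best}'")

The phase lengths are the knob the paper's theoretical analysis speaks to: long production phases amortize sampling overhead but react more slowly to a changing environment.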
Evaluating Compiler Optimizations For Fortran D
1994
"... The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empiric ..."
Abstract
-
Cited by 69 (4 self)
- Add to MetaCart
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Communication optimizations reduce communication overhead by decreasing the number of messages and hide communication overhead by overlapping the cost of remaining messages with local computation. Parallelism optimizations exploit parallel and pipelined computations, and may need to restructure the computation to increase parallelism. Profitability formulas are derived for each optimization. Empirical results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance. Scalability of communication...
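Of the communication optimizations named here, message vectorization is the easiest to picture (illustrative sketch only; the send callback stands in for a real message-passing call): per-iteration sends of single elements are hoisted out of the loop into one aggregated message.

    # Illustrative only: 'send' is a stand-in for a message-passing call.
    def naive(send, a, lo, hi):
        for i in range(lo, hi):
            send([a[i]])          # hi - lo small messages, one per iteration

    def vectorized(send, a, lo, hi):
        send(a[lo:hi])            # a single message carrying the whole section

    messages = []
    vectorized(messages.append, list(range(100)), 10, 20)
    print(len(messages), messages[0])   # 1 message carrying 10 elements

Fewer, larger messages pay the per-message start-up cost once, which is where the "decreasing the number of messages" savings comes from.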
Using Integer Sets for Data-Parallel Program Analysis and Optimization
- In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998
"... In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of ..."
Abstract
-
Cited by 58 (29 self)
- Add to MetaCart
(Show Context)
In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of important optimizations in terms of abstract operations on sets of integer tuples. This approach has made it possible to implement a comprehensive collection of advanced optimizations in dHPF, and to do so in the context of a more general computation partitioning model than previous compilers. One potential limitation of the approach is that the underlying class of integer set problems is fundamentally unable to represent HPF data distributions on a symbolic number of processors. We describe how we extend the approach to compile codes for a symbolic number of processors, without requiring any changes to the set formulations for the above optimizations. We show experimentally that the set re...
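The integer-tuple-set style can be mimicked concretely with finite, enumerated sets (real implementations manipulate the sets symbolically so they work for parametric bounds; the block distribution and the a[i-1] reference below are assumptions of the example):

    # Finite, enumerated mimic of integer-tuple-set analysis.
    N, P, b = 12, 3, 4          # 12 iterations, 3 processors, block size 4

    iters = {p: {(i,) for i in range(p * b, (p + 1) * b)} for p in range(P)}
    owned = {p: {(i,) for i in range(p * b, (p + 1) * b)} for p in range(P)}

    # A statement reads a[i-1] in iteration i: data each processor needs.
    needs = {p: {(i - 1,) for (i,) in iters[p] if i >= 1} for p in range(P)}

    # Communication set = needed data minus locally owned data.
    recv = {p: sorted(needs[p] - owned[p]) for p in range(P)}
    print(recv)   # {0: [], 1: [(3,)], 2: [(7,)]}

Computation partitioning, communication placement, and the optimizations the abstract lists all reduce to unions, intersections, differences, and projections of such sets of integer tuples.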