Results 1 - 5 of 5
Implementation of Fourier-Motzkin Elimination
, 1994
Cited by 14 (1 self)
Every transformation of a perfectly nested loop consisting of a combination of loop interchanging, loop skewing and loop reversal can be modeled by a linear transformation represented by a unimodular matrix. This modeling offers more flexibility than the traditional step-wise application of loop transformations because we can directly construct a unimodular matrix for a particular goal. In this paper, we present implementation issues arising when this framework is incorporated in a compiler.

1 Introduction

Inherent to the application of program transformations in an optimizing or restructuring compiler is the so-called `phase ordering problem', i.e. the problem of finding an effective order in which particular transformations must be applied. This problem is still an important research topic [WS90]. An important step forward in solving the phase ordering problem has been accomplished by the observation that any combination of the iteration-level loop transformations loop interchangin...
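The matrix view described in this abstract can be made concrete with a small sketch. The Python below is only an illustration for a 2-deep loop nest and is not taken from the paper: the constant and function names (`INTERCHANGE`, `REVERSE_INNER`, `SKEW_INNER`, `matmul`, `is_unimodular`, `apply`) are hypothetical, and the choice to skew the inner loop by the outer index with factor 1 is an assumption made for the example.

```python
# Illustrative sketch only: loop transformations of a 2-deep nest (i, j)
# written as 2x2 integer matrices. All names are hypothetical.

INTERCHANGE = [[0, 1], [1, 0]]      # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]   # (i, j) -> (i, -j)
SKEW_INNER = [[1, 0], [1, 1]]       # (i, j) -> (i, i + j), skew factor 1 assumed


def matmul(a, b):
    """Product of two 2x2 integer matrices (applies b first, then a)."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_unimodular(m):
    """An integer matrix is unimodular when its determinant is +1 or -1."""
    return all(isinstance(x, int) for row in m for x in row) and abs(det(m)) == 1


def apply(m, point):
    """Map an iteration (i, j) to its position in the transformed nest."""
    i, j = point
    return (m[0][0] * i + m[0][1] * j, m[1][0] * i + m[1][1] * j)


# One matrix for "skew the inner loop, then interchange the two loops".
SKEW_THEN_INTERCHANGE = matmul(INTERCHANGE, SKEW_INNER)
```

Each elementary matrix has determinant +1 or -1, so any product of them is unimodular as well. That is why a combined transformation can be written down as a single matrix rather than applied step by step.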
Controlling Application Grain Size on a Network of Workstations
- Proceedings of Supercomputing ’95, ACM/IEEE
, 1995
Cited by 1 (1 self)
An important challenge in the area of distributed computing is to automate the selection of the parameters that control the distributed computation. A performance-critical parameter is the grain size of the computation, i.e., the interval between successive synchronization points in the application. This parameter is hard to select since it depends on both compile-time components (loop structure and data dependences, computational complexity) and run-time components (speed of compute nodes and network). On networks of workstations that are shared with other users, the run-time parameters can change over time. As a result, it is also necessary to consider the interactions with dynamic load balancing, which is needed to achieve good performance in this environment. In this paper we present a method for automatically selecting the grain size of a computation consisting of nested DO loops. The method is based on close cooperation between the compiler and the runtime system. We evaluate the method u...
PLASMA: Portable Programming for SIMD Heterogeneous Accelerators
Cited by 1 (0 self)
Data-parallel accelerators have emerged as high-performance alternatives to general-purpose processors for many applications. The Cell BE, GPUs from NVIDIA and ATI, and the like can outperform conventional superscalar architectures, but only for applications that can take advantage of these accelerators’ SIMD architectures, large number of cores, and local memories. Coupled with the SIMD extensions on general-purpose processors, these heterogeneous computing architectures provide a powerful platform to accelerate data-parallel programs. Unfortunately, each accelerator provides its own programming model, and programmers are often forced to confront issues of distributed memory, multithreading, load balancing and computation scheduling. This necessitates a framework that can exploit different types of parallelism across heterogeneous functional units and that supports multiple types of high-level programming languages, including stream programming, traditional shared or distributed memory programming frameworks, and prototyping languages such as MATLAB. Towards this goal, in this paper, we present PLASMA, a programming framework that enables the writing of portable SIMD programs. The main component of PLASMA is an intermediate representation (IR), which provides succinct and clean abstractions to enable programs to be compiled to different accelerators. With the assistance of a runtime, these programs can then be automatically multithreaded, run on multiple heterogeneous accelerators transparently, and are oblivious of distributed memory. We demonstrate a prototype compiler and runtime that targets PLASMA programs to scalar processors, processors with SIMD extensions and GPUs.
Automatic Generation of Parallel Programs with Dynamic Load Balancing for a Network of Workstations
by Bruce S. Siegell
- In Proceedings of the Third International Symposium on High-Performance Distributed Computing
, 1994
Because of their high availability and relatively low cost, networks of workstations are now often considered as platforms for applications that used to be relegated to dedicated multiprocessors. Parallelizing compilers have simplified the programming of shared and distributed memory multiprocessors. However, with networks of workstations, which are more loosely coupled, additional problems of heterogeneity, varying resource availability, and higher communication costs must be addressed in order to maximize utilization of system resources. Computational capabilities may vary with time due to other applications competing for resources, so dynamic load balancing is very important. Our research explores issues in retargeting a parallelizing compiler for a network of workstations. In this dissertation, we describe a system that supports dynamic load balancing of distributed applications consisting of parallelized DOALL and DOACROSS loops. We outline the added compiler functionality needed to generate parallel programs with dynamic load balancing and demonstrate how parameters for dynamic load balancing can be selected and controlled automatically at run time with cooperation between the compiler and runtime system. We have implemented a prototype runtime system on the Nectar system at Carnegie Mellon University and have evaluated its performance using hand-parallelized applications running in various environments. Key performance parameters under our control include the grain size of the application, the frequency of load balancing, and the amount and frequency of work movement. The optimal grain size is selected based on computation and communication costs of the application on the particular system on which it is run. Selecting an appropriate load balancing frequency re...
Dynamically Reconfigurable Architecture for a Class of Real-Time Applications
, 1992
This report (thesis) presents an architectural design methodology for computing systems suitable for a class of real-time applications, characterized by a large volume of periodic real-time data input at a high rate and vector operations on the real-time data. The proposed methodology incorporates into the architectural design the notion of resource sharing as well as techniques for satisfying timing requirements.

