Automatic and Interactive Parallelization
1994
Cited by 38 (8 self)
The goal of this dissertation is to give programmers the ability to achieve high performance by focusing on developing parallel algorithms, rather than on architecture-specific details. The advantages of this approach also include program portability and legibility. To achieve high performance, we provide automatic compilation techniques that tailor parallel algorithms to shared-memory multiprocessors with local caches and a common bus. In particular, the compiler maps complete applications onto the specifics of a machine, exploiting both parallelism and memory. To optimize complete applications, we develop novel, general algorithms to transform loops that contain arbitrary conditional control flow. In addition, we provide new interprocedural transformations which enable optimization across procedure boundaries. These techniques provide the basis for a robust automatic parallelizing algorithm that is applicable to complete programs. The algorithm for automatic parallel code generation t...
Optimising the Parallel Behaviour of Combinations of Program Components
1995
Cited by 19 (2 self)
The skeleton approach to programming parallel machines promises to offer a high level of abstraction to the programmer, whilst providing the implementation with sufficient information to effectively manage the resources available. Each skeleton captures a common pattern of computation and has associated with it parallel implementations. Functional programming languages are a suitable framework for exploring this approach as skeletons can be elegantly represented as higher-order functions. Applications are naturally expressed as combinations of several skeletons. This thesis explores the problem of optimising combinations of skeletons, where each skeleton may have more than one underlying parallel implementation. A skeleton approach suitable for expressing programs as combinations of skeletons is presented. The primitive skeletons of this approach are operators of parallel abstract data types. Skeletons are combined using a set of combining skeletons which abstract patterns of...
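Representing skeletons as higher-order functions makes combinations of them easy to rewrite. As a minimal Python sketch (not the thesis's skeleton set, which is built on parallel abstract data types with several implementations per skeleton), assume two hypothetical skeletons, `Map` and `Fold`, given only a sequential reference semantics, plus one well-known optimisation of a combination: fusing two adjacent maps into one.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, List, Union


@dataclass
class Map:
    f: Callable[[Any], Any]          # applied independently to each element


@dataclass
class Fold:
    op: Callable[[Any, Any], Any]    # assumed associative (what would permit a parallel tree reduction)
    unit: Any


Skeleton = Union[Map, Fold]


def run(pipeline: List[Skeleton], xs: Any) -> Any:
    """Sequential reference semantics: apply each stage left to right."""
    for stage in pipeline:
        if isinstance(stage, Map):
            xs = [stage.f(x) for x in xs]
        else:
            xs = reduce(stage.op, xs, stage.unit)
    return xs


def fuse_maps(pipeline: List[Skeleton]) -> List[Skeleton]:
    """Rewrite adjacent Map(f), Map(g) into Map(g . f), removing one
    intermediate list without changing the result."""
    out: List[Skeleton] = []
    for stage in pipeline:
        if isinstance(stage, Map) and out and isinstance(out[-1], Map):
            first = out.pop()
            # Default arguments bind the current functions, not the loop variables.
            out.append(Map(lambda x, f=first.f, g=stage.f: g(f(x))))
        else:
            out.append(stage)
    return out


prog = [Map(lambda x: x + 1), Map(lambda x: x * 2), Fold(lambda a, b: a + b, 0)]
print(run(prog, [1, 2, 3]))              # 18
print(len(fuse_maps(prog)))              # 2
print(run(fuse_maps(prog), [1, 2, 3]))   # 18
```

The rewrite is only valid because `Map` stages are side-effect free; choosing among alternative parallel implementations for each stage, the harder problem the thesis addresses, is not modelled here.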
The Impact of Data Communication and Control Synchronization on Coarse-Grain Task Parallelism
In Second Annual Conf. of ASCI, 1996
Cited by 8 (6 self)
Research into automatic extraction of instruction-level parallelism and data parallelism from sequential languages by compilers has been going on for many years. However, task parallelism has been almost unexploited by parallelizing compilers. It has been shown that coarse-grain task parallelism is a useful additional source of parallelism for multiprocessors, but the simple and restricted execution models of the automatic compilers have resulted in poor performance figures. This paper presents experimental results used to evaluate the available coarse-grain (procedure-based) task parallelism in a set of C benchmarks assuming different machine models, ranging from very basic to extremely complex. The experiments show large amounts of available parallelism for machines which support both fast data communication and complex control synchronization. 1 Introduction The automatic exploitation of parallelism from sequential programs has been focused on instruction-level parallelism and data...
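One common way to quantify "available parallelism" in a task graph is total work divided by critical-path length. The sketch below assumes that metric, a hypothetical procedure-level task graph, and a crude machine model in which every dependence edge costs a fixed `comm_delay`; it is not the paper's experimental setup, but it illustrates why slow data communication eats into coarse-grain parallelism.

```python
from typing import Dict, List


def available_parallelism(cost: Dict[str, int],
                          deps: Dict[str, List[str]],
                          comm_delay: int = 0) -> float:
    """Total work divided by critical-path length for an acyclic task graph.

    cost[t]    -- execution time of task t (e.g. one procedure call)
    deps[t]    -- tasks whose results t needs before it can start
    comm_delay -- fixed latency charged on every dependence edge
                  (a simplifying stand-in for a slower communication network)
    """
    finish: Dict[str, int] = {}

    def earliest_finish(t: str) -> int:
        # Longest path ending at t, assuming unlimited processors (memoised).
        if t not in finish:
            start = max((earliest_finish(d) + comm_delay for d in deps.get(t, [])),
                        default=0)
            finish[t] = start + cost[t]
        return finish[t]

    span = max(earliest_finish(t) for t in cost)
    return sum(cost.values()) / span


# Hypothetical graph: init, then a and b independently, then combine.
cost = {"init": 2, "a": 4, "b": 4, "combine": 2}
deps = {"a": ["init"], "b": ["init"], "combine": ["a", "b"]}
print(available_parallelism(cost, deps))                # 1.5
print(available_parallelism(cost, deps, comm_delay=2))  # 1.0
```

With free communication the two independent procedures overlap (12 units of work over an 8-unit critical path); charging 2 units per edge stretches the critical path to 12 and the measured parallelism disappears.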
Program Dependence Graphs for the Rest of Us
1993
Cited by 8 (0 self)
This report presents new control dependence analysis techniques that succeed in constructing a control dependence graph (CDG) in all of the common cases without requiring either the control flow graph or the auxiliary structures needed by the fully general algorithm. In the worst case, the intermediate structures built by our algorithms are used to derive a simplified form of the control flow graph that is then used by the general algorithm. In this eventuality, the general algorithm may run more quickly, since a portion of its analysis has already been performed. The report also presents an adaptation of Tarjan's interval analysis algorithm for data flow analysis that uses the control dependence graph instead of the control flow graph. Using the CDG-based analysis algorithm allows us to construct a program dependence graph (PDG) without first constructing a control flow graph. This approach offers several advantages over the conventional models. It eliminates a complete program repres...
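For contrast with the report's CFG-free approach, here is a small Python sketch of the conventional route it avoids: compute postdominators on the control flow graph, then apply the classic definition (Ferrante, Ottenstein and Warren) that y is control dependent on x when y postdominates some successor of x but does not strictly postdominate x. The dictionary encoding of the CFG and all node names are assumptions for illustration.

```python
from typing import Dict, List, Set


def postdominators(succ: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """Iterative data flow: pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}   # start from "everything" and shrink
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            if succ[n]:
                new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            else:
                new = {n}
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom


def control_dependences(succ: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """cd[x] = nodes control dependent on x: y postdominates some successor
    of x but does not strictly postdominate x."""
    pdom = postdominators(succ, exit_node)
    cd: Dict[str, Set[str]] = {n: set() for n in succ}
    for x, succs in succ.items():
        strict = pdom[x] - {x}
        for s in succs:
            cd[x] |= pdom[s] - strict
    return cd


# Hypothetical if-then-else: entry -> cond -> (then | else) -> join -> exit
cfg = {"entry": ["cond"], "cond": ["then", "else"], "then": ["join"],
       "else": ["join"], "join": ["exit"], "exit": []}
print(control_dependences(cfg, "exit")["cond"])  # the two branch nodes
```

In a loop such as `head -> body -> head`, the same definition makes `head` control dependent on itself, which is the usual way a CDG records loop-carried control.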
Interprocedural Analyses of Fortran Programs
Parallel Computing, 1997
Cited by 6 (0 self)
Interprocedural analyses (IPA) are becoming more and more common in commercial compilers. But research on the analysis of Fortran programs is still going on, as a number of problems are not yet satisfactorily solved and others are emerging with new language dialects. This paper presents a survey of the main interprocedural analysis techniques, with an emphasis on the suitability of the analysis framework for the characteristics of the original semantic problem. Our experience with the PIPS interprocedural compiler workbench is then described. PIPS includes a make-like mechanism, PipsMake, which takes care of the interleavings between top-down and bottom-up analyses and allows quick prototyping of new interprocedural analyses. Intensive summarization is used to reduce storage requirements and achieve reasonable analysis times when dealing with real-life applications. The speed/accuracy tradeoffs made for PIPS are discussed in the light of other interprocedural tools. Key ...
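Summarization in its simplest form propagates facts bottom-up, from callees to callers. A minimal sketch, assuming a textbook may-modify (MOD) analysis over global variables only (no parameter binding, aliasing, or array regions, all of which a tool like PIPS must handle); the procedure and variable names are hypothetical.

```python
from typing import Dict, Set


def mod_summaries(local_mod: Dict[str, Set[str]],
                  calls: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per-procedure summary of the global variables it may modify,
    directly or through any callee.

    Callee summaries are merged into callers until nothing changes, so
    recursive call graphs reach a fixed point instead of looping forever.
    """
    summary = {p: set(vs) for p, vs in local_mod.items()}  # copy: inputs untouched
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                before = len(summary[caller])
                summary[caller] |= summary[callee]
                changed |= len(summary[caller]) != before
    return summary


# Hypothetical program: main -> solve, solve -> {update, solve} (recursive).
local_mod = {"main": set(), "solve": {"Y"}, "update": {"X"}}
calls = {"main": {"solve"}, "solve": {"update", "solve"}}
print(mod_summaries(local_mod, calls))
```

A caller then consults only the callee's summary at each call site instead of re-analysing the callee body, which is the storage and time saving that summarization buys.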
A Parallel Functional Language Compiler for Message-Passing Multicomputers
1998
Cited by 4 (2 self)
The research presented in this thesis is about the design and implementation of Naira, a parallel, parallelising compiler for a rich, purely functional programming language. The source language of the compiler is a subset of Haskell 1.2. The front end of Naira is written entirely in the Haskell subset being compiled. Naira itself has been successfully parallelised; it is the largest successfully parallelised Haskell program to have achieved good absolute speedups on a network of SUN workstations. Because Naira has the same basic structure as other production compilers of functional languages, its parallelisation technology should carry forward to other functional language compilers. The back end of Naira is written in C and generates parallel C code intended to run on distributed-memory machines. The code generator is based on a novel compilation scheme specified using a restricted form of Milner's π-calculus which achieves asynchronous communication. We present the f...
Tests des Dépendances et Transformations de Programme (Dependence Tests and Program Transformations)
1993
Cited by 4 (1 self)
The parallelization of sequential programs requires several stages: analysis of dependence relations, representation of these dependences, and application of transformations using this representation to find a parallel schedule for the program instructions. The success of parallelization depends on the precision of the dependence test and dependence representation used. In this thesis, we present and compare different dependence test algorithms and different data dependence abstractions. The algorithm of the PIPS parallelizer is based on an approximate feasibility test using Fourier-Motzkin elimination. Our experiments show that, in practice, it is accurate enough for treating dependence systems, and that its practical complexity is polynomial. Different dependence abstractions have different precision. For deciding whether a transformation is legal, several abstractions are admissible, meaning they contain enough information to determine whether this transformation is legal. The minimal a...
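To make "approximate feasibility test using Fourier-Motzkin elimination" concrete: each variable is eliminated by pairing every inequality that bounds it from above with every one that bounds it from below, and the system is infeasible if a contradiction `0 <= b` with `b < 0` appears. The sketch below is a generic rational test, not the PIPS implementation; `may_depend` is a hypothetical helper for a one-dimensional loop that writes `A[i]` and reads `A[i + offset]`.

```python
from typing import List, Tuple

# (a, b) encodes a[0]*x0 + a[1]*x1 + ... <= b
Ineq = Tuple[List[int], int]


def eliminate(system: List[Ineq], k: int) -> List[Ineq]:
    """Fourier-Motzkin step: remove variable x_k from the system."""
    pos = [(a, b) for a, b in system if a[k] > 0]   # upper bounds on x_k
    neg = [(a, b) for a, b in system if a[k] < 0]   # lower bounds on x_k
    out = [(a, b) for a, b in system if a[k] == 0]
    for ap, bp in pos:
        for an, bn in neg:
            # Positive multipliers chosen so the x_k coefficients cancel.
            mp, mn = -an[k], ap[k]
            coeffs = [mp * cp + mn * cn for cp, cn in zip(ap, an)]
            out.append((coeffs, mp * bp + mn * bn))
    return out


def rational_feasible(system: List[Ineq], nvars: int) -> bool:
    """True if the system has a rational solution (so a dependence is possible)."""
    for k in range(nvars):
        system = eliminate(system, k)
    # Every remaining inequality now reads 0 <= b.
    return all(b >= 0 for _, b in system)


def may_depend(offset: int, lo: int, hi: int) -> bool:
    """Can the write A[i1] and the read A[i2 + offset] touch the same element
    for lo <= i1, i2 <= hi?  Variables: x0 = i1, x1 = i2."""
    system = [
        ([1, -1], offset), ([-1, 1], -offset),   # i1 - i2 == offset
        ([-1, 0], -lo), ([1, 0], hi),            # lo <= i1 <= hi
        ([0, -1], -lo), ([0, 1], hi),            # lo <= i2 <= hi
    ]
    return rational_feasible(system, 2)


print(may_depend(20, 0, 9))  # False: indices never differ by 20
print(may_depend(5, 0, 9))   # True
```

The test is approximate because it reasons over the rationals: a constraint such as `2*i1 = 2*i2 + 1` is rationally feasible but has no integer solution, so it is reported as a possible dependence. An "infeasible" answer is always safe to act on.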
Compiler Technology for Parallel Scientific Computation
1994
Cited by 2 (1 self)
There is a need for compiler technology that, given the source program, will generate efficient parallel code for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with the Equational Programming Language (EPL).
Extracting data flow information for parallelizing FORTRAN nested loop kernels
1994
Cited by 1 (0 self)
Currently available parallelizing FORTRAN compilers expend a large amount of effort in determining data-independent statements in a program such that these statements can be scheduled in parallel without need for synchronisation. This thesis hypothesises that it is just as important to derive exact data flow information about the data dependencies where they exist. We focus on the specific problem of imperative nested loop parallelization by describing a direct method for determining the distance vectors of the inter-loop data dependencies in an n-nested loop kernel. These distance vectors define dependence arcs between iterations which are represented as points in n-dimensional Euclidean space. To demonstrate some of the benefits gained from deriving such exact data flow information about a nested loop computation we show how implicit task graph information about the computation can be deduced. Deriving the implicit task graph of the computation enables the parallelization of a class ...
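The thesis derives distance vectors directly; for small cases they can also be found by brute force, which is handy as a cross-check. A minimal sketch, assuming a single statement in a hypothetical 2-nested loop that writes `A[i][j]` and reads `A[i-1][j+1]`: enumerate the iterations, match each read to the latest earlier iteration that wrote the same element, and record the difference.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Iter = Tuple[int, ...]


def distance_vectors(bounds: List[Tuple[int, int]],
                     write: Callable[[Iter], Tuple[int, ...]],
                     read: Callable[[Iter], Tuple[int, ...]]) -> Set[Iter]:
    """Loop-carried flow-dependence distance vectors for one statement that
    writes A[write(it)] and reads A[read(it)] at iteration vector `it`.

    bounds gives inclusive (lo, hi) limits per loop, outermost first.
    Brute force: every iteration is enumerated, so keep the bounds small.
    """
    iterations = list(product(*(range(lo, hi + 1) for lo, hi in bounds)))
    writers: Dict[Tuple[int, ...], List[Iter]] = {}
    for it in iterations:
        writers.setdefault(write(it), []).append(it)

    zero = tuple(0 for _ in bounds)
    dists: Set[Iter] = set()
    for sink in iterations:
        earlier = [tuple(b - a for a, b in zip(src, sink))
                   for src in writers.get(read(sink), [])]
        # Lexicographically positive distance = the write ran earlier.
        earlier = [d for d in earlier if d > zero]
        if earlier:
            # Smallest positive distance = most recent write, whose value the read sees.
            dists.add(min(earlier))
    return dists


# Hypothetical kernel: for i in 1..4: for j in 1..4: A[i][j] = A[i-1][j+1] + ...
print(distance_vectors([(1, 4), (1, 4)],
                       write=lambda it: (it[0], it[1]),
                       read=lambda it: (it[0] - 1, it[1] + 1)))  # {(1, -1)}
```

The single vector (1, -1) has its first non-zero component in the outer position, so the dependence is carried by the `i` loop, and for a fixed `i` the iterations of the inner `j` loop are independent of each other.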
A Method for Developing Parallel Vision Algorithms with an Example of Edge Tracking
1996
Cited by 1 (1 self)
We present a general approach to parallel algorithm development based on prototyping and benchmarking in a functional language. Preferred solutions are then translated to an imperative language to run on a MIMD architecture. The methodology is well-suited to vision algorithms, since these show a high degree of data or algorithmic complexity at several levels, and are necessarily dependent on parallel processing for rapid computation. The approach is demonstrated fully on a relatively difficult parallel task, tracking boundaries in an intensity image, showing how the functional prototype provides accurate predictions of the final parallel code. Prototyping is in Standard ML for parallel implementation in occam2 on a transputer array. Keywords: functional prototyping, parallel vision, distributed memory, edge tracking Telephone: 0131-451-3422 (Michaelson) -3423 (Wallace) Fax: 0131-451-3431 email: norman,greg,andy@cee.hw.ac.uk Correspondence: Dr. Andrew Wallace at above address 1 Over...

