Results 1 - 10
of
15
Compiling Collection-Oriented Languages onto Massively Parallel Computers
- Journal of Parallel and Distributed Computing
, 1990
"... : This paper introduces techniques for compiling the nested parallelism of collectionoriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine ..."
Abstract
-
Cited by 78 (10 self)
- Add to MetaCart
: This paper introduces techniques for compiling the nested parallelism of collectionoriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine itself in parallel. This occurs naturally in many applications. Most parallel systems, however, do not permit the expression of nested parallelism. This forces the programmer to exploit only one level of parallelism or to implement nested parallelism themselves. Both of these alternatives tend to produce code that is harder to maintain and less modular than code described at a higher-level with nested parallel constructs. Not permitting the expression of nested parallelism is analogous to not permitting nested loops in serial languages. This paper describes issues and techniques for taking high-level descriptions of parallelism in the form of operations on nested collections and automaticall...
Accelerator: using data parallelism to program GPUs for general-purpose uses
- in Proceedings of the 12th international conference on Architectural
, 2006
"... GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs ..."
Abstract
-
Cited by 57 (0 self)
- Add to MetaCart
GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead. Programmers use a conventional imperative programming language and a library that provides only high-level data-parallel operations. No aspects of GPUs are exposed to programmers. The library implementation compiles the data-parallel operations on the fly to optimized GPU pixel shader code and API calls. We describe the compilation techniques used to do this. We evaluate the effectiveness of using data parallelism to program GPUs by providing results for a set of compute-intensive benchmarks. We compare the performance of Accelerator versions of the benchmarks against hand-written pixel shaders. The speeds of the Accelerator versions are typically within 50 % of the speeds of hand-written pixel shader code. Some benchmarks significantly outperform C versions on a CPU: they are up to 18 times faster than C code running on a CPU.
Collection-Oriented Languages
- PROCEEDINGS OF THE IEEE
, 1991
"... Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types and their primitive functions operate on these aggregates. Examples of such languages (and th ..."
Abstract
-
Cited by 49 (5 self)
- Add to MetaCart
Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types and their primitive functions operate on these aggregates. Examples of such languages (and the collections they support) are FORTRAN 90 (arrays), APL (arrays), Connection Machine LISP (xectors), PARALATION LISP (paralations), and SETL (sets). Acting on large collections of data with a single operation is the hallmark of data-parallel programming and massively parallel computers. These languages --- which we call collection-oriented --- are thus ideal for use with massively parallel machines, even though many of them were developed before parallelism and associated considerations became important. This paper examines collections and the operations that can be performed on them in a language-independent manner. It also critically reviews and compares a variety of collection-oriented languages...
C**: A Large-Grain, Object-Oriented, Data-Parallel Programming Language
- LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING (5TH INTERNATIONAL WORKSHOP
, 1992
"... C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes da ..."
Abstract
-
Cited by 46 (3 self)
- Add to MetaCart
C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes data parallelism well-suited for massively-parallel computation. Large-grain data parallelism enhances data parallelism by permitting a wider range of algorithms to be expressed naturally. C is an object-oriented programming language that inherits data abstraction features from C++. Existing scientific programming languages do not provide modern programming facilities such as operator extensibility, abstract datatypes, or object-oriented programming. C ---and its sequential subset C ++ ---support modern programming practices and enable a single language to be used for all parts of large, complex programs and libraries. This technical report consists of three parts. The body of t...
Compiling Nested Data-Parallel Programs for Shared-Memory Multiprocessors
- ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS
, 1993
"... While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. ..."
Abstract
-
Cited by 17 (5 self)
- Add to MetaCart
While data parallelism is well-suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. The design, implementation, and evaluation of an optimizing compiler are presented for an applicative nested data-parallel language called VCODE targeted at the Encore Multimax, a shared-memory multiprocessor. The source language supports nested aggregate data types; aggregate operations including elementwise forms, scans, reductions, and permutations; and conditionals and recursion for control flow. A small set of graph-theoretic compile-time optimizations reduce the overheads on MIMDmachines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference. The two key ideas behind these...
Size and Access Inference for Data-Parallel Programs
- IN ACM SIGPLAN '91 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1991
"... Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead for small grain sizes. This paper presents compile-time analysis techniques for data-parallel program graphs that reduce these overheads in two ways: by stepping up the grain size, and by relaxing the synchronous nature of the computation without altering the program semantics. The algorithms partition the program graph into clusters of nodes such that all nodes in a cluster have the same loop structure, and further refine these clusters into epochs based on generation and consumption patterns of data vectors. This converts the fine-grain parallelism in the original program to medium-grain loop parallelism, which is better suited to MIMD machines. A compiler has been implemented based on th...
Factor-Join: A Unique Approach to Compiling Array Languages for Parallel Machines
- IN WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 1996
"... This paper describes a new approach to compiling and optimizing array languages for parallel machines. This approach first decomposes array language operations into factors, where each factor corresponds to a different communication or computation structure. Optimizations are then achieved by co ..."
Abstract
-
Cited by 13 (8 self)
- Add to MetaCart
This paper describes a new approach to compiling and optimizing array languages for parallel machines. This approach first decomposes array language operations into factors, where each factor corresponds to a different communication or computation structure. Optimizations are then achieved by combining, or joining, these factors. Because
An Array Operation Synthesis Scheme to Optimize Fortran 90 Programs
- Programs, Proceedings of ACM SIGPLAN Conference on Principles and Practice of Parallel Programming
, 1995
"... An increasing number of programming languages, such as Fortran 90 and APL, are providing a rich set of intrinsic array functions and array expressions. These constructs which constitute an important part of data parallel languages provide excellent opportunities for compiler optimizations. In this p ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
An increasing number of programming languages, such as Fortran 90 and APL, are providing a rich set of intrinsic array functions and array expressions. These constructs which constitute an important part of data parallel languages provide excellent opportunities for compiler optimizations. In this paper, we present a new approach to combine consecutive data access patterns of array constructs into a composite access function to the source arrays. Our scheme is based on the composition of access functions, which is similar to a composition of mathematic functions. Our new scheme can handle not only data movements of arrays of different numbers of dimensions and segmented array operations but also masked array expressions and multiple sources array operations. As a result, our proposed scheme is the first synthesis scheme which can synthesize Fortran 90 RESHAPE, EOSHIFT, MERGE, and WHERE constructs together. Experimental results show speedups from 1.21 to 2.95 for code fragments from rea...
Composition and Compilation in Functional Programming Languages
, 1994
"... Functional programming languages, such as Backus' FP, and high level expression oriented languages, such as APL, are examples of programming languages in which the primary method of program construction is the process of composition. In this paper we describe an approach to generating code for langu ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
Functional programming languages, such as Backus' FP, and high level expression oriented languages, such as APL, are examples of programming languages in which the primary method of program construction is the process of composition. In this paper we describe an approach to generating code for languages based on compositions. The approach involves finding an intermediate representation which grows in size very slowly as additional terms are composed. In particular, the size of the intermediate representation of a composed object should be considerably smaller, and easier to interpret, than the sum of the sizes of the internal representations of the individual elements. We illustrate this technique by showing how to generate conventional code for Backus' language FP. The general technique, however, is applicable to other languages, as well as other architectures. 1 Introduction The purposes of this paper are two-fold. First, we want to describe an approach to compilation and code gener...
Achieving Speedups for APL on an SIMD Distributed Memory Machine
, 1990
"... The potential speedup for SIMD parallel implementations of APL programs is considered. Both analytical and (simulated) empirical studies are presented. The approach is to recognize that nearly 95% of the operators appearing in APL programs are either scalar primitive, reduction or indexing and so th ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
The potential speedup for SIMD parallel implementations of APL programs is considered. Both analytical and (simulated) empirical studies are presented. The approach is to recognize that nearly 95% of the operators appearing in APL programs are either scalar primitive, reduction or indexing and so the performance of these operators gives a good estimate of the amount of speedup a full program might receive. Substantial speedups are demonstrated for these operators and the empirical evidence accords with the analytical estimates. Keywords: APL, data parallel, parallelism, parallel programming, SIMD computers This research has been funded by the Office of Naval Research Contract No. N00014-86-K-0264 and the National Science Foundation Grant No. DCR 8416878. List of Figures 1 4 \Theta 4 mesh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Algorithm for performing the scalar primitive operator in parallel on the mesh. . . . 10 3 Algorithm for per...

