Results 11 - 20 of 58
Exploiting symmetry on parallel architectures
, 1995
"... This thesis describes techniques for the design of parallel programs that solvewellstructured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a groupequivariant matrix. Fast techniques for this multiplication are described ..."
Abstract

Cited by 10 (1 self)
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
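To make the reduction concrete, here is a minimal Python sketch (with hypothetical helper names, not the thesis's code) of the simplest case of a group-equivariant matrix: a matrix equivariant under the cyclic group Z_n is circulant, so multiplying by it is a cyclic convolution, which fast transform techniques over the group can then accelerate.

```python
def circulant_matvec(c, x):
    """Multiply the circulant (Z_n-equivariant) matrix whose first
    column is c by the vector x, expressed as a cyclic convolution."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matrix(c):
    """The explicit n x n matrix: entry (i, j) is c[(i - j) mod n]."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]
```

For example, with c = [1, 2, 3] and x = [4, 5, 6] the convolution form and an explicit matrix-vector product both yield [31, 31, 28]; the equivariant structure is what lets the O(n^2) product be replaced by transform-based methods.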
A Parallel Programming Methodology Based on Paradigms
 In Transputer and Occam Developments
, 1995
"... Today's efforts are mainly concentrated on providing "standard" parallel languages to ensure the portability of programs across various architectures. It is now believed that the next level of abstraction that will be addressed is the application level. This paper argues that there is ..."
Abstract

Cited by 10 (3 self)
Today's efforts are mainly concentrated on providing "standard" parallel languages to ensure the portability of programs across various architectures. It is now believed that the next level of abstraction that will be addressed is the application level. This paper argues that there is an intermediate level that consists of common parallel programming paradigms. It describes some of these paradigms and explains the basic principles behind a "paradigm-oriented" programming approach. Finally, it points to future directions which can make it feasible to build parallel CASE tools that achieve automatic parallel code generation.
1 Introduction
This paper is concerned with the process of developing portable applications that are suitable for general-purpose parallel computers. Until very recently, the most efficient way to develop efficient code has been to program directly at the machine code level. Efforts have been made in order to provide a higher abstraction level without a significant loss i...
Optimizing Compositions of Scans and Reductions in Parallel Program Derivation
, 1997
"... Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such ..."
Abstract

Cited by 9 (2 self)
Introduction
We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4]; second, they encourage well-structured, coarse-grained parallel programming; and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows:
- We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction; the second rule transforms a composition of two scans into a single scan.
- We apply the first rule in the formal derivation of a parallel algorithm for the ...
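As an illustration of the first rule, here is a sequential Python sketch of its semantics (not the authors' formal derivation): the composition of a scan and a reduction, normally two passes, can be fused into a single reduction whose accumulator pairs the running prefix with the result so far.

```python
from functools import reduce
from itertools import accumulate
from operator import add

def composed(xs):
    # Scan (prefix sums) followed by a reduction: two passes over the data.
    return reduce(add, accumulate(xs, add), 0)

def fused(xs):
    # One reduction: the accumulator carries (running prefix, result so far),
    # so each element is touched exactly once.
    def step(acc, x):
        prefix, total = acc
        prefix = prefix + x
        return (prefix, total + prefix)
    return reduce(step, xs, (0, 0))[1]
```

This sketch only shows that the two forms compute the same value (e.g., both give 20 on [1, 2, 3, 4]); the papers' point is that the fused form can itself be given an associative combiner and so implemented as a single parallel reduction.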
List Homomorphic Parallel Algorithms for Bracket Matching
 Department of Computer Science, University of Edinburgh
, 1993
"... We present a family of parallel algorithms for simple language recognition problems involving bracket matching. The algorithms are expressed in the BirdMeertens Formalism, exploiting only list operations which are inherently massively parallel. Our intention is to illustrate the practical efficacy ..."
Abstract

Cited by 9 (0 self)
We present a family of parallel algorithms for simple language recognition problems involving bracket matching. The algorithms are expressed in the Bird-Meertens Formalism, exploiting only list operations which are inherently massively parallel. Our intention is to illustrate the practical efficacy with which such algorithms can be derived and expressed given the support of a well-understood theoretical foundation. One of the variants produced is of particular interest in that it exploits the same theoretical result twice to produce nested parallelism.
1 Introduction
In [8], we investigated an informal methodology for the generation of parallel algorithms based upon exploitation of a fundamental result from the Bird-Meertens "theory of lists". Our main example was an algorithm for the maximum segment sum problem. In this report we provide further examples of the approach. For completeness, the remainder of this section and sections 2 and 3 repeat the introductory material from [8]. Re...
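A Python sketch of the underlying idea (hypothetical names, not the paper's BMF notation): map each character to a pair counting unmatched closers and openers; these pairs combine under an associative operator, so the whole check is a map followed by a reduction, both of which parallelize over lists.

```python
from functools import reduce

def to_pair(ch):
    # The pair (unmatched ')', unmatched '(') contributed by one character.
    return (1, 0) if ch == ')' else (0, 1)

def combine(a, b):
    # Associative combiner: openers left over on the left cancel
    # closers appearing on the right.
    c1, o1 = a
    c2, o2 = b
    matched = min(o1, c2)
    return (c1 + c2 - matched, o1 + o2 - matched)

def balanced(s):
    # A string is balanced iff nothing remains unmatched.
    return reduce(combine, map(to_pair, s), (0, 0)) == (0, 0)
```

Because `combine` is associative, the reduction can be evaluated as a balanced tree over segments of the input, which is the homomorphic structure the paper exploits.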
(De)Composition Rules for Parallel Scan and Reduction
 In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97)
, 1998
"... We study the use of welldefined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract

Cited by 9 (1 self)
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of data parallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
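The decomposition direction can be sketched loosely in Python (a simulation with hypothetical names, where list elements stand in for processes, not the paper's exact rule): each process first reduces its local segment (partial reduction), the partial results are then all-gathered, and a local combine leaves every process holding the full result, as with MPI_Allreduce.

```python
from functools import reduce

def reduction_via_allgather(local_segments, op, unit):
    """Simulate p processes, each holding one segment of the data."""
    # Step 1: partial reduction -- each process reduces its own segment.
    partials = [reduce(op, segment, unit) for segment in local_segments]
    # Step 2: all-gather -- every process receives all partial results
    # (in MPI this would be MPI_Allgather).
    gathered = list(partials)
    # Step 3: local combine -- each process finishes the reduction itself.
    result = reduce(op, gathered, unit)
    return [result for _ in local_segments]
```

The point of the decomposition is that steps 1 and 3 are purely local, so all communication is concentrated in the single all-gather, whose hypercube cost the paper analyzes.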
Categorical Data Types
 In Second Workshop on Abstract Models for Parallel Computation
, 1993
"... An ideal abstract model for parallel computation must carefully balance requirements for effective software engineering with requirements for efficient implementation. Models based on sets of fixed communication/computation patterns satisfy these requirements but, in general, the sets of patterns ar ..."
Abstract

Cited by 7 (1 self)
An ideal abstract model for parallel computation must carefully balance requirements for effective software engineering with requirements for efficient implementation. Models based on sets of fixed communication/computation patterns satisfy these requirements but, in general, the sets of patterns are chosen arbitrarily. Categorical data types are a way of building such models while automatically generating operations, equations, and a guarantee of completeness. We illustrate this construction, and its usefulness for practical problems, by building the type of chemical molecules and showing how molecular properties can be computed in parallel.
Questions and Answers About Categorical Data Types
 in Proceedings on the BCS Workshop on Bulk Data Types for Architecture Independence, London (20
, 1994
"... this document without fee provided it is copied in its entirety and this notice remains attached. the computation and communication of an operation on the data type are arranged. That's a job for the implementer and compiler writer. So there's a separation of concerns at just the right le ..."
Abstract

Cited by 7 (0 self)
... the computation and communication of an operation on the data type are arranged. That's a job for the implementer and compiler writer. So there's a separation of concerns at just the right level: programmers think about monolithic operations on data types, while implementers worry about how to make them happen. This provides architecture independence. If the target machine is replaced during the night by some new machine, even a completely different architecture, there is no need to alter the software. The differences between machines can be hidden by the compiler.
Optimizing Sequences of Skeleton Calls
 In Domain-Specific Program Generation, LNCS 3016
, 2004
"... ..."
Parallel Functional Programming for Message-Passing Multiprocessors
, 1993
"... We propose a framework for the evaluation of implicitly parallel functional programs on message passing multiprocessors with special emphasis on the issue of load bounding. The model is based on a new encoding of the lcalculus in Milner's pcalculus and combines lazy evaluation and eager (par ..."
Abstract

Cited by 5 (2 self)
We propose a framework for the evaluation of implicitly parallel functional programs on message-passing multiprocessors with special emphasis on the issue of load bounding. The model is based on a new encoding of the λ-calculus in Milner's π-calculus and combines lazy evaluation and eager (parallel) evaluation in the same framework. The π-calculus encoding serves as the specification of a more concrete compilation scheme mapping a simple functional language into a message-passing, parallel program. We show how and under which conditions we can guarantee successful load bounding based on this compilation scheme. Finally we discuss the architectural requirements for a machine to support our model efficiently and we present a simple RISC-style processor architecture which meets those criteria.
3 Acknowledgments
Many people have had profound influence on this thesis and I want to pay tribute to some of them here. To my supervisor, Tony Davie, for his willingness to supervise what start...