Results 1-10 of 23
Models of Parallel Computation: A Survey and Synthesis
 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES
, 1995
Cited by 52 (0 self)
In the realm of sequential computing the random access machine has successfully provided an underlying model of computation that promoted consistency and coordination among algorithm developers, computer architects and language experts. In the realm of parallel computing, however, there has been no similar success. The need for such a unifying parallel model or set of models is heightened by the greater demand for performance and the greater diversity among machines. Yet the modeling of parallel computing still seems to be mired in controversy and chaos. This paper is an excerpt from a study which presents a broad range of models of parallel computation and the different roles they serve in algorithm, language and machine design. The objective is to better understand which model characteristics are important to each design community in order to elucidate the requirements of a unifying paradigm. As an impetus for discussion, we conclude by suggesting a model of parallel computation which...
Parallelization in Calculational Forms
 In 25th ACM Symposium on Principles of Programming Languages
, 1998
Cited by 32 (24 self)
The problems involved in developing efficient parallel programs have proved harder than those in developing efficient sequential ones, both for programmers and for compilers. Although program calculation has been found to be a promising way to solve these problems in the sequential world, we believe that much more effort is needed to study its effective use in the parallel world. In this paper, we propose a calculational framework for the derivation of efficient parallel programs with two main innovations: (1) we propose a novel inductive synthesis lemma, based on which an elementary but powerful parallelization theorem is developed; (2) we make the first attempt to construct a calculational algorithm for parallelization, deriving associative operators from data type definitions and making full use of existing fusion and tupling calculations. Being more constructive, our method is not only helpful in the design of efficient parallel programs in general but also promising in the construc...
Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks
 in Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium. IEEE Computer Society
, 2007
Cited by 15 (13 self)
Network performance measurement and prediction are essential for predicting the running time of high-performance computing applications. The LogP model family has proven to be a viable tool for assessing the communication performance of parallel architectures. However, non-intrusive LogP parameter assessment is still a very difficult task. We compare well-known measurement methods for Log(G)P parameters and discuss their accuracy and network contention. Based on this, a new theoretically exact measurement method that does not saturate the network is derived and explained in detail. Our method uses only benchmarked values, rather than computed parameters, to compute other parameters, which avoids propagation of first-order errors. A methodology to detect protocol changes in the underlying communication subsystem is also proposed. The applicability of our method and the protocol change detection is shown for the low-level API as well as MPI implementations of different modern high-performance interconnection networks. The whole method is implemented in the tool Netgauge, which is available to the public as open source.
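For readers unfamiliar with the parameters being measured, the following is a minimal sketch of the standard LogGP point-to-point cost formula, in which a single k-byte message takes o + (k-1)G + L + o. It is not taken from the paper or from Netgauge; the class name `LogGP`, the function `message_time`, and the numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """LogGP parameters (hypothetical container; units are arbitrary time units)."""
    L: float  # network latency
    o: float  # CPU overhead per send or per receive
    g: float  # minimum gap between consecutive messages
    G: float  # gap per byte for long messages
    P: int    # number of processors


def message_time(m: LogGP, k: int) -> float:
    """Predicted end-to-end time of one k-byte message: o + (k-1)*G + L + o."""
    if k < 1:
        raise ValueError("message size must be at least one byte")
    return m.o + (k - 1) * m.G + m.L + m.o


# Illustrative, made-up parameter values (not measurements).
params = LogGP(L=5.0, o=1.0, g=2.0, G=0.5, P=4)
print(message_time(params, 1))  # 1-byte message: o + L + o
print(message_time(params, 9))  # 9-byte message adds 8 * G
```

The gap g does not appear in the single-message formula; it only matters when several messages are sent back to back.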
An Accumulative Parallel Skeleton for All
, 2001
Cited by 14 (11 self)
Parallel skeletons intend to encourage programmers to build...
A library of constructive skeletons for sequential style of parallel programming
 In InfoScale ’06: Proceedings of the 1st international conference on Scalable information systems, volume 152 of ACM International Conference Proceeding Series
, 2006
Cited by 12 (7 self)
With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce difficulties in parallel programming. Among them, libraries of parallel skeletons have emerged as a promising step in this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad hoc selection of library functions, unsatisfactory performance, or lack of convincing application examples. This paper addresses the principles of designing skeleton libraries for parallel programming and reports implementation details and practical applications of a skeleton library, SkeTo. The SkeTo library is unique in that it has a solid theoretical foundation based on the theory of Constructive Algorithmics, and is practical for describing various parallel computations in a sequential manner.
Diffusion: Calculating Efficient Parallel Programs
 IN 1999 ACM SIGPLAN WORKSHOP ON PARTIAL EVALUATION AND SEMANTICS-BASED PROGRAM MANIPULATION (PEPM ’99)
, 1999
Cited by 9 (7 self)
Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often find it difficult to choose a proper combination of parallel primitives with which to construct efficient parallel programs. To overcome this difficulty, we propose a new transformation, called diffusion, which can efficiently decompose a recursive definition into several functions such that each function can be described by some parallel primitive. This allows programmers to describe algorithms in a more natural recursive form. We demonstrate our idea with several interesting examples. Our diffusion transformation should be significant not only in the development of new parallel algorithms, but also in the construction of parallelizing compilers.
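The kind of decomposition the abstract describes can be illustrated with a small sketch: a recursive function with an accumulating parameter is rewritten as a scan (to produce the accumulator seen by each element), a map, and a reduce. This is only an illustration in the spirit of the abstract, not the paper's diffusion theorem or notation; the example function and the names `weighted_rec` and `weighted_skel` are hypothetical, and the skeletons are run sequentially here.

```python
from itertools import accumulate
from operator import mul


def weighted_rec(xs, c=0):
    """Recursive form: each element is weighted by the sum of its predecessors plus c."""
    if not xs:
        return 0
    x, rest = xs[0], xs[1:]
    return x * c + weighted_rec(rest, c + x)


def weighted_skel(xs, c=0):
    """Same function expressed with scan, map and reduce primitives."""
    # scan: accumulator value visible to each element (drop the final total)
    cs = list(accumulate(xs, initial=c))[:-1]
    # map: combine each element with its accumulator; reduce: sum the results
    return sum(map(mul, xs, cs))


print(weighted_rec([1, 2, 3]), weighted_skel([1, 2, 3]))
```

Each of the three primitives uses only associative operators (addition here), which is what would allow a parallel implementation to replace the sequential `accumulate` and `sum`.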
Automatic Inversion Generates Divide-and-Conquer Parallel Programs
Cited by 6 (4 self)
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists that match the divide-and-conquer paradigm very well. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel system that can automatically derive cost-optimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce extraction of list homomorphisms to derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the line-of-sight problem. The experimental results show the practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.
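As a concrete instance of a list homomorphism, the sketch below writes the maximum prefix sum problem by hand in divide-and-conquer form: each sublist is summarized by the pair (maximum prefix sum, total sum), and two summaries are merged with an associative operator. This is a hand-written illustration, not output of the system described in the abstract; the names `single`, `combine`, `mps_hom` and `mps_seq` are hypothetical, and the empty prefix is assumed to count, so the result is never negative.

```python
def single(x):
    """Summary of a one-element list: (maximum prefix sum, total sum)."""
    return (max(0, x), x)


def combine(a, b):
    """Associative merge of the summaries of two adjacent sublists."""
    m1, s1 = a
    m2, s2 = b
    # Best prefix either ends inside the left part or spans all of it.
    return (max(m1, s1 + m2), s1 + s2)


def mps_hom(xs):
    """Divide-and-conquer evaluation; the two halves are independent."""
    if not xs:
        return (0, 0)
    if len(xs) == 1:
        return single(xs[0])
    mid = len(xs) // 2
    return combine(mps_hom(xs[:mid]), mps_hom(xs[mid:]))


def mps_seq(xs):
    """Sequential left-to-right specification, used here as a cross-check."""
    best = total = 0
    for x in xs:
        total += x
        best = max(best, total)
    return best


data = [3, -4, 5, -1, 2]
print(mps_hom(data)[0], mps_seq(data))
```

Because `combine` is associative, the recursion may split the list at any point, which is what makes the two recursive calls candidates for parallel execution.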
A Communication Model for Small Messages with InfiniBand
 PARS Proceedings
, 2005
Cited by 3 (3 self)
Designing new and optimal algorithms for a specific architecture requires accurate modelling of this architecture. This is especially needed to choose one out of different solutions for the same problem or to prove a lower bound for a problem. Assuming that the model is highly accurate, a given algorithm can be seen as an optimal solution if it reaches the lower bound. Therefore the accuracy of a model is extremely important for algorithmic design. A detailed model can also help to understand the architectural details and their influence on the running time of different solutions, and it can be used to derive better algorithms for a given problem. This work introduces some special architectural features of the InfiniBand network and shows that the most widely used models introduce inaccuracies for sending small messages with InfiniBand. Therefore a comparative model analysis is performed to find the most accurate model for InfiniBand. Based on this analysis and a description of the special architectural features of InfiniBand, a new, more accurate but also much more complex model called LoP is deduced from LogP, which can be used to assess the running time of different algorithms. The newly developed model can be used to find lower bounds for algorithmic problems and to enhance several algorithms.
A New Parallel Skeleton for General Accumulative Computations
 International Journal of Parallel Programming
, 2004
Cited by 2 (2 self)
In this paper, we propose a powerful and general parallel skeleton called accumulate and describe its efficient implementation in C++ with MPI (Message Passing Interface) (18) as a solution to the above problems. Unlike the approaches that apply such optimizations as loop restructuring to the target program, our approach provides a general recursive computation with accumulation as a library function (skeleton) with an optimized implementation. Our approach is based on the data parallel programming model of BMF, which provides us with a concise way to describe and manipulate parallel programs. The main advantages of accumulate can be summarized as follows
Towards polytypic parallel programming
, 1998
Cited by 2 (2 self)
Data parallelism is currently one of the most successful models for programming massively parallel computers. The central idea is to evaluate a uniform collection of data in parallel by simultaneously manipulating each data element in the collection. Despite many of its promising features, the current approach suffers from two problems. First, the main parallel data structures that most data parallel languages currently support are restricted to simple collection data types like lists, arrays or similar structures, while other useful data structures like trees have not been well addressed. Second, parallel programming relies on a set of parallel primitives that capture parallel skeletons of interest. However, these primitives are not well structured, and efficient parallel programming with these primitives is difficult. In this paper, we propose a polytypic framework for developing efficient parallel programs on most data structures. We show how a set of polytypic parallel primitives can be formally defined for manipulating most data structures, how these primitives can be successfully structured into a uniform recursive definition, and how an efficient combination of primitives can be derived from a naive specification program. Our framework should be significant not only in the development of new parallel algorithms, but also in the construction of parallelizing compilers.