A Cost Analysis for a Higher-order Parallel Programming Model
, 1996
Cited by 17 (1 self)
Programming parallel computers remains a difficult task. An ideal programming environment should enable the user to concentrate on the problem solving activity at a convenient level of abstraction, while managing the intricate low-level details without sacrificing performance. This thesis investigates a model of parallel programming based on the Bird-Meertens Formalism (BMF), a set of higher-order functions, many of which are implicitly parallel. Programs are expressed in terms of functions borrowed from BMF. A parallel implementation is defined for each of these functions for a particular topology, and the associated execution costs are derived. The topologies considered include the hypercube, 2-D torus, tree and linear array. An analyser estimates the costs associated with different implementations of a given program and selects a cost-effective one for a given topology. All the analysis is performed at compile time, which has the advantage of reducing run-...
Experience with the Implementation of a Concurrent Graph Reduction System on an nCUBE/2 Platform
- In CONPAR'94 | Conf. on Parallel and Vector Processing, LNCS 854
, 1994
Cited by 3 (1 self)
This paper reports on some experiments with the implementation of a concurrent version of a graph reduction system π-RED+ on an nCUBE/2 system of up to 32 processing sites. They primarily concern basic concepts of workload partitioning and balancing, the relationship between relative performance gains and the computational complexities of the investigated programs, resource management and suitable system topologies. All programs used for these experiments realize divide-and-conquer algorithms and have been run with varying (sizes of) data sets and system parameters (configurations).
1 Introduction
Running complex application programs non-sequentially in multiprocessor systems is known to be a formidable organizational problem. It relates to a programming paradigm suitable for exposing problem-inherent concurrency, to the orderly cooperation of all processes participating in the computation, and to a process management discipline which ensures a stable overall system behav...
Analytic models for multistage interconnection networks
- Journal of Parallel and Distributed Computing
, 1991
Cited by 3 (0 self)
Central to all parallel architectures is a switching network which facilitates the communication between a machine's components necessary to support their cooperation. Multi-stage interconnection networks (MINs) are classified and two queueing models for packet-switched MINs with unlimited buffer space are introduced. The first uses standard techniques and is exact with respect to its assumptions, hence providing a standard against which to assess approximate models. From this exact model, we can also obtain distributions of transmission times; previous work has either used simulation, which can be unreliable and is expensive to run, or produced only Laplace transforms. The second model has much milder assumptions, is more generally applicable and can be implemented more efficiently, but is approximate. However, it has been found to give accurate predictions for a wide range of traffic patterns and distributions of link transmission times. Established techniques can be integrated into our queueing-based methodology to model MINs with finite buffers and hence blocking.
Efficient Shared-Memory Support for Parallel Graph Reduction
, 1996
Cited by 1 (1 self)
This paper presents the results of a simulation study of cache coherency issues in parallel implementations of functional programming languages. Parallel graph reduction uses a heap shared between processors for all synchronisation and communication. We show that a high degree of spatial locality is often present and that the rate of synchronisation is much greater than for imperative programs. We propose a modified coherency protocol with static cache line ownership and show that this allows locality to be exploited to at least the level of a conventional protocol, but without the unnecessary serialisation and network transactions this usually causes. The new protocol avoids false sharing, and makes it possible to reduce the number of messages exchanged, but relies on increasing the size of the cache lines exchanged to do so. It is therefore of most benefit with a high-bandwidth interconnection network with relatively high communication latencies or message handling overhead...

