Results 1 -
4 of
4
Extracting and Implementing List Homomorphisms in Parallel Program Development
- Science of Computer Programming
, 1997
"... this paper, we study functions called list homomorphisms, which represent a particular pattern of parallelism. ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
this paper, we study functions called list homomorphisms, which represent a particular pattern of parallelism.
Dense Edge-Disjoint Embedding of Complete Binary Trees in Interconnection Networks
- in the Hypercube, Inf. Process. Lett
, 1993
"... We describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the two-dimensional mesh. For the mesh and the shuffle-exchange graphs each edge is regarded as two para ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the two-dimensional mesh. For the mesh and the shuffle-exchange graphs each edge is regarded as two parallel (or anti-parallel) edges. The embeddings have the following properties: paths of the tree are mapped onto edge-disjoint paths of the host graph and at most two tree nodes (just one of which is a leaf) are mapped onto each host node. We prove that the maximum distance from a leaf to the root of the tree is asymptotically as short as possible in all host graphs except in the case of the shuffle-exchange, in which case we conjecture that it is as short as possible. The embeddings facilitate efficient implementation of many P-RAM algorithms on these networks.
Compiler Technology for Parallel Scientific Computation
, 1994
"... There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of paralle ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with Equational Programming Language, EPL.
Simultaneous Parallel Reduction on SIMD Machines
"... Proper distribution of operations among parallel processors in a large scientific computation executed on a distributed-memory machine can significantly reduce the total computation time. In this paper we propose an operation, called simultaneous parallel reduction(SPR), that is amenable to such ..."
Abstract
- Add to MetaCart
Proper distribution of operations among parallel processors in a large scientific computation executed on a distributed-memory machine can significantly reduce the total computation time. In this paper we propose an operation, called simultaneous parallel reduction(SPR), that is amenable to such optimization. SPR performs reduction operations in parallel, each operation reducing a one-dimensional consecutive section of a distributed array. Each element of the distributed array is used as an operand to many reductions executed concurrently over the overlapping array's sections. SPR is distinct from a more commonly considered parallel reduction which concurrently evaluates a single reduction. In this paper we consider SPR on Single Instruction Multiple Data (SIMD) machines with different interconnection networks. We focus on SPR over sections whose size is not a power of 2 with the result shifted relative to the arguments. Several algorithms achieving some of the lower bounds on SPR complexity are presented under various assumptions about the properties of the binary operator of the reduction and of the communication cost of the target architectures.

