Results 1 - 4 of 4
Extracting and Implementing List Homomorphisms in Parallel Program Development
Science of Computer Programming, 1997
Abstract

Cited by 12 (0 self)
In this paper, we study functions called list homomorphisms, which represent a particular pattern of parallelism.
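The defining property of a list homomorphism h is that h(x ++ y) = combine(h(x), h(y)) for some associative combine, which is what licenses the divide-and-conquer parallelism this line of work exploits. A minimal sequential sketch, assuming that reading; the names list_hom, f, combine, and identity are illustrative, not from the paper:

```python
def list_hom(f, combine, identity, xs):
    """Evaluate a list homomorphism: map f over elements, then fold
    with an associative `combine`. Because
    h(x ++ y) == combine(h(x), h(y)), the two halves can be evaluated
    independently (e.g. on separate processors)."""
    if not xs:
        return identity
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    # The two recursive calls are independent and could run in parallel.
    return combine(list_hom(f, combine, identity, xs[:mid]),
                   list_hom(f, combine, identity, xs[mid:]))

# Example: sum of squares expressed as a list homomorphism.
print(list_hom(lambda x: x * x, lambda a, b: a + b, 0, [1, 2, 3, 4]))  # 30
```

Any split point gives the same answer because `combine` is associative, which is why the work can be balanced freely across processors.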
Dense Edge-Disjoint Embedding of Complete Binary Trees in Interconnection Networks
Inf. Process. Lett., 1993
Abstract

Cited by 5 (0 self)
We describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the two-dimensional mesh. For the mesh and the shuffle-exchange graphs each edge is regarded as two parallel (or antiparallel) edges. The embeddings have the following properties: paths of the tree are mapped onto edge-disjoint paths of the host graph and at most two tree nodes (just one of which is a leaf) are mapped onto each host node. We prove that the maximum distance from a leaf to the root of the tree is asymptotically as short as possible in all host graphs except in the case of the shuffle-exchange, in which case we conjecture that it is as short as possible. The embeddings facilitate efficient implementation of many PRAM algorithms on these networks.
Compiler Technology for Parallel Scientific Computation
1994
Abstract

Cited by 2 (1 self)
There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation, based on our experience with the Equational Programming Language (EPL).
Simultaneous Parallel Reduction on SIMD Machines
Abstract
Proper distribution of operations among parallel processors in a large scientific computation executed on a distributed-memory machine can significantly reduce the total computation time. In this paper we propose an operation, called simultaneous parallel reduction (SPR), that is amenable to such optimization. SPR performs reduction operations in parallel, each operation reducing a one-dimensional consecutive section of a distributed array. Each element of the distributed array is used as an operand in many reductions executed concurrently over overlapping sections of the array. SPR is distinct from the more commonly considered parallel reduction, which concurrently evaluates a single reduction. In this paper we consider SPR on Single Instruction Multiple Data (SIMD) machines with different interconnection networks. We focus on SPR over sections whose size is not a power of 2, with the result shifted relative to the arguments. Several algorithms achieving some of the lower bounds on SPR complexity are presented under various assumptions about the properties of the binary operator of the reduction and of the communication cost of the target architectures.
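As a rough illustration of what SPR computes (a naive sequential reference, not the paper's SIMD algorithms, which exploit the operator's properties and the network topology to do better), a sketch in Python might look like the following; the names spr, w, and shift are illustrative, not from the paper:

```python
from functools import reduce

def spr(op, xs, w, shift=0):
    """Naive reference for simultaneous parallel reduction (SPR):
    each output is the reduction under the binary operator `op` of a
    length-w consecutive section of xs, with the result index shifted
    by `shift` relative to the section's start. Every element of xs
    feeds up to w overlapping reductions, which is what distinguishes
    SPR from a single parallel reduction."""
    n = len(xs)
    out = {}
    for start in range(n - w + 1):
        out[start + shift] = reduce(op, xs[start:start + w])
    return out

# Windowed sums of width 3 over [1..5]:
print(spr(lambda a, b: a + b, [1, 2, 3, 4, 5], 3))  # {0: 6, 1: 9, 2: 12}
```

This reference costs O(n * w) applications of `op`; when `op` is associative the overlapping sections share partial results, which is the structure the paper's SIMD algorithms exploit.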