## Optimal Broadcasting in Mesh-Connected Architectures (1991)

Citations: 6 (0 self-citations)

### BibTeX

@TECHREPORT{Barnett91optimalbroadcasting,

author = {Michael Barnett and David G. Payne and Robert Van De Geijn},

title = {Optimal Broadcasting in Mesh-Connected Architectures},

institution = {},

year = {1991}

}


### Abstract

In this paper, we disprove the common assumption that the time for broadcasting in a mesh is at best proportional to the square root of the number of processors, at least in the presence of worm-hole routing. We present an optimal algorithm for broadcasting in mesh-connected distributed-memory architectures with worm-hole routing. By organizing the processing nodes in a logical spanning tree, the algorithm executes in time proportional to the logarithm of the number of nodes without inducing contention in the communication network. We restrict the number of nodes in each dimension of the processor mesh to be a power of two. Our method provides insight into how to avoid and/or reduce network contention on meshes for other communication operations. Experimental results on the Intel Touchstone Delta system are included.

Keywords: distributed-memory, mesh-connected, broadcast, parallel processing, worm-hole routing

1 Introduction. We investigate broadcast algorithms for mesh-co...
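The abstract's central claim, a broadcast that finishes in a logarithmic number of steps over a logical spanning tree without link contention, can be illustrated with a minimal Python sketch. It assumes the familiar recursive-halving (minimum-spanning-tree) schedule, a root at node 0, power-of-two extents in each dimension, straight-line wormhole paths along a single row or column, and that "contention" means two messages sharing a directed link in the same step. These assumptions and all function names are hypothetical; this is not a transcription of the paper's own construction.

```python
"""Sketch: contention-free recursive-halving broadcast on a power-of-two mesh.

Assumptions (not taken from the paper): the root is node 0 / (0, 0); each
message travels in a straight line along one row or column; "contention"
means two messages using the same directed link during the same step.
"""


def linear_schedule(p):
    """Recursive-halving broadcast from node 0 on a linear array of p = 2**d nodes.

    Returns one list of (src, dst) messages per step.
    """
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    steps = []
    dist = p // 2
    while dist >= 1:
        # Nodes at multiples of 2*dist already hold the data; each forwards it
        # dist positions to the right, halving the span still to be covered.
        steps.append([(i, i + dist) for i in range(0, p, 2 * dist)])
        dist //= 2
    return steps


def links(src, dst):
    """Directed links a wormhole message crosses on a linear array."""
    hop = 1 if dst > src else -1
    return [(k, k + hop) for k in range(src, dst, hop)]


def mesh_links(src, dst):
    """Directed links for a straight-line message between two mesh nodes."""
    (r0, c0), (r1, c1) = src, dst
    assert r0 == r1 or c0 == c1, "this sketch only sends along one dimension"
    if r0 == r1:
        return [((r0, a), (r0, b)) for a, b in links(c0, c1)]
    return [((a, c0), (b, c0)) for a, b in links(r0, r1)]


def contention_free(step, link_fn):
    """True if no directed link is used by two messages in the same step."""
    used = set()
    for src, dst in step:
        for link in link_fn(src, dst):
            if link in used:
                return False
            used.add(link)
    return True


def mesh_schedule(rows, cols):
    """Broadcast from (0, 0): spread along row 0, then down every column at once."""
    steps = [[((0, a), (0, b)) for a, b in step] for step in linear_schedule(cols)]
    for step in linear_schedule(rows):
        steps.append([((a, j), (b, j)) for j in range(cols) for a, b in step])
    return steps


if __name__ == "__main__":
    rows, cols = 16, 32
    sched = mesh_schedule(rows, cols)
    reached = {(0, 0)}
    for step in sched:
        assert all(src in reached for src, _ in step)  # senders already hold the data
        assert contention_free(step, mesh_links)
        reached.update(dst for _, dst in step)
    print(len(sched), len(reached))  # 9 512
```

For a 16 × 32 mesh the schedule needs log2(512) = 9 steps, whereas relaying hop by hop from (0, 0) to the far corner takes (16 - 1) + (32 - 1) = 46 hops. Because wormhole latency is roughly insensitive to the distance a message travels, the step count, which grows only logarithmically here, is what matters.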

### Citations

117 | LAPACK: A portable linear algebra library for high-performance computers
- Anderson, Bai, et al.
- 1990
Citation Context ...h was motivated by the need for efficient broadcast algorithms for codes being developed as part of an effort to port a subset of algorithms included in LAPACK to distributed-memory MIMD architectures [2, 1]. In particular, the broadcast discussed in this paper has been incorporated in an implementation on the Delta of a proposed communication library, the Basic Linear Algebra Communication Subprograms (...

38 | Distributed routing algorithms for broadcasting and personalized communication in hypercubes
- Ho, Johnsson
- 1986
Citation Context ...oduction We investigate broadcast algorithms for mesh-connected distributed-memory architectures with worm-hole routing. While broadcast algorithms on hypercubes have been studied for quite some time [6, 7, 10], such algorithms can cause network contention on mesh-connected architectures. We will show that, under reasonable assumptions, there is a hypercube-optimal algorithm that is also optimal for meshes....
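The contention that hypercube broadcasts can cause on a mesh is easiest to see on a single linear array. A minimal Python sketch, assuming hypercube node i is placed at position i of the array and each message follows the straight path between its endpoints: a binomial-tree broadcast that toggles address bits from the least significant upward piles messages onto shared links, while toggling from the most significant bit down keeps every step link-disjoint. Which ordering is contention-free depends on this node numbering, and the function names are hypothetical; the sketch illustrates the effect rather than reproducing any cited algorithm.

```python
from collections import Counter


def hypercube_schedule(d, msb_first):
    """Binomial-tree (hypercube) broadcast from node 0 over 2**d nodes.

    In each step every node that holds the data toggles one address bit to
    find its partner; msb_first selects the order in which bits are toggled.
    """
    bits = range(d - 1, -1, -1) if msb_first else range(d)
    holders = {0}
    steps = []
    for bit in bits:
        step = [(src, src ^ (1 << bit)) for src in sorted(holders)]
        holders.update(dst for _, dst in step)
        steps.append(step)
    return steps


def max_link_load(step):
    """Largest number of messages sharing one directed link of a linear array,
    assuming hypercube node i sits at array position i."""
    load = Counter()
    for src, dst in step:
        hop = 1 if dst > src else -1
        load.update((k, k + hop) for k in range(src, dst, hop))
    return max(load.values())


for msb_first in (True, False):
    loads = [max_link_load(s) for s in hypercube_schedule(4, msb_first)]
    print("MSB first" if msb_first else "LSB first", loads)
# MSB first [1, 1, 1, 1]
# LSB first [1, 2, 4, 8]
```

Under this mapping, the least-significant-first order puts 2^(k-1) messages on one link in step k, which worm-hole routing would have to serialize.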

31 | Reduction to condensed form for the eigenvalue problem on distributed memory computers. Computer Science Dept
- Dongarra, van de Geijn
- 1991
Citation Context ... a proposed communication library, the Basic Linear Algebra Communication Subprograms (BLACS) [3], and is used by parallel implementations of the LU factorization [12] and matrix reduction algorithms [5]. Section 2 discusses linear arrays and the extension to meshes is in Section 3. Each section includes timing results on the Intel Touchstone Delta system. Conclusions and future work are discussed in...

31 | The Touchstone 30 Gigaflop DELTA prototype
- Lillevik
- 1996
Citation Context ...ss developed jointly by the Defense Advanced Research Projects Agency (DARPA) and the Intel Corporation [8]. It consists of 520 i860-based nodes, interconnected via a communications network having the topology of a two-dimensional rectangular grid. (Scaling is not restricted to a power-of-two increment typ...
Footnote in context: This research was supported in part by Intel Corporation.

22 | Massively parallel LINPACK benchmark on the Intel Touchstone Delta and iPSC/860 systems
- van de Geijn
- 1991
Citation Context ... in an implementation on the Delta of a proposed communication library, the Basic Linear Algebra Communication Subprograms (BLACS) [3], and is used by parallel implementations of the LU factorization [12] and matrix reduction algorithms [5]. Section 2 discusses linear arrays and the extension to meshes is in Section 3. Each section includes timing results on the Intel Touchstone Delta system. Conclusi...

18 | LAPACK for distributed memory architecture progress report
- Anderson, Benzoni, et al.
- 1991
Citation Context ...h was motivated by the need for efficient broadcast algorithms for codes being developed as part of an effort to port a subset of algorithms included in LAPACK to distributed-memory MIMD architectures [2, 1]. In particular, the broadcast discussed in this paper has been incorporated in an implementation on the Delta of a proposed communication library, the Basic Linear Algebra Communication Subprograms (...

16 | Basic linear algebra communication subprograms
- Anderson, Benzoni, et al.
- 1991
Citation Context ...rticular, the broadcast discussed in this paper has been incorporated in an implementation on the Delta of a proposed communication library, the Basic Linear Algebra Communication Subprograms (BLACS) [3], and is used by parallel implementations of the LU factorization [12] and matrix reduction algorithms [5]. Section 2 discusses linear arrays and the extension to meshes is in Section 3. Each section ...

13 | Efficient global combine operations
- van de Geijn
- 1991
Citation Context ...the least significant bit should be toggled first. Moreover, similar techniques can be applied to more complicated communications, e.g., total exchange [6, 10] and more complicated combine algorithms [6, 11, 13], in order to reduce network conflicts. In a future paper, we will generalize our algorithm to allow the number of processors in a dimension of the mesh to be any number, not just a power of two. Ackn...

8 | On Global Combine Operations, LAPACK Working Note 29
- van de Geijn
- 1991
Citation Context ...the least significant bit should be toggled first. Moreover, similar techniques can be applied to more complicated communications, e.g., total exchange [6, 10] and more complicated combine algorithms [6, 11, 13], in order to reduce network conflicts. In a future paper, we will generalize our algorithm to allow the number of processors in a dimension of the mesh to be any number, not just a power of two. Ackn...

2 | Broadcast communication delay metric for iPSC/2 and iPSC/860 hypercubes
- McCreary, Mcardle, et al.
- 1991

2 | Data communication in parallel architectures
- Saad, Schultz
- 1986
Citation Context ...oduction We investigate broadcast algorithms for mesh-connected distributed-memory architectures with worm-hole routing. While broadcast algorithms on hypercubes have been studied for quite some time [6, 7, 10], such algorithms can cause network contention on mesh-connected architectures. We will show that, under reasonable assumptions, there is a hypercube-optimal algorithm that is also optimal for meshes....