Results 1-10 of 16
Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage
in Proceedings of the Sixth Distributed Memory Computing Conference, 1991
Cited by 17 (3 self)
For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place, a processor must receive about $P^2/\sqrt{N}$ elements of each input operand, with operands of size $P \times P$ distributed evenly over $N$ processors. With concurrent communication on all ports, the number of element transfers in sequence can be reduced to $P^2/(\sqrt{N} \log N)$ for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multiplication with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity $2P^3$ and parallel arithmetic complexity $2P^3/N$. The algorithm has been implemented on the Connection Machine model CM-2. For the performance on the 8K CM-2, we measured about 1.6 Gflops, which would scale up to about 13 Gflops for a 64K full machine. 1 Introduction The multiplication of matrices is an important operation in many computationally intensive scientific applications. Effective use...
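As a rough numerical illustration of the quantities quoted in this abstract, the sketch below evaluates the per-processor data volume P^2/sqrt(N), the all-port transfer count P^2/(sqrt(N) log N), and the parallel arithmetic cost 2P^3/N for a sample problem size. The function names, the sample values of P and N, the base-2 reading of the logarithm, and the linear extrapolation of the measured rate are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def per_processor_volume(P, N):
    """Elements of each input operand one processor must receive: P^2 / sqrt(N)."""
    return P**2 / math.sqrt(N)


def all_port_transfers(P, N):
    """Element transfers in sequence when all ports communicate concurrently:
    P^2 / (sqrt(N) * log N). The base-2 logarithm is an assumption."""
    return P**2 / (math.sqrt(N) * math.log2(N))


def parallel_arithmetic(P, N):
    """Parallel arithmetic complexity 2 P^3 / N (operations per processor)."""
    return 2 * P**3 / N


def scaled_gflops(measured, measured_procs, target_procs):
    """Linear extrapolation of a measured rate to a larger machine (assumed model)."""
    return measured * target_procs / measured_procs


if __name__ == "__main__":
    P, N = 1024, 4096  # hypothetical sample sizes
    print("per-processor volume:", per_processor_volume(P, N))
    print("all-port transfers:  ", all_port_transfers(P, N))
    print("parallel arithmetic: ", parallel_arithmetic(P, N))
    print("8K -> 64K estimate:  ", scaled_gflops(1.6, 8 * 1024, 64 * 1024), "Gflops")
```

Under the linear model, the 8K to 64K extrapolation gives roughly 12.8 Gflops, which is consistent with the "about 13 Gflops" stated in the abstract.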
Updating the hamiltonian problem - a survey
J. Graph Theory, 1991
Cited by 17 (0 self)
This article is intended as a survey, updating earlier surveys in the area. For completeness of the presentation of both particular questions and the general area, it also contains material on closely related topics such as traceable, pancyclic and hamiltonian-connected graphs and digraphs.
Oblivious Gossiping on Tori
Journal of Algorithms
Cited by 4 (2 self)
Near-optimal gossiping algorithms are given for two- and higher-dimensional tori, assuming the full-port store-and-forward communication model. For two-dimensional tori, a previous algorithm achieved optimality in an intricate way, with an adaptive routing pattern. In contrast, the PUs in our algorithm always forward the received packets in the same way. We thus achieve almost the same performance with patterns that might be hardwired.
Hamilton circuits in directed Butterfly networks
Research Report #2925, Theme 1, INRIA Sophia Antipolis, 1995
Cited by 3 (1 self)
Research report, ISSN 0249-6399. Hamilton circuits in the directed Butterfly network
Revisiting Hamiltonian Decomposition of the Hypercube
SBCCI2000, XIII Symposium on Integrated Circuits and System Design, 2000
Cited by 3 (0 self)
In this paper we study a useful property of the hypercube, namely the Hamiltonian decomposition, i.e. the partitioning of its edge set into Hamiltonian cycles. It is known that there are $\lfloor n/2 \rfloor$ disjoint Hamiltonian cycles on a binary n-cube. The proof of this result, however, does not give rise to any simple construction algorithm for such cycles. In a previous work, Song presents ideas towards a simple method for this problem: first decompose the hypercube into cycles of length 16, $C_{16}$, and then apply a merge operator to join the $C_{16}$ cycles into larger Hamiltonian cycles. The case of dimension $n = 6$ (a 64-node hypercube) is illustrated. He conjectures that the method can be generalized to any even $n$. In this paper, we generalize the first phase of that method to any even $n$ and prove its correctness. We also show four possible merge operators for the case $n = 8$ (a 256-node hypercube). This result can be viewed as a step toward the general merge operator, thus proving the conjecture.
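To make the notion of a Hamiltonian decomposition concrete, the following sketch searches the 4-cube (16 nodes, 32 edges) for a Hamiltonian cycle whose leftover edges also form a Hamiltonian cycle. This is a plain backtracking search written for illustration, not the $C_{16}$ partition-and-merge method described in the abstract; the function names are hypothetical, and the approach is only practical for very small n.

```python
def hypercube_edges(n):
    """All edges of the binary n-cube; nodes are the integers 0 .. 2^n - 1."""
    return {frozenset((v, v ^ (1 << b))) for v in range(1 << n) for b in range(n)}


def is_hamiltonian_cycle(edges, num_nodes):
    """True if `edges` forms a single cycle that visits every node exactly once."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    if len(adj) != num_nodes or any(len(nb) != 2 for nb in adj.values()):
        return False
    # Walk around the cycle containing `start` and count its length.
    start = next(iter(adj))
    prev, cur, steps = None, start, 0
    while True:
        nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
        prev, cur = cur, nxt
        steps += 1
        if cur == start:
            return steps == num_nodes


def find_decomposition(n=4):
    """Backtracking search for two edge-disjoint Hamiltonian cycles covering
    every edge of the n-cube (intended for n = 4 only)."""
    all_edges = hypercube_edges(n)
    size = 1 << n
    path, visited = [0], {0}

    def extend():
        v = path[-1]
        if len(path) == size:
            if frozenset((v, 0)) in all_edges:  # path closes into a cycle
                cycle = {frozenset((path[i], path[(i + 1) % size]))
                         for i in range(size)}
                rest = all_edges - cycle
                if is_hamiltonian_cycle(rest, size):
                    return cycle, rest
            return None
        for b in range(n):
            w = v ^ (1 << b)
            if w not in visited:
                visited.add(w)
                path.append(w)
                found = extend()
                if found is not None:
                    return found
                path.pop()
                visited.remove(w)
        return None

    return extend()


if __name__ == "__main__":
    first, second = find_decomposition(4)
    print(len(first), "edges in the first cycle,", len(second), "in the second")
```

Because every node of the 4-cube has degree 4, removing one Hamiltonian cycle always leaves a 2-regular graph; the only thing the search has to test is whether that remainder is a single cycle rather than several shorter ones.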
Time-Independent Gossiping on Full-Port Tori
Max-Planck-Institut für Informatik, 1998
Cited by 3 (2 self)
Near-optimal gossiping algorithms are given for two- and higher-dimensional tori. It is assumed that the amount of data each PU contributes is so large that start-up time may be neglected. For two-dimensional tori, a previous algorithm achieved optimality in an intricate way, with a time-dependent routing pattern. In all steps of our algorithms, the PUs forward the received packets in the same way.
Hamilton cycle decomposition of the Butterfly network
1996
Cited by 3 (2 self)
In this paper, we prove that the wrapped Butterfly graph WBF(d, n) of degree d and dimension n is decomposable into Hamilton cycles. This answers a conjecture of D. Barth and A. Raspaud, who solved the case d = 2.
Gossiping Large Packets on Full-Port Tori
In Proc. Euro-Par 1998 Parallel Processing, volume 1470 of LNCS, 1998
Cited by 2 (2 self)
Near-optimal gossiping algorithms are given for two- and higher-dimensional tori. It is assumed that the amount of data each PU is contributing is so large that start-up time may be neglected. For two-dimensional tori, an earlier algorithm achieved optimality in an intricate way, with a time-dependent routing pattern. In all steps of our algorithms, the PUs forward the received packets in the same way.
Towards a simple construction method for Hamiltonian decomposition of the hypercube
1994
Cited by 1 (1 self)
We consider the problem of Hamiltonian decomposition on the hypercube. It is known that there exist $\lfloor n/2 \rfloor$ edge-disjoint Hamiltonian cycles on a binary n-cube. However, there are still no simple algorithms to construct such cycles. We present some promising results that may lead to a very simple method to obtain the Hamiltonian decomposition. The binary n-cube is equivalent to the Cartesian product of cycles of length four ($C_4 \times C_4 \times \cdots \times C_4$). Case $n = 4$ is trivial. For the case $n = 6$, we first partition the set of edges of $C_4 \times C_4 \times C_4$ into 12 disjoint cycles of length 16. We then present an operator to merge the cycles to produce the desired Hamiltonian cycles. In general, the edge set of the product of $n/2$ cycles, $C_4 \times C_4 \times \cdots \times C_4$, can be partitioned into $n 2^n / 32$ disjoint cycles of length 16. It remains to formalize the merge operator in the general case. 1. Introduction The problem of finding edge-disjoint cycles on a hypercube can be important i...
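The count of length-16 cycles stated in this abstract can be checked with a short edge count. The only fact used beyond the abstract is the standard one that the binary n-cube $Q_n$ has $n 2^{n-1}$ edges; the symbol $Q_n$ is notation introduced here for illustration. Each cycle of length 16 uses 16 edges, so a partition of the whole edge set must contain

```latex
\[
\frac{|E(Q_n)|}{16} \;=\; \frac{n\,2^{\,n-1}}{16} \;=\; \frac{n\,2^{\,n}}{32}
\quad\text{cycles, and for } n = 6:\quad
\frac{6 \cdot 2^{6}}{32} \;=\; \frac{384}{32} \;=\; 12 .
\]
```

This agrees with the 12 disjoint cycles of length 16 reported for the case n = 6.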
Construction of Edge-Disjoint Spanning Trees in the Torus and Application to Multicast in Wormhole-Routed Networks
Proc. 1999 Int’l Conf. on Parallel and Distributed Computing Systems, 1999
Cited by 1 (0 self)
A tree-based multicast algorithm for wormhole-routed torus networks, which makes use of multiple edge-disjoint spanning trees, is presented. A technique for constructing two spanning trees in 2-dimensional torus networks is described. It is formally proven that this construction produces two edge-disjoint spanning trees in any 2D torus network. Compared with an algorithm for construction of multiple edge-disjoint spanning trees in arbitrary networks, our construction produces significantly lower maximum and average path lengths. Finally, two approaches to providing single-link fault tolerance with edge-disjoint spanning trees are presented and evaluated. Keywords: Deadlock freedom, edge-disjoint spanning trees, tree-based multicast, wormhole routing. 1 Introduction Multicast communication involves one multicomputer node sending messages to a subset of the other nodes in the system. Multicast can be used to build many useful operations such as barrier synchronization, cache invalidati...