Circuit-Switched Gossiping in the 3-Dimensional Torus Networks
, 1997
Abstract

Cited by 8 (1 self)
In this paper we describe, in the case of short messages, an efficient gossiping algorithm for 3-dimensional torus networks (wraparound or toroidal meshes) that uses synchronous circuit-switched routing. The algorithm is based on a recursive decomposition of a torus. The protocol requires an optimal number of rounds and a quasi-optimal number of intermediate switch settings to gossip in a 7^i × 7^i × 7^i torus.
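As a rough illustration of why a side length of 7^i is natural, the sketch below assumes the all-port model (an assumption; the abstract does not state the port model). A node of a 3-dimensional torus has 6 links, so the number of nodes holding any given item can grow by at most a factor of 7 per round, which gives a lower bound of ceil(log_7 N) rounds. For N = 7^(3i) nodes this bound is exactly 3i. The function names are hypothetical and this is not the paper's algorithm.

```python
def torus_neighbors(node, k):
    """Return the 6 wraparound neighbors of node (x, y, z) in a k x k x k torus."""
    x, y, z = node
    return [
        ((x + 1) % k, y, z), ((x - 1) % k, y, z),
        (x, (y + 1) % k, z), (x, (y - 1) % k, z),
        (x, y, (z + 1) % k), (x, y, (z - 1) % k),
    ]


def round_lower_bound(num_nodes, degree):
    """Smallest r with (degree + 1) ** r >= num_nodes.

    Assumes the all-port model: each informed node can inform at most
    `degree` new nodes per round. Integer arithmetic avoids float log errors.
    """
    rounds, reach = 0, 1
    while reach < num_nodes:
        reach *= degree + 1
        rounds += 1
    return rounds


for i in (1, 2):
    k = 7 ** i
    n = k ** 3
    print(k, n, round_lower_bound(n, degree=len(torus_neighbors((0, 0, 0), k))))
# prints "7 343 3" and then "49 117649 6"
```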
High Performance Scalable Matrix Algebra Algorithms for Distributed Memory Architectures
 Overall Best Student Paper Award
, 1992
Abstract

Cited by 4 (3 self)
Our experimental results showed that block-based algorithms for numerically intensive applications are superior to their non-block counterparts [10]. It is desirable to parallelize block-based algorithms on distributed-memory MIMD architectures, since many scientific and engineering applications make use of these algorithms. Our goal is to optimize sample applications from LAPACK, develop them in Fortran 77D and Fortran 90D, and make them available as a scalable compiler library. In the present study, we show ways to parallelize sequential block algorithms for LU factorization. The goal of this paper is twofold. On one hand, since these algorithms are difficult to parallelize, they will be included in a benchmarking suite for the Fortran 90D project [7]. We point out problems inherent in the sequential nature of the block-based algorithms. We find that it is not intuitively clear which algorithm might perform best on a distributed-memory architecture. The problems described here will ...
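To make the notion of a block-based LU factorization concrete, here is a minimal sequential sketch of a right-looking blocked LU. It is not the paper's algorithm, and unlike LAPACK's routines it omits partial pivoting, so it assumes all leading pivots are nonzero (for example, a diagonally dominant matrix). The block size `nb` is a hypothetical tuning parameter. Step 3, the trailing-matrix update, is the matrix-multiply-rich part that block algorithms expose for parallelization.

```python
def blocked_lu(a, nb):
    """In-place right-looking blocked LU factorization without pivoting.

    On return, the strictly lower part of `a` holds L (unit diagonal implied)
    and the upper part, including the diagonal, holds U.
    Assumes every leading pivot is nonzero.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1. Unblocked LU of the panel a[k0:n][k0:k1].
        for j in range(k0, k1):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, k1):
                    a[i][c] -= a[i][j] * a[j][c]
        # 2. Row block of U: solve L11 * U12 = A12 by forward substitution.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3. Trailing update: A22 -= L21 * U12.
        for i in range(k1, n):
            for c in range(k1, n):
                s = 0.0
                for j in range(k0, k1):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
    return a


m = blocked_lu([[4.0, 3.0], [6.0, 3.0]], nb=1)
print(m)  # [[4.0, 3.0], [1.5, -1.5]]
```

Any block size yields the same factors in exact arithmetic; the block size only changes how much of the work is concentrated in the step 3 update.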
A Systematic Approach to Develop Efficient Complete Exchange Algorithms for Meshes and Tori
 G A G B S
, 1997
Abstract

Cited by 2 (2 self)
Many authors have considered the design of complete exchange algorithms for a variety of multicomputer models, including hypercubes, multidimensional meshes and tori, with different port and message-switching models. Frequently, algorithms for a given multicomputer architecture cannot be used (or are not efficient) for a different architecture. This paper presents a method that allows the systematic design of complete exchange algorithms for a wide range of multicomputer architectures, including the cases usually considered in the literature and some other architectures that may be of interest in the future. Performance figures obtained from analytical models show that algorithms obtained through the proposed method are efficient for almost all the multicomputer models under consideration and outperform the best-known algorithms for a significant range of the problem and system parameters. Keywords: complete exchange, multidimensional meshes/tori, circuit switching, one-port and all-por...
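A common baseline for meshes and tori, though not necessarily the method proposed in this paper, is dimension-ordered complete exchange: every message first travels within its source row to the destination column, then within that column to the destination row. The sketch below only simulates where messages sit after each phase on a rows × cols torus; it does not model links, ports, or switching costs. The function name is hypothetical.

```python
def two_phase_exchange(rows, cols):
    """Simulate a dimension-ordered complete exchange on a rows x cols torus.

    Every node starts with one message (src, dst) for every other node.
    Phase 1 is a row-wise all-to-all keyed on the destination column;
    phase 2 is a column-wise all-to-all keyed on the destination row.
    Returns a dict mapping each node to the messages it holds at the end.
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    buffers = {n: [(n, d) for d in nodes if d != n] for n in nodes}

    # Phase 1: stay in the source row, move to the destination column.
    after_rows = {n: [] for n in nodes}
    for (r, _c), msgs in buffers.items():
        for src, dst in msgs:
            after_rows[(r, dst[1])].append((src, dst))

    # Phase 2: stay in that column, move to the destination row.
    after_cols = {n: [] for n in nodes}
    for (_r, c), msgs in after_rows.items():
        for src, dst in msgs:
            after_cols[(dst[0], c)].append((src, dst))
    return after_cols


result = two_phase_exchange(3, 4)
print(len(result[(0, 0)]))  # 11: one message from each other node
```

Decomposing the exchange per dimension reuses a 1-D all-to-all routine in each phase, which is one reason such schedules port easily between meshes and tori of different dimensions.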
Issues in the Design of Direct Multiprocessor Networks
, 1997
Abstract
...this paper, every packet is broken into a number of flits, and buffering, forwarding and flow control are performed at the flit level. The flits of a packet are sent consecutively over a channel, so the flits of two packets are never interleaved. Flits themselves are transmitted one phit (physical transfer unit) at a time; a phit is typically the width of a link, i.e., the amount of data that can be transferred in a single clock cycle. Figure 2 illustrates how a message is partitioned into packets, then into flits, and then into phits.
[Fig. 2. The figure illustrates the message, packets, flits and phits in direct networks. Labels: Message, Packets, Flits, Phits; Application unit, Routing unit, Switching and flow-control unit, Transmission unit.]
Network routers connect the processing nodes to the network and manage the links to the neighboring nodes. A router normally contains communication processing logic as well as a set of buffers to hold flits. It handles all communication-related tasks so that computation (by the processor) and communication at the node can take place concurrently. These communication tasks include relaying packets from one node to the next in the direction of the packets' destination node(s) (switching and routing), preventing buffer overflow (flow control), removing packets from the network if they are destined for the local node, and injecting packets from the local node into the network (switching). In addition, some routers also assemble packets into messages and disassemble messages into packets. The behavior of a direct network is determined primarily by how it performs switching, routing and flow control. Switching is the mechanism by which a router removes a packet from its input link and p...
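The message, packet, flit, phit hierarchy described above can be sketched as a nested split. All sizes are hypothetical parameters, and the one-flit header carrying the packet index is an illustrative assumption, not a packet format taken from the paper.

```python
def packetize(message: bytes, packet_payload: int, flit_bytes: int, phit_bytes: int):
    """Split a message into packets, each packet into flits, each flit into phits.

    Returns a list of packets; a packet is a list of flits (header first);
    a flit is a list of phits, i.e. bytes objects of at most phit_bytes.
    """
    packets = []
    for index, start in enumerate(range(0, len(message), packet_payload)):
        payload = message[start:start + packet_payload]
        # Hypothetical header flit: the packet index, big-endian, one flit wide
        # (requires index < 256 ** flit_bytes).
        header = index.to_bytes(flit_bytes, "big")
        body = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
        flits = [header] + body
        # Each flit crosses a link as consecutive phits (link-width chunks),
        # and the flits of one packet stay together, never interleaved.
        packets.append([[f[j:j + phit_bytes] for j in range(0, len(f), phit_bytes)]
                        for f in flits])
    return packets


pkts = packetize(bytes(range(20)), packet_payload=8, flit_bytes=4, phit_bytes=2)
print([len(p) for p in pkts])  # flits per packet, header included: [3, 3, 2]
```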
Diffusion En Mode Commutation De Circuits Dans Les Tores De Dimension (Circuit-Switched Broadcasting in Tori of Dimension)
Abstract
This paper deals with broadcasting in k-dimensional torus networks under the circuit-switched routing model. We suppose that a node can send a message simultaneously on all its out-links. Here, we consider a broadcast protocol as a succession of rounds; during each round, the communication dipaths used by the algorithm must be arc-disjoint. We give protocols that are optimal in the number of rounds and near-optimal in the length of the communication dipaths. This work generalizes that of Peters and Syska [PET 96] for the case k = 2. We use tools from linear coding theory and describe the cases k = 3 and k = 4 in detail. KEY WORDS: broadcasting, circuit switching, torus network, coding theory.
1. Introduction. In parallel and distributed algorithmics, it is important to have, on the one hand, an efficient routing protocol (routing function) [FRA 95] and, on the other hand, communication algorithms...
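The abstract does not say which codes are used. As an illustration of how a linear code can tile a torus (an assumption about the kind of tool involved, not the paper's construction), the sketch below checks the classical radius-1 perfect code in the Lee metric. For m = 2k + 1, the codewords are the x in Z_m^k with sum over i of (i+1)·x_i ≡ 0 (mod m), and every node of the k-dimensional torus of side m is within one hop of exactly one codeword. The reason: a step of ±1 along dimension i shifts the weighted sum by ±(i+1), and the residues 0, ±1, ..., ±k are all distinct mod 2k + 1. The function names are hypothetical.

```python
from itertools import product


def lee_code(k):
    """Codewords x in Z_m^k (m = 2k + 1) with sum((i + 1) * x_i) = 0 mod m."""
    m = 2 * k + 1
    return {x for x in product(range(m), repeat=k)
            if sum((i + 1) * xi for i, xi in enumerate(x)) % m == 0}


def ball(x, m):
    """The node x together with its 2k torus neighbors (Lee radius 1)."""
    nodes = [x]
    for i in range(len(x)):
        for step in (1, -1):
            y = list(x)
            y[i] = (y[i] + step) % m
            nodes.append(tuple(y))
    return nodes


def codewords_per_ball(k):
    """For each node of the m^k torus, count codewords within Lee distance 1."""
    m = 2 * k + 1
    code = lee_code(k)
    return code, {x: sum(y in code for y in ball(x, m))
                  for x in product(range(m), repeat=k)}


code, counts = codewords_per_ball(3)
print(len(code), set(counts.values()))  # 49 {1}
```

For k = 3 the 343 nodes of the 7 × 7 × 7 torus split into 49 disjoint balls of 7 nodes, each centered on a codeword.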