A survey of wormhole routing techniques in direct networks
IEEE Computer, 1993
Cited by 429 (39 self)
... messages is critical to the performance of direct network systems. The popular wormhole routing technique faces several challenges, particularly flow control and deadlock avoidance. Massively parallel computers with thousands of processors are considered the most promising technology to achieve teraflops computational power. Such large-scale multiprocessors are usually organized as ensembles of nodes, where each node has its own processor, local memory, and other supporting devices. These nodes may have different functional capabilities. For ...
The Turn Model for Adaptive Routing
In Proceedings of the International Symposium on Computer Architecture, 1992
Cited by 247 (6 self)
We present a model for designing wormhole routing algorithms that are deadlock free, livelock free, minimal or nonminimal, and maximally adaptive. A unique feature of this model is that it is not based on adding physical or virtual channels to network topologies (though it can be applied to networks with extra channels). Instead, the model is based on analyzing the directions in which packets can turn in a network and the cycles that the turns can form. Prohibiting just enough turns to break all of the cycles produces routing algorithms that are deadlock free, livelock free, minimal or nonminimal, and maximally adaptive for the network. In this paper, we focus on the two most common network topologies for wormhole routing, n-dimensional meshes and k-ary n-cubes, without extra channels. In an n-dimensional mesh, just a quarter of the turns must be prohibited to prevent deadlock. The remaining three quarters of the turns permit partial adaptiveness in routing. Partially adaptive routing ...
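The turn-counting claim in this abstract can be illustrated with a minimal Python sketch. It enumerates the 90-degree turns of an n-dimensional mesh and applies one prohibition rule, negative-first (no turn from a positive direction into a negative one). That rule is an assumption chosen for illustration, since the abstract does not name a specific rule, and the function names are hypothetical.

```python
from itertools import product


def turns(n):
    """Enumerate all 90-degree turns in an n-dimensional mesh.

    A direction is (dimension, sign); a turn is an ordered pair of
    directions that lie in different dimensions.
    """
    dirs = [(dim, sign) for dim in range(n) for sign in (-1, +1)]
    return [(a, b) for a, b in product(dirs, dirs) if a[0] != b[0]]


def negative_first_prohibited(turn):
    """Assumed rule: forbid turning from a positive into a negative direction."""
    (_, sign_in), (_, sign_out) = turn
    return sign_in == +1 and sign_out == -1


def turn_summary(n):
    """Return (total turns, prohibited turns) for an n-dimensional mesh."""
    all_turns = turns(n)
    banned = [t for t in all_turns if negative_first_prohibited(t)]
    return len(all_turns), len(banned)


if __name__ == "__main__":
    for n in (2, 3, 4):
        total, banned = turn_summary(n)
        print(f"n={n}: {banned} of {total} turns prohibited")
```

Under this assumed rule the prohibited set is exactly a quarter of all turns for each n, which matches the fraction stated in the abstract. The sketch only counts turns; it does not itself prove deadlock freedom.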
A Survey of Collective Communication in Wormhole-Routed Massively Parallel Computers
IEEE Computer, 1994
Cited by 93 (6 self)
Massively parallel computers (MPC) are characterized by the distribution of memory among an ensemble of nodes. Since memory is physically distributed, MPC nodes communicate by sending data through a network. In order to program an MPC, the user may directly invoke low-level message passing primitives, may use a higher-level communications library, or may write the program in a data parallel language and rely on the compiler to translate language constructs into communication operations. Whichever method is used, the performance of communication operations directly affects the total computation time of the parallel application. Communication operations may be either point-to-point, which involves a single source and a single destination, or collective, in which more than two processes participate. This paper discusses the design of collective communication operations for current systems that use the wormhole routing switching strategy, in which messages are divided into small pieces and...
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme
1994
Cited by 56 (22 self)
This paper proposes a novel concept of multidestination worm mechanism which allows a message to be propagated in a wormhole network conforming to the underlying base routing scheme (e-cube, planar, turn, or fully adaptive). Using this model, any source has the potential to deliver a message to multiple destinations along any valid path in the system conforming to the base routing scheme while encountering only a single communication start-up. The flexibility of sending unicast messages exists under this model as a subset operation. Two schemes are developed and evaluated under this model to perform fast multicasting on 2D/3D meshes/tori. Not only do these schemes demonstrate superiority over Umesh (unicast-based multicast) [25] and Hamiltonian-Path-based [24] schemes, they also indicate an interesting result: the cost of multicast operations can be reduced or kept near-constant in e-cube systems as the degree of multicast (number of destinations/src) increases. The proposed sc...
Efficient Implementation of Barrier Synchronization in Wormhole-Routed Hypercube Multicomputers
1992
Cited by 52 (17 self)
This paper addresses efficient implementation of barrier synchronization in wormhole-routed hypercube multicomputers. For those systems supporting only unicast communication in hardware, a novel software tree approach, the U-cube tree, is proposed. An important feature of the U-cube tree is that all messages injected into the network are guaranteed to be contention-free. Performance measurements of several barrier synchronization techniques implemented on a 64-node nCUBE-2 are given.
Multicast snooping: a new coherence method using a multicast address network
In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA), 1999
Cited by 40 (7 self)
This paper proposes a new coherence method called “multicast snooping” that dynamically adapts between broadcast snooping and a directory protocol. Multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying a multicast “mask.” Transactions are delivered with an ordered multicast network, such as an Isotach network, which eliminates the need for acknowledgment messages. Processors handle transactions as they would with a snooping protocol, while a simplified directory operates in parallel to check masks and gracefully handle incorrect ones (e.g., previous owner missing). Preliminary performance numbers with mostly SPLASH-2 benchmarks running on 32 processors show that we can limit multicasts to an average of 2-6 destinations (<< 32) and we can deliver 2-5 multicasts per network cycle (>> broadcast snooping’s 1 per cycle). While these results do not include timing, they do provide encouragement that multicast snooping can obtain data directly (like broadcast snooping) but apply to larger systems (like directories).
Multi-Address Encoding for Multicast
In Proceedings of the Parallel Computer Routing and Communication Workshop, 1994
Cited by 32 (2 self)
Efficient implementation of multicast communication is critical to the performance of message-based scalable parallel computers and switch-based high speed networks. This paper deals with address issues occurring in the message header for the transmission of multicast messages. Multi-address encoding is becoming critical to system performance as the scale of networks is getting larger and the demand of multicast communication is getting higher. Several multi-address encoding schemes are investigated and explored. Although the proposed multi-address encoding schemes can be applied to networks with different switching techniques, the emphasis of this paper is on the emerging wormhole routing technique.
1 Introduction
Multicast communication, which refers to the delivery of a message from a single source node to a number of destination nodes, is a frequently used communication pattern in distributed-memory parallel computers and computer networks. Efficient implementation of multicast ...
Multicast on Irregular Switch-based Networks with Wormhole Routing
In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-3), 1997
Cited by 31 (10 self)
This paper presents efficient multicasting with reduced contention on irregular networks with switch-based wormhole interconnection and unicast message passing. First, it is proved that for an arbitrary irregular network with a typical deadlock-free, adaptive routing, it may not be possible to create an ordered list of nodes to implement an arbitrary multicast in a contention-free manner with minimal number of communication steps. Next, three different multicast algorithms are proposed with their respective node orderings to reduce contention: switch-based ordering (SO), switch-based hierarchical ordering (SHO), and chain concatenation ordering (CCO). A variation of a binomial tree-based communication pattern with unicast message passing is used on the above ordered lists to implement multicast. The proposed multicast algorithms are compared with each other as well as with the naive random ordering (RO) algorithm for a range of system sizes, switch sizes, message lengths, degrees of co...
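The idea of running a binomial-tree pattern over an ordered node list can be sketched in Python as follows. This is a generic recursive-halving schedule, not the paper's specific variation, and it assumes the source sits at the front of the list; the function name and the node labels are hypothetical.

```python
def binomial_multicast_schedule(ordered_nodes):
    """Build a unicast-based multicast schedule over an ordered node list.

    ordered_nodes[0] is assumed to be the source. Each holder of the
    message owns a contiguous segment of the list and, in every step,
    sends to the node at the start of the upper half of its segment.
    Returns a list of steps; each step is a list of (sender, receiver).
    """
    steps = []
    segments = [(0, len(ordered_nodes))]  # half-open ranges; holder is at lo
    while any(hi - lo > 1 for lo, hi in segments):
        step = []
        next_segments = []
        for lo, hi in segments:
            if hi - lo <= 1:
                next_segments.append((lo, hi))
                continue
            mid = lo + (hi - lo + 1) // 2  # lower half keeps the extra node
            step.append((ordered_nodes[lo], ordered_nodes[mid]))
            next_segments.append((lo, mid))
            next_segments.append((mid, hi))
        steps.append(step)
        segments = next_segments
    return steps


if __name__ == "__main__":
    nodes = ["src", "a", "b", "c", "d", "e", "f", "g"]
    for i, step in enumerate(binomial_multicast_schedule(nodes), start=1):
        print(f"step {i}: {step}")
```

The number of holders doubles each step, so the schedule finishes in a logarithmic number of steps. Whether the unicasts within a step contend for links depends on the node ordering and the routing function, which is the part the abstract's SO, SHO, and CCO orderings address and which this sketch does not model.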
Should Scalable Parallel Computers Support Efficient Hardware Multicast?
1995
Cited by 30 (1 self)
Multicast communication is a frequently invoked communication pattern in many parallel algorithms. Although some parallel computer vendors have tried to directly support multicast in hardware, most vendors use a software approach to support multicast atop existing unicast communications. This position paper shows the need for efficient hardware multicast support, illustrates some possible approaches to support wormhole-switched multicast, describes difficulties encountered when supporting multiple multicasts, and points out some research issues.
1 The Need of Multicast Communication
Multicast communication, which refers to the delivery of a message from a source node to a number of destination nodes, is a frequently used communication pattern in distributed-memory parallel computers. Such a pattern is fundamental in many applications and has been identified in many parallel languages.
- MPI: Message Passing Interface [1].
  - MPI_Barrier: barrier synchronization
  - MPI_Bcast: broadc...
An Euler-Path-Based Multicasting Model for Wormhole-Routed Networks with Multi-Destination Capability
1996
Cited by 29 (5 self)
This research is supported by the National Science Council of the Republic of China under Grant # NSC86-2213-E-008-029 and Grant # NSC86-2213-E-216-021.
Recently, wormhole routers with multi-destination capability have been proposed to support fast multicast in a multi-computer network. In this paper, we develop a new multicasting model for such networks based on the concept of Euler path/circuit in graph theory. The model can support multiple concurrent multicasts free from deadlock and can be applied to any network which is Eulerian or is Eulerian after some links are removed. No virtual channels are needed. In particular, we demonstrate the potential of this model by showing its fault-tolerant capability in supporting multicasting in the currently popular torus/mesh topology of any dimension with regular fault patterns (such as single node, block, L-shape, +-shape, U-shape, and H-shape) and even irregular fault patterns. It is the first multicasting model known t...

