## VIRTUAL CHANNELS IN WORMHOLE ROUTERS (1999)

### Citations

2172 | Randomized Algorithms - Motwani, Raghavan - 1995 |

1471 | Introduction To Parallel Algorithms And Architectures: Arrays
- Leighton
- 1992
Citation Context ...vious and Related Research There is a vast literature devoted to the subject of routing messages in networks. For a broader treatment of the subject, see the survey paper by Leighton [28] or his book =-=[25]-=-. In what follows we describe only the research most closely related to this paper. 1.3.1. Network-independent algorithms. The idea of decoupling the path selection process from the scheduling process... |

623 | Virtual-channel flow control
- Dally
- 1992
Citation Context ...e architecture of the Ametek Series 2010 Multicomputer [44], and Dally and Seitz [17] describe the Torus Routing Chip. In the second category, two of the most influential papers were written by Dally =-=[15, 16]-=-. The first analyzes the behavior of wormhole routing algorithms for k-ary n-cubes. The second analyzes the effect of virtual channels on the throughput of multistage networks. In the third category, ... |

377 | Virtual cut-through: a new computer communication switching technique. Computer Networks
- Kermani, Kleinrock
- 1979
Citation Context ...ralization of wormhole routing is to allow a switch to buffer more than one flit per message, perhaps even allowing it to buffer an entire message. This approach is called virtual cut-through routing =-=[21]-=-, and predates wormhole routing. In wormhole or virtual cut-through routing it is customary to measure time in flit steps, where a flit step is the time taken to transmit one flit across a single link... |

354 | Performance analysis of k-ary n-cube interconnection networks
- Dally
- 1990
Citation Context ...e architecture of the Ametek Series 2010 Multicomputer [44], and Dally and Seitz [17] describe the Torus Routing Chip. In the second category, two of the most influential papers were written by Dally =-=[15, 16]-=-. The first analyzes the behavior of wormhole routing algorithms for k-ary n-cubes. The second analyzes the effect of virtual channels on the throughput of multistage networks. In the third category, ... |

284 | The Torus Routing Chip
- Dally, Seitz
- 1986
Citation Context ...s worst-case bounds on the running times of wormhole routing algorithms. In the first category Seitz et al. describe the architecture of the Ametek Series 2010 Multicomputer [44], and Dally and Seitz =-=[17]-=- describe the Torus Routing Chip. In the second category, two of the most influential papers were written by Dally [15, 16]. The first analyzes the behavior of wormhole routing algorithms for k-ary n-... |

241 | A scheme for fast parallel communication
- Valiant
- 1982
Citation Context ... L-flit message to send. The algorithm runs in O((L+log n) log n) flit steps. In Problem 3.286, he observes that the algorithm can be converted to one that routes any permutation using Valiant's idea =-=[47]-=- of first routing to random intermediate destinations. For the interesting case of L=O(log n), the time is O(log^2 n) flit steps. The algorithm can easily be generalized to the case q>1 and runs in O(... |

161 | The Performance of Multistage Interconnection Networks for Multiprocessors
- Kruskal, Snir
- 1983
Citation Context ...uit-switching on a butterfly network. In a circuit-switched network, each message must lock-down a dedicated path from its input node to its output node before it can be transmitted. Kruskal and Snir =-=[24]-=- showed that if each input in an n-input circuit-switched butterfly network sends a message to a randomly-chosen output, and at most one message can use any edge of the network, then the expected numb... |

151 | The J-Machine Multicomputer: An Architectural Evaluation
- Noakes, Wallach, et al.
- 1993
Citation Context ...emic Press 1. INTRODUCTION Wormhole routing has become the routing method of choice in the latest generation of parallel computers, including experimental machines such as iWarp [8] and the J-Machine =-=[37]-=-, and commercial machines such as the Intel Paragon, Cray T3D [23], and Connection Machine CM-5 [31]. In a wormhole router, the bits in a message are grouped into a sequence of flits, where a flit is ... |

139 | iWarp: An Integrated Solution to High-Speed Parallel Computing
- Borkar, Cohn, et al.
- 1988
Citation Context ... algorithms. 2001 Academic Press 1. INTRODUCTION Wormhole routing has become the routing method of choice in the latest generation of parallel computers, including experimental machines such as iWarp =-=[8]-=- and the J-Machine [37], and commercial machines such as the Intel Paragon, Cray T3D [23], and Connection Machine CM-5 [31]. In a wormhole router, the bits in a message are grouped into a sequence of ... |

119 | Packet routing and job-shop scheduling in O(congestion + dilation) steps
- Leighton, Maggs, et al.
- 1994
Citation Context ...work-independent algorithms. The idea of decoupling the path selection process from the scheduling process, and then analyzing the scheduling process alone, was first used by Leighton, Maggs, and Rao =-=[27]-=-. They proved that for any set of messages whose paths are edge-simple (i.e., no path uses the same edge more than once) and have congestion C and dilation D, there is a store-and-forward schedule tha... |

88 | A permutation network
- Waksman
- 1968
Citation Context ...e edge-disjoint paths between the inputs and outputs of a Benes network in any permutation. A Benes network is simply two back-to-back butterfly networks. Waksman =-=[48]-=- gave an elegant linear time algorithm for determining how the nodes should be set in order to realize any particular permutation. Waksman's algorithm can be used for wormhole routing. It shows how to... |

86 | Randomized routing and sorting on fixed-connection networks
- Leighton, Maggs, et al.
- 1994
Citation Context ...at only require edge buffers of size 2. The O(C+D) bound of Leighton et al. was followed by a number of algorithmic results. For the special case of leveled networks, Leighton, Maggs, Ranade, and Rao =-=[26]-=- presented a simple online store-and-forward algorithm for routing any set of n messages in a leveled network with depth D in O(C+D+log n) message steps. In a leveled network with depth D, each node i... |

74 | The network architecture of the Connection Machine CM-5
- Pierre, Wong, et al.
- 1992
Citation Context ...neration of parallel computers, including experimental machines such as iWarp [8] and the J-Machine [37], and commercial machines such as the Intel Paragon, Cray T3D [23], and Connection Machine CM-5 =-=[31]-=-. In a wormhole router, the bits in a message are grouped into a sequence of flits, where a flit is the smallest unit of information that can be buffered at a node of the network. Typically the number... |

66 | On-line algorithms for path selection in a nonblocking network
- Arora, Leighton, et al.
- 1996
Citation Context ...e number of virtual channels to be a small constant larger than one and assumes that each hypercube node can service all log n of its edges simultaneously. On the multibutterfly network, Arora et al. =-=[3]-=- devised an algorithm for routing n L-flit messages from the inputs to the outputs of an n-input network in O(L+log n) flit steps. The algorithm can also be applied to a multi-Benes network (two back-... |

66 | Lectures on the Probabilistic Method - Spencer - 1987 |

57 | Deadlock Free Message Routing in Multiprocessor Interconnection Networks
- Dally, Seitz
- 1987
Citation Context ...d latency, wormhole routing also has the advantage that it can be implemented with small, fast switches. Wormhole routing owes much of its recent popularity to an influential paper by Dally and Seitz =-=[14]-=-, which introduced the method. Much of the paper by Dally and Seitz is devoted to the design of wormhole routing algorithms that avoid deadlock. A wormhole routing algorithm can deadlock if the header... |

56 | Optimal rearrangeable multistage connecting networks
- Beneš
- 1964
Citation Context ...ll not review the previous results. Descriptions of several of the algorithms can be found in [25, 28]. Instead, we focus on algorithms for wormhole routing. In two early papers, Beizer [6] and Benes =-=[7]-=- showed that it is possible to route edge-disjoint paths between the inputs and outputs of a Benes network in any permutation. A Benes network is simply two back-t... |

54 | Methods for message routing in parallel machines
- Leighton
- 1992
Citation Context ...ination. 1.3. Previous and Related Research There is a vast literature devoted to the subject of routing messages in networks. For a broader treatment of the subject, see the survey paper by Leighton =-=[28]-=- or his book [25]. In what follows we describe only the research most closely related to this paper. 1.3.1. Network-independent algorithms. The idea of decoupling the path selection process from the s... |

54 | A constant-factor approximation algorithm for packet routing, and balancing local vs. global criteria
- Srinivasan, Teo
- 1997
Citation Context ...itrary ε>0. The universal store-and-forward message routing results outlined so far deal only with the problem of scheduling messages after paths have been chosen for each message. Srinivasan and Teo =-=[46]-=- show how to select paths, given the source and destination for each message, so that the value of C+D is the smallest possible to within constant factors. (Finding the exact minimum value of C+D is N... |

53 | Deadlock avoidance in store-and-forward networks
- Merlin, Schweitzer
- 1980
Citation Context ...t path for a message. Minimal deadlock-free algorithms have been designed for de Bruijn and shuffle-exchange networks [11]. Fully-adaptive minimal deadlock-free algorithms have been devised for trees =-=[34]-=-, meshes [39], toruses [12], and hypercubes [39]. In the last category, wormhole routing algorithms have been designed for hypercubes, multibutterflies, trees, and meshes with constant dimension. (A m... |

42 | Tight bounds for oblivious routing in the hypercube
- Kaklamanis, Krizanc, et al.
- 1990
Citation Context ... message's path is determined solely by its origin and destination and not by the actions taken by other messages. The Borodin-Hopcroft lower bound was later improved to Ω(√n/d) by Kaklamanis et al. =-=[20]-=-. These congestion-based bounds for store-and-forward routing can be translated to lower bounds (in flit steps) for oblivious wormhole routing algorithms simply by multiplying the time bound by the fa... |

42 | Distributed packet switching in arbitrary networks
- Rabani, Tardos
- 1996
Citation Context ..., where P is the sum of the lengths of the paths taken by the messages. Recent advances in online local control algorithms for universal store-and-forward routing include the work of Rabani and Tardos =-=[40]-=- and Ostrovsky and Rabani [38]. Ostrovsky and Rabani improve on the results in [40] by presenting a randomized online algorithm that delivers all the messages to their destinations in O(C+D+log^{1+ε} n) message steps with high proba... |

42 | Universal Routing Strategies for Interconnection Networks - Scheideler |

37 | Fast algorithms for bit-serial routing on a hypercube
- Aiello, Leighton, et al.
- 1990
Citation Context ...s the number of virtual channels, e.g., the Kaklamanis et al. bounds become Ω(L√n/(dB)) flit steps. The Borodin-Hopcroft lower bound was extended to randomized oblivious algorithms by Aiello et al. =-=[1]-=-. The result of [1] adapted to our model implies that on any degree-d network, almost all permutations require Ω(log n/(log d+log log n)) message steps, or Ω(L log n/(B(log d+log log n))) flit steps. ... |

35 | An Algorithmic Approach to the Lovász
- Beck
- 1991
Citation Context ...Leighton, Maggs, and Richa [29, 30] discovered a sequential algorithm for finding a store-and-forward routing schedule of length O(C+D) on any network. The algorithm is based on the techniques of Beck =-=[5]-=- and Alon [2] for making the Lovász local lemma constructive. It uses information about the entire network and all of the messages and runs in O(P log^{1+ε} P log*(C+D)) time, for any fixed ε>0, where P... |

35 | Universal algorithms for store-and-forward and wormhole routing
- Cypher, Heide, et al.
- 1996
Citation Context ...in O(LCD) flit steps. The O(LCD) bound improves on the naive O((L+D) CD) bound. Neither of these papers considered the case where the network has multiple virtual channels, i.e., B>1. Cypher et al. =-=[13]-=- have recently independently proved a number of results that are closely related, and in some cases superior, to the results in this paper. They give a simple randomized algorithm for routing n messag... |

33 | A theory of wormhole routing in parallel computers
- Raghavan, Upfal
- 1992
Citation Context ... algorithm can easily be generalized to the case q>1 and runs in O(q(L+log n) log n) flit steps. For the interesting case of L=O(log n) and q=log n, the time is O(log^3 n) flit steps. Felperin et al. =-=[18]-=- independently discovered an O(log^4 n) flit step algorithm for solving a random problem for the case L=O(log n) and q=log n, and then Ranade et al. [41] discovered an O(log 3 nlog log n) flit step al... |

32 | Greedy Packet Scheduling on Shortest Paths
- Mansour, Patt-Shamir
- 1993
Citation Context ...ssage steps. In a leveled network with depth D, each node is labeled with an integer between 0 and D, and each edge with its tail on level i, 0 ≤ i < D, has its head on level i+1. Mansour and Patt-Shamir =-=[33]-=- then showed that, in any network, if messages are routed greedily on shortest paths, then all of the messages reach their destinations within D+n-1 message steps, where n is the total number of messa... |

30 | A packet routing protocol for arbitrary networks
- Heide, Vöcking
- 1995
Citation Context ...ons within D+n-1 message steps, where n is the total number of messages. These schedules may be much longer than optimal, however, because n may be much larger than C. Meyer auf der Heide and Vöcking =-=[35]-=- later devised a simple online randomized algorithm that routes all messages to their destinations in O(C+D+log n) message steps, with high probability, provided that the paths... |

25 | sorting on parallel models of computation
- Borodin, Hopcroft, et al.
- 1985
Citation Context ... process. The algorithms described in this paper, which are designed to route a batch of packets, are not continuous. 1.3.2. Network-independent lower bounds. In a classic paper, Borodin and Hopcroft =-=[9]-=- proved that any deterministic oblivious store-and-forward algorithm must take at least Ω(√(n/d^3)) message steps to route some permutation on any n-node degree d network. An oblivious routing algorith... |

24 | Universal continuous routing strategies
- Scheideler, Vöcking
- 1996
Citation Context ...to theirs for other ranges of the parameters. The techniques used in the two papers to design network-independent algorithms are very different and are of independent interest. Scheideler and Vöcking =-=[43]-=- have also recently shown that the same factor of D^{1/B} appears in the maximum injection rate for continuous wormhole routing algorithms. A continuous routing algorithm is one that accepts packets tha... |

23 | Fully-Adaptive Minimal Deadlock-Free Packet Routing
- Pifarré, Gravano, et al.
- 1991
Citation Context ...message. Minimal deadlock-free algorithms have been designed for de Bruijn and shuffle-exchange networks [11]. Fully-adaptive minimal deadlock-free algorithms have been devised for trees [34], meshes =-=[39]-=-, toruses [12], and hypercubes [39]. In the last category, wormhole routing algorithms have been designed for hypercubes, multibutterflies, trees, and meshes with constant dimension. (A mesh with cons... |

22 | Universal wormhole routing
- Greenberg, Oh
- 1997
Citation Context ...actor from optimal using constant-sized buffers. In contrast with store-and-forward routing, there is relatively little prior work on network independent wormhole routing algorithms. Greenberg and Oh =-=[19]-=- were the first to state nontrivial network-independent wormhole routing results in terms of L, C, and D. They created a randomized algorithm that takes O(lCD+ lCL log n) flit steps, where l=min[L, D]... |

21 | Increasing the size of a Network by a constant factor Can Increase Performance by More Than a Constant Factor
- Koch
Citation Context ...ly-chosen output, and at most one message can use any edge of the network, then the expected number of messages that reach their destinations (i.e., succeed in locking-down paths) is Θ(n/log n). Koch =-=[22]-=- generalized the result of Kruskal and Snir by showing that if each edge can support B messages, then the expected fraction of messages that get through is Θ(n/log^{1/B} n). Thus, Koch observed in the c... |

17 | How much can hardware help routing
- Borodin, Raghavan, et al.
- 1993
Citation Context ... (log d+log log n)) message steps, or Ω(L log n/(B(log d+log log n))) flit steps. For constant-degree networks such as the butterfly, this bound is Ω(L log n/(B log log n)) flit steps. Borodin et al. =-=[10]-=- later showed that for d n log 3 n, any oblivious randomized single-port permutation routing algorithm requires 0(log d n+log n log log n) message steps, on average. The bound from [10] is stronger th... |

17 | Fast algorithms for finding O(congestion+dilation) packet routing schedules
- Leighton, Maggs, et al.
- 1999
Citation Context ...message steps, with high probability, provided that the paths taken by the messages are shortcut free (e.g., shortest paths). Recently, Leighton, Maggs, and Richa =-=[29, 30]-=- discovered a sequential algorithm for finding a store-and-forward routing schedule of length O(C+D) on any network. The algorithm is based on the techniques of Beck [5] and Alon [2] for making the Lov... |

14 | Simple algorithms for routing on butterfly networks with bounded queues
- Maggs, Sitaraman
- 1999
Citation Context ...tivated by the fact that on the BBN Butterfly parallel computer [4], B=2. Note that in this paper we show similar superlinear resource-performance trade-offs for wormhole routing. Maggs and Sitaraman =-=[32]-=- generalized the previous two results by showing that by making two passes through a butterfly it is possible to route a Θ(n/log^{1/B} n) fraction of any permutation (rather than only a random permutati... |

14 | Nearly tight bounds for wormhole routing
- Ranade, Schleimer, et al.
- 1994
Citation Context ...it steps, where l=min[L, D], provided the paths of any two messages intersect in at most one contiguous sequence of edges and the channel dependency graph is acyclic to avoid deadlocks. Ranade et al. =-=[41]-=- then showed that on any leveled network, any set of L-flit messages whose paths have congestion C and dilation D can be routed in O(LCD) flit steps. The O(LCD) bound improves on the naive O((L+D) CD)... |

12 | The analysis and synthesis of signal switching networks
- Beizer
- 1962
Citation Context ...is area, we will not review the previous results. Descriptions of several of the algorithms can be found in [25, 28]. Instead, we focus on algorithms for wormhole routing. In two early papers, Beizer =-=[6]-=- and Benes [7] showed that it is possible to route edge-disjoint paths between the inputs and outputs of a Benes network in any permutation. A Benes network is sim... |

9 | A Parallel Algorithmic Version of the
- Alon
- 1991
Citation Context ...gs, and Richa [29, 30] discovered a sequential algorithm for finding a store-and-forward routing schedule of length O(C+D) on any network. The algorithm is based on the techniques of Beck [5] and Alon =-=[2]-=- for making the Lovász local lemma constructive. It uses information about the entire network and all of the messages and runs in O(P log^{1+ε} P log*(C+D)) time, for any fixed ε>0, where P is the sum o... |

8 | Universal O(congestion+dilation+log^{1+ε} N) local control packet switching algorithms
- Ostrovsky, Rabani
- 1997
Citation Context ...ngths of the paths taken by the messages. Recent advances in online local control algorithms for universal store-and-forward routing include the work of Rabani and Tardos [40] and Ostrovsky and Rabani =-=[38]-=-. Ostrovsky and Rabani improve on the results in [40] by presenting a randomized online algorithm that delivers all the messages to their destinations in O(C+D+log^{1+ε} n) message steps with high proba... |

6 | A shared MPP from Cray Research
- Koeninger, Furtney, et al.
- 1994
Citation Context ... method of choice in the latest generation of parallel computers, including experimental machines such as iWarp [8] and the J-Machine [37], and commercial machines such as the Intel Paragon, Cray T3D =-=[23]-=-, and Connection Machine CM-5 [31]. In a wormhole router, the bits in a message are grouped into a sequence of flits, where a flit is the smallest unit of information that can be buffered at a node of... |

3 | The architecture and programming
- Seitz, Athas, et al.
- 1988
Citation Context ... papers that prove rigorous worst-case bounds on the running times of wormhole routing algorithms. In the first category Seitz et al. describe the architecture of the Ametek Series 2010 Multicomputer =-=[44]-=-, and Dally and Seitz [17] describe the Torus Routing Chip. In the second category, two of the most influential papers were written by Dally [15, 16]. The first analyzes the behavior of wormhole routi... |

2 | Minimal, deadlock-free routing in hypercubic and arbitrary networks
- Cypher
- 1995
Citation Context ....) A fully-adaptive minimal algorithm is one that considers every possible shortest path for a message. Minimal deadlock-free algorithms have been designed for de Bruijn and shuffle-exchange networks =-=[11]-=-. Fully-adaptive minimal deadlock-free algorithms have been devised for trees [34], meshes [39], toruses [12], and hypercubes [39]. In the last category, wormhole routing algorithms have been designed... |

1 | deadlock-free, adaptive packet routing algorithms for torus networks
- Cypher, Gravano, et al.
- 1994
Citation Context ...al deadlock-free algorithms have been designed for de Bruijn and shuffle-exchange networks [11]. Fully-adaptive minimal deadlock-free algorithms have been devised for trees [34], meshes [39], toruses =-=[12]-=-, and hypercubes [39]. In the last category, wormhole routing algorithms have been designed for hypercubes, multibutterflies, trees, and meshes with constant dimension. (A mesh with constant dimension... |