## Limits on Interconnection Network Performance (1991)

Venue: IEEE Transactions on Parallel and Distributed Systems

Citations: 176 (4 self)

### BibTeX

```bibtex
@ARTICLE{Agarwal91limitson,
  author  = {Anant Agarwal},
  title   = {Limits on Interconnection Network Performance},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  year    = {1991},
  volume  = {2},
  pages   = {398--412}
}
```

### Abstract

As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be reevaluated, starting with a close examination of assumptions and requirements. This paper models network latency, taking both switch and wire delays into account. A simple closed-form expression for contention in buffered, direct networks is derived and is found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints (such as fixed bisection width, fixed channel width, and fixed node size) and under different workload parameters (such as packet size, degree of communication locality, and network request rate) reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network has the lowest latency only when switch delays and network contention are ignored, but...

### Citations

531 | Queueing Systems
- Kleinrock
- 1975
Citation context: ...ivals in any given cycle. As shown in [16], the average waiting time w for a packet in such a unit cycle-time system can be derived from the set of equations that result from an M/G/1 queueing system [15], as w = V / (2E(1 - E)) - 1/2 (Eq. 3). 3.1 Deriving the Distribution of v. To compute E and V we need the distribution of the random variable v. In an indirect network v has a simple binomial distrib...
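The waiting-time expression quoted in this context can be sketched numerically. Below is a minimal illustration assuming v is binomially distributed, as in the indirect-network case the excerpt mentions; the binomial parameters are illustrative choices of ours, not values from the paper.

```python
# Numerical sketch of the quoted waiting-time formula (Eq. 3):
#     w = V / (2E(1 - E)) - 1/2
# where E and V are the mean and variance of v, the number of packet
# arrivals at a channel per cycle. The parameters below are
# illustrative, not taken from the paper.

def waiting_time(E, V):
    """Average wait in the unit cycle-time M/G/1-like queue (Eq. 3)."""
    return V / (2 * E * (1 - E)) - 0.5

# For a binomial v ~ B(m, p): E = m*p, V = m*p*(1-p).
m, p = 8, 0.05            # 8 potential senders, each sending w.p. 0.05
E = m * p                 # mean arrivals per cycle  = 0.4
V = m * p * (1 - p)       # variance of arrivals     = 0.38
print(round(waiting_time(E, V), 4))   # prints 0.2917
```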

475 | The Connection Machine
- Hillis
- 1985
Citation context: ...e becoming increasingly popular for interconnections in large-scale concurrent computers. Examples of machines that use direct networks include the Caltech Cosmic Cube [25] and the Connection Machine [13]. We will focus on the general class of direct networks called k-ary n-cubes [28]. A k-ary n-cube is a network with n dimensions having k nodes in each dimension. For example, a 100 processor array ha...

338 | Virtual cut-through: A new computer communication switching technique
- Kermani, Kleinrock
- 1979
Citation context: ...is some function of the switch dimension. We will also assume a linear wire delay model. The switches are pipelined (i.e., switches use wormhole routing [7], which is a variant of cut-through routing [14]). As mentioned before, the clock cycle is the sum of the switch delay and the delay due to the longest wire in the synchronous network. Our study assumes that the networks are embedded in a plane. A ...

297 | Performance Analysis of k-ary n-cube Interconnection Networks
- Dally
- 1990
Citation context: ... for a five-dimensional network W = 8, which yields 16-flit messages for L = 128. Here we see that the higher-dimensional networks suffer much higher delays, and we obtain results similar to those in [8]. With fixed bisection, the two-dimensional network outperforms the rest. Figure 11(d) plots network latency when the node size is fixed, and normalized to that of the two-dimensional network with W =...

259 | April: A Processor Architecture for Multiprocessing
- Agarwal, Lim, et al.
- 1990
Citation context: ...arp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18], the Stanford DASH Multiprocessor [20], and the MIT Alewife machine [2, 6]. The choice of the optimal network for a multiprocessor is highly sensitive to the assumptions about system parameters and the constraints that apply on the design. System parameters include, among o...

204 | The Cosmic Cube
- Seitz
- 1985
Citation context: ...ion locality, direct networks are becoming increasingly popular for interconnections in large-scale concurrent computers. Examples of machines that use direct networks include the Caltech Cosmic Cube [25] and the Connection Machine [13]. We will focus on the general class of direct networks called k-ary n-cubes [28]. A k-ary n-cube is a network with n dimensions having k nodes in each dimension. For e...

200 | LimitLESS Directories: A Scalable Cache Coherence Scheme
- Chaiken, Kubiatowicz, et al.
- 1991
Citation context: ...arp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18], the Stanford DASH Multiprocessor [20], and the MIT Alewife machine [2, 6]. The choice of the optimal network for a multiprocessor is highly sensitive to the assumptions about system parameters and the constraints that apply on the design. System parameters include, among o...

177 | Access and alignment of data in an array processor
- Lawrie
- 1975
Citation context: ...ch other over a set of point-to-point links. The point-to-point interconnections between processors distinguish direct networks from indirect networks (or multistage networks) [27], such as the Omega [19] and the Delta [22] networks. An indirect network does not integrate processors and switches. Consequently, processors cannot communicate directly with each other, but must do so through a set of inte...

155 | Multicomputers: message-passing concurrent computers
- Athas, Seitz
- 1988
Citation context: ...le better than high-dimensional networks, they are modular, and they are easy to implement. Examples of machine designs that use such networks are the MuNet [12], Ametek 2010 [26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU-Intel iWarp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18], the Stanford DASH Mult...

141 | The performance of multistage interconnection networks for multiprocessors
- Kruskal, Snir
- 1983
Citation context: ...ks. This section derives a contention model for high-radix direct networks and validates it through simulations. The derivation proceeds like the buffered-indirect-network analysis of Kruskal and Snir [16]. Our contention model assumes buffered networks as well. Simulation experiments by Kruskal and Snir show that as few as four packet buffers at each switch node can approach infinite buffer performanc...

134 | Performance of processor-memory interconnections for multiprocessors
- Patel
- 1981
Citation context: ... of point-to-point links. The point-to-point interconnections between processors distinguish direct networks from indirect networks (or multistage networks) [27], such as the Omega [19] and the Delta [22] networks. An indirect network does not integrate processors and switches. Consequently, processors cannot communicate directly with each other, but must do so through a set of intervening switching n...

104 | A complexity theory for VLSI
- Thompson
- 1980
Citation context: ...n constraints include limits on bisection width, node size, and channel width. Bisection width is defined as the minimum number of wires that must be cut to separate the network into two equal halves [29]. A bisection width constraint is tantamount to an area constraint. A constraint on the node size is assumed to limit the number of pins on the node. Assuming a constraint on the bisection width, Dall...
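The bisection-width constraint described in this context can be made concrete with a small sketch. It uses the standard result for k-ary n-cube tori (2·k^(n-1) channels cross the bisection, the factor 2 coming from the wraparound links); that formula is a textbook assumption on our part, not something derived in the excerpt.

```python
# Hedged sketch of the bisection-width constraint. For a k-ary n-cube
# with wraparound (torus) channels, a standard result gives 2 * k**(n-1)
# channels crossing the bisection; with W wires per channel the
# bisection width in wires is B = 2 * W * k**(n-1). This is a textbook
# formula assumed here, not derived in the excerpt above.

def bisection_wires(k, n, W):
    """Wires cut when bisecting a k-ary n-cube torus with W-bit channels."""
    return 2 * W * k ** (n - 1)

# Example: a 10-ary 2-cube (100 nodes) with 16-bit channels.
print(bisection_wires(10, 2, 16))   # prints 320
```

Holding B fixed while varying n is what forces the channel width W to shrink as the dimension grows, which is the trade-off the constraint analysis explores.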

86 | A VLSI Architecture for Concurrent Data Structures
- Dally
- 1987
Citation context: ...A bisection width constraint is tantamount to an area constraint. A constraint on the node size is assumed to limit the number of pins on the node. Assuming a constraint on the bisection width, Dally [7, 8] analyzed the performance of k-ary n-cube networks implemented in two-dimensional space, using constant, logarithmic, and linear wire delay models. The analysis suggests that a two-dimensional network...

77 | Interconnection Networks for Large-Scale Parallel Processing: Theory and Case
- Siegel
- 1990
Citation context: ...unicate directly with each other over a set of point-to-point links. The point-to-point interconnections between processors distinguish direct networks from indirect networks (or multistage networks) [27], such as the Omega [19] and the Delta [22] networks. An indirect network does not integrate processors and switches. Consequently, processors cannot communicate directly with each other, but must do ...

76 | A large scale, homogeneous, fully distributed parallel machine
- Sullivan, Bashkow
- 1977
Citation context: ...mputers. Examples of machines that use direct networks include the Caltech Cosmic Cube [25] and the Connection Machine [13]. We will focus on the general class of direct networks called k-ary n-cubes [28]. A k-ary n-cube is a network with n dimensions having k nodes in each dimension. For example, a 100 processor array has n = 2 and k = 10. Given N processors, the relationship N = k^n holds between th...
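The k-ary n-cube relationship N = k^n from this excerpt is easy to illustrate; the helper name below is ours, not the paper's.

```python
# Illustrating the k-ary n-cube relationship N = k**n from the excerpt:
# n dimensions with k nodes each. The helper name is our own.

def radix_for(N, n):
    """Radix k with k**n = N, assuming N is a perfect n-th power."""
    k = round(N ** (1.0 / n))
    assert k ** n == N, "N must be a perfect n-th power"
    return k

# The excerpt's example: a 100-processor array has n = 2 and k = 10.
print(radix_for(100, 2))   # prints 10
# A 512-node binary hypercube (k = 2) needs n = 9 dimensions: 2**9 == 512.
print(radix_for(512, 9))   # prints 2
```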

45 | The NYU Ultracomputer: designing a MIMD, shared-memory parallel computer
- Gottlieb, Grishman, et al.
- 1983
Citation context: ...We see that communication locality has a larger relative effect on the two-dimensional network. 5.2 Direct Versus Indirect Networks. In the past, shared-memory multiprocessors (e.g., the Ultracomputer [11], RP3 [23], Cedar [10], and BBN Butterfly) have generally employed indirect networks. These networks provide uniform-cost access to remote memory modules, and have a high bandwidth, but they do not al...

40 | Concurrent VLSI architectures
- Seitz
- 1984
Citation context: ...delay, and wire delay, but on the communication patterns of parallel computations as well. This paper analyzes the contribution of these factors to the latency of direct networks. In a direct network [24], the processing nodes communicate directly with each other over a set of point-to-point links. The point-to-point interconnections between processors distinguish direct networks from indirect network...

35 | Cedar: a large scale multiprocessor
- Gajski, Kuck, et al.
- 1983
Citation context: ...ion locality has a larger relative effect on the two-dimensional network. 5.2 Direct Versus Indirect Networks. In the past, shared-memory multiprocessors (e.g., the Ultracomputer [11], RP3 [23], Cedar [10], and BBN Butterfly) have generally employed indirect networks. These networks provide uniform-cost access to remote memory modules, and have a high bandwidth, but they do not allow the exploitation o...

34 | The J-Machine: A Fine-Grain Concurrent Computer
- Dally, et al.
- 1989
Citation context: ...ensional networks, they are modular, and they are easy to implement. Examples of machine designs that use such networks are the MuNet [12], Ametek 2010 [26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU-Intel iWarp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18], the Stanford DASH Multiprocessor [20], and th...

32 | The Horizon supercomputing system: Architecture and software
- Kuehn, Smith
- 1988
Citation context: ...[26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU-Intel iWarp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18], the Stanford DASH Multiprocessor [20], and the MIT Alewife machine [2, 6]. The choice of the optimal network for a multiprocessor is highly sensitive to the assumptions about system parameters and t...

29 | Performance of the direct binary n-cube networks for multiprocessors
- Abraham, Padmanabhan
- 1989
Citation context: ...nel widths • Constant bisection width • Constant node size. We develop a model for buffered low-dimensional direct networks that yields a simple closed form expression for network contention. (See [1] for a model of binary n-cube networks for unit packet sizes.) The model is thoroughly validated through measurements taken from a simulator. Although the assumptions made by the model are tailored to...

19 | Directory-Based Cache Coherence in Large-Scale Multiprocessors
- Chaiken, Fields, et al.
- 1990
Citation context: ...essage sizes are small, wider channels -- an advantage of low-dimensional networks -- are less useful. Messages are expected to be smaller in a shared-memory multiprocessor (about 100 bits on average [5]) than in a message passing multicomputer. In addition, as observed in Section 3.3, small messages suffer less contention delay than large messages per unit volume of data transferred. Our analysis sh...

16 | The distribution of waiting times in clocked multistage interconnection networks
- Kruskal, Snir, et al.
- 1988
Citation context: ...ks and message rates; we verified the accuracy of this simple model for several packet sizes by comparing its predictions with simulations. Kruskal, Snir, and Weiss derive more accurate formulas in [17], but this is sufficient for our purposes. [Figure (a): latency vs. network request rate for n = 2, 3, 4, 5 and an indirect network.]

8 | A methodology for predicting multiprocessor performance
- Norton, Pfister
- 1985
Citation context: ...5). Kruskal and Snir's indirect network model predicted a similar dependence of latency on blocksize, prompting Norton and Pfister to consider splitting messages into multiple smaller ones in the RP3 [21]. However, splitting long messages into multiple smaller ones with back-to-back transmission may not realize the higher predicted throughput because of the destination correlation of these sub-message...

7 | The MuNet: A Scalable Decentralized Architecture for Parallel Computation
- Halstead, Ward
- 1980
Citation context: ...ional networks are favored because they scale better than high-dimensional networks, they are modular, and they are easy to implement. Examples of machine designs that use such networks are the MuNet [12], Ametek 2010 [26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU-Intel iWarp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, ...

3 | The Architecture and Programming of the Ametek Series 2010 Multicomputer
- Seitz, et al.
- 1988
Citation context: ... favored because they scale better than high-dimensional networks, they are modular, and they are easy to implement. Examples of machine designs that use such networks are the MuNet [12], Ametek 2010 [26], the Caltech Mosaic [3], the MIT J-machine [9], and the CMU-Intel iWarp [4]. Some recent distributed shared-memory designs are also planning to use low-dimensional direct networks, e.g., HORIZON [18]...
