A Delay Model and Speculative Architecture for Pipelined Routers
In International Symposium on High-Performance Computer Architecture, 2001
Cited by 94 (19 self)
This paper introduces a router delay model that accurately models key aspects of modern routers. The model accounts for the pipelined nature of contemporary routers, the specific flow control method employed, the delay of the flow-control credit path, and the sharing of crossbar ports across virtual channels. Motivated by this model, we introduce a microarchitecture for a speculative virtual-channel router that significantly reduces its router latency to that of a wormhole router. Simulations using our pipelined model give results that differ considerably from the commonly assumed 'unit-latency' model, which is unreasonably optimistic. Using realistic pipeline models, we compare wormhole [6] and virtual-channel flow control [4]. Our results show that a speculative virtual-channel router has the same per-hop router latency as a wormhole router, while improving throughput by up to 40%.
Low-Latency Virtual-Channel Routers for On-Chip Networks
In International Symposium on Computer Architecture, 2004
Cited by 63 (1 self)
The on-chip communication requirements of many systems are best served through the deployment of a regular chip-wide network. This paper presents the design of a low-latency on-chip network router for such applications. We remove control overheads (routing and arbitration logic) from the critical path in order to minimise cycle-time and latency. Simulations illustrate that dramatic cycle time improvements are possible without compromising router efficiency. Furthermore, these reductions permit flits to be routed in a single cycle, maximising the effectiveness of the router's limited buffering resources.
A Delay Model for Router Micro-architectures
IEEE Micro, 2000
Cited by 31 (3 self)
Current router models [2, 3, 5, 6] assume that clock cycle time depends solely on router latency. However, in practice, routers are heavily pipelined, making cycle time largely independent of router latency. In this paper, we describe a router delay model that accurately accounts for pipelining based on technology-independent delay estimates derived through detailed gate-level analysis. Simulations of realistic router pipelines show significant performance differences compared with the commonly assumed unit-latency model. Using realistic pipeline models, we compared wormhole and virtual-channel flow control. Our results show that virtual channels incur a modest additional cycle of per-hop router latency, which is more than offset by the 25-40% throughput improvement over a wormhole router.
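The latency trade-off described in this abstract can be illustrated with a minimal sketch. It assumes the textbook zero-load approximation (hop count times per-hop router delay, plus serialization time), not the paper's gate-level delay model. The function name, hop count, packet length, and pipeline depths are hypothetical values chosen only for illustration.

```python
def zero_load_latency(hops, router_cycles, packet_flits):
    """Zero-load latency in cycles: per-hop router delay plus serialization.

    Assumes one flit per cycle per channel and ignores wire delay.
    """
    return hops * router_cycles + packet_flits


# Hypothetical numbers, not taken from the paper.
HOPS = 8
PACKET_FLITS = 5
WORMHOLE_CYCLES = 3                  # assumed wormhole pipeline depth
VC_CYCLES = WORMHOLE_CYCLES + 1      # one extra cycle per hop for virtual channels

wormhole = zero_load_latency(HOPS, WORMHOLE_CYCLES, PACKET_FLITS)
virtual_channel = zero_load_latency(HOPS, VC_CYCLES, PACKET_FLITS)
unit_latency = zero_load_latency(HOPS, 1, PACKET_FLITS)

print("wormhole:", wormhole)
print("virtual-channel:", virtual_channel)
print("unit-latency assumption:", unit_latency)
```

Under these assumed numbers the unit-latency model reports far fewer cycles than either pipelined router, which is the kind of optimism the abstract warns about. The extra virtual-channel cycle costs one cycle per hop at zero load.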
Chaotic Routing - Design and Implementation of an Adaptive Multicomputer Network Router
1993
Cited by 26 (3 self)
By Kevin Bolding. Chairperson of Supervisory Committee: Professor Lawrence Snyder, Department of Computer Science and Engineering. A crucial component of a massively parallel multicomputer is the interconnection network which links all of the nodes of the computer together. This network provides the primary method of communication between the hundreds or thousands of processing nodes and is, thus, critical to the successful operation of the multicomputer. Current state-of-the-art interconnection networks use simple, oblivious routing techniques which achieve very good performance when loading is light, but do not perform well in the presence of non-uniform congestion or faults. Chaotic routing, a non-minimal adaptive routing technique, provides a mechanism which takes into account the presence of congestion and faults when choosing a path for a message and can, thus, achieve better performance. Chaot...
Wormhole Routing Techniques for Directly Connected Multicomputer Systems
ACM Computing Surveys, 1998
Cited by 23 (0 self)
Wormhole routing has emerged as the most widely used switching technique in massively parallel computers. We present here a detailed survey of various techniques for enhancing the performance and reliability of wormhole routing schemes in directly connected networks. We start with an overview of the direct network topologies and a comparison of various switching techniques. Next, the characteristics of the wormhole routing mechanism are described in detail along with the theory behind deadlock-free routing. The performance of routing algorithms depends on the selection of the path between the source and the destination, the network traffic, and the router design. The routing algorithms are implemented in the router chips. We outline the router characteristics and describe the functionality of various elements of the router. Depending on the usage of paths between the source and the destination, the routing algorithms are classified as deterministic, fully adaptive, and partially adaptive. ...
Efficient Adaptive Routing in Networks of Workstations with Irregular Topology
In Proceedings of the First International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (CANPC '97), 1997
Cited by 21 (7 self)
Networks of workstations are rapidly emerging as a cost-effective alternative to parallel computers. Switch-based interconnects with irregular topologies allow the wiring flexibility, scalability and incremental expansion capability required in this environment. The irregularity also makes routing and deadlock avoidance on such systems quite complicated. Current proposals avoid deadlock by removing cyclic dependencies between channels. As a consequence, many messages are routed following non-minimal paths, increasing latency and wasting resources. In this paper, we propose a general methodology for the design of adaptive routing algorithms for networks with irregular topology. These routing algorithms allow messages to follow minimal paths in most cases, reducing message latency and increasing network throughput. The methodology is based on the application of the theory of deadlock avoidance proposed in [14], which increases routing flexibility by allowing cyclic dependencies between ...
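The conservative approach that this abstract attributes to current proposals, removing cyclic dependencies between channels, amounts to requiring an acyclic channel dependency graph. A minimal sketch of that check follows, assuming the graph is given as a dictionary from each channel to the channels a packet holding it may request next. The function name and the four-channel ring are hypothetical. The methodology proposed in the paper deliberately permits some cycles, so this sketch illustrates only the baseline it improves on.

```python
def has_cycle(dependencies):
    """Return True if the directed channel dependency graph contains a cycle.

    `dependencies` maps each channel to the channels a packet holding it
    may request next.
    """
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    colour = {}

    def visit(channel):
        colour[channel] = GREY
        for nxt in dependencies.get(channel, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:        # back edge: a dependency cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(
        colour.get(channel, WHITE) == WHITE and visit(channel)
        for channel in list(dependencies)
    )


# Hypothetical unidirectional ring of four channels: each waits on the next.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# The same ring with the wrap-around dependency removed.
broken_ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}

print(has_cycle(ring), has_cycle(broken_ring))
```

Removing the wrap-around dependency makes the graph acyclic, but it also forbids the route that used it. This is why, as the abstract notes, such schemes push many messages onto non-minimal paths.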
A General Theory for Deadlock Avoidance in Wormhole-Routed Networks
IEEE Transactions on Parallel and Distributed Systems, 1998
Cited by 20 (2 self)
Most machines of the last generation of distributed memory parallel computers possess specific routers which are used to exchange messages between non-neighboring nodes in the network. Among the several technologies, wormhole routing is usually preferred because it allows low channel-setup time, and reduces the dependency between latency and inter-node distance. However, wormhole routing is very susceptible to deadlock because messages are allowed to hold many resources while requesting others. Therefore, designing deadlock-free routing algorithms using few hardware facilities is a major problem for wormhole-routed networks. In this paper, we describe a general theoretical framework for the study of deadlock-free routing functions. We give a general definition of what a routing function can be. This definition captures many specific definitions in the literature (e.g., vertex-dependent, input-dependent, source-dependent, path-dependent, etc.). Using our definition, we give a necessary an...
Analysis and Implementation of Hybrid Switching
1995
Cited by 19 (5 self)
The switching scheme of a point-to-point network determines how packets flow through each node, and is a primary element in determining the network's performance. In this paper, we present and evaluate a new switching scheme called hybrid switching. Hybrid switching dynamically combines both virtual cut-through and wormhole switching to provide higher achievable throughput than wormhole alone, while significantly reducing the buffer space required at intermediate nodes when compared to virtual cut-through. This scheme is motivated by a comparison of virtual cut-through and wormhole switching through cycle-level simulations, and then evaluated using the same methods. To show the feasibility of hybrid switching, as well as to provide a common base for simulating and implementing a variety of switching schemes, we have designed SPIDER, a communication adapter built around a custom ASIC, the Programmable Routing Controller (PRC).
Support for Multiple Classes of Traffic in Multicomputer Routers
In Proc. Parallel Computer Routing and Communication Workshop, 1994
Cited by 18 (7 self)
Emerging parallel real-time and multimedia applications broaden the range of performance requirements imposed on the interconnection network. This communication typically consists of a mixture of different traffic classes, where guaranteed packets require bounds on latency or throughput while good average performance suffices for the best-effort traffic. This paper investigates how multicomputer routers can capitalize on low-latency routing and switching techniques for best-effort traffic while still supporting guaranteed communication. Through simulation experiments, we show that certain architectural features are best suited to particular performance requirements. Based on these results, the paper proposes and evaluates a router architecture that tailors low-level routing, switching, and flow-control policies to the unique needs of best-effort and guaranteed traffic. Careful selection of these policies, coupled with fine-grain arbitration between the classes, allows the guaranteed ...
Optimal Fully Adaptive Wormhole Routing for Meshes
1993
Cited by 17 (8 self)
A deadlock-free fully adaptive routing algorithm for 2D meshes is presented which is optimal in the number of virtual channels required and in the number of restrictions placed on the use of these virtual channels. The routing algorithm imposes less than half as many routing restrictions as any previous fully adaptive routing algorithm. It is also proved that, ignoring symmetry, this routing algorithm is the only fully adaptive routing algorithm that achieves both of these goals. The algorithm exploits the fact that for some adaptive routing algorithms, deadlock freedom is possible even when cycles are present in the channel dependency graph. The implementation of the routing algorithm requires relatively simple router control logic. The routing algorithm requires only the minimum number of virtual channels even when extended to meshes of arbitrary dimension, yielding a dramatic reduction in the number of virtual channels needed to support fully adaptive routing. Whereas all previous algorithms required a number of virtual channels that is exponential in the dimension of the mesh, the new algorithm requires only 4n - 2 virtual channels for an n-dimensional mesh.
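The closing formula of this abstract is simple to tabulate. The sketch below evaluates only the 4n - 2 count quoted above; the function name is hypothetical, and no figure is assumed for the exponential requirement of earlier algorithms, since the abstract does not give one.

```python
def virtual_channels_for_mesh(n):
    """Virtual channels quoted in the abstract for an n-dimensional mesh: 4n - 2."""
    if n < 1:
        raise ValueError("mesh dimension must be at least 1")
    return 4 * n - 2


# Each added dimension costs a constant four extra virtual channels.
for n in range(2, 6):
    print(f"{n}-dimensional mesh: {virtual_channels_for_mesh(n)} virtual channels")
```

The count grows by a fixed four channels per added dimension, which is the linear growth the abstract contrasts with the exponential requirement of earlier algorithms.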

