Results 1 - 10 of 176
A survey of wormhole routing techniques in direct networks
- IEEE Computer
, 1993
Cited by 429 (39 self)
... messages is critical to the performance of direct network systems. The popular wormhole routing technique faces several challenges, particularly flow control and deadlock avoidance. Massively parallel computers with thousands of processors are considered the most promising technology to achieve teraflops computational power. Such large-scale multiprocessors are usually organized as ensembles of nodes, where each node has its own processor, local memory, and other supporting devices. These nodes may have different functional capabilities. For ...
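Deadlock avoidance for deterministic wormhole routing is commonly reasoned about with a channel dependency graph: if no route can hold one channel while waiting for another in a cycle, the routing function is deadlock-free (the classic Dally and Seitz condition). The sketch below is a hypothetical illustration, not code from the survey: it assumes a small k x k mesh, builds the dependency graph for dimension-order routing, and checks it for cycles. The helper names `dor_route`, `channel_dependency_graph` and `has_cycle` are illustrative only.

```python
from itertools import product

# Hypothetical illustration: dimension-order routing on a k x k mesh and a
# cycle check on the resulting channel dependency graph.
# A channel is a directed link (node, neighbour) between adjacent mesh nodes.

def dor_route(src, dst, order=(0, 1)):
    """Correct the offset in each dimension fully, in the given order."""
    cur = list(src)
    path = []
    for dim in order:
        while cur[dim] != dst[dim]:
            nxt = list(cur)
            nxt[dim] += 1 if dst[dim] > cur[dim] else -1
            path.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return path

def channel_dependency_graph(k, orders):
    """Add an edge c1 -> c2 whenever some route holds channel c1 while requesting c2."""
    nodes = list(product(range(k), repeat=2))
    deps = {}
    for order in orders:
        for src, dst in product(nodes, nodes):
            path = dor_route(src, dst, order)
            for c1, c2 in zip(path, path[1:]):
                deps.setdefault(c1, set()).add(c2)
    return deps

def has_cycle(graph):
    """Depth-first search with white/grey/black marking; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(u):
        colour[u] = GREY
        for v in graph.get(u, ()):
            state = colour.get(v, WHITE)
            if state == GREY or (state == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour.get(u, WHITE) == WHITE and visit(u) for u in list(graph))

xy_only = channel_dependency_graph(4, [(0, 1)])            # pure XY routing
xy_and_yx = channel_dependency_graph(4, [(0, 1), (1, 0)])  # XY and YX mixed
print(has_cycle(xy_only), has_cycle(xy_and_yx))  # False True
```

Pure XY routing never turns from a Y channel back into an X channel, so its graph is acyclic; allowing both XY and YX routes closes the four turns around a mesh square and creates the kind of cyclic dependency that adaptive schemes must break, for example with virtual channels.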
Planar-Adaptive Routing: Low-cost Adaptive Networks for Multiprocessors
- In Proceedings of the International Symposium on Computer Architecture
, 1992
Cited by 179 (13 self)
Network throughput can be increased by allowing multipath, adaptive routing. Adaptive routing allows more freedom in the paths taken by messages, spreading load over physical channels more evenly. The flexibility of adaptive routing introduces new possibilities of deadlock. Previous deadlock avoidance schemes in k-ary n-cubes require an exponential number of virtual channels [17]. We describe a family of deadlock-free routing algorithms, called planar-adaptive routing algorithms, which require only a constant number of virtual channels, independent of network size and dimension. Planar-adaptive routing algorithms reduce the complexity of deadlock prevention by reducing the number of choices at each routing step. In the fault-free case, planar-adaptive networks are guaranteed to be deadlock-free. In the presence of network faults, the planar-adaptive router can be extended with misrouting to produce a working network which remains provably deadlock-free and is provably livelock-free. In a...
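The reduced choice at each step can be illustrated with the dimension-selection rule alone. This is a hedged sketch that assumes planes A_i span dimensions i and i+1, with a message routing adaptively in A_i until its offset in dimension i is zero and then moving to A_{i+1}. It deliberately omits the virtual-channel assignment and the fault-handling extensions, and the selection policy in the hypothetical `route` helper (largest remaining offset first) is an arbitrary choice for illustration.

```python
# Hypothetical sketch of the dimension-selection rule in planar-adaptive routing.
# Plane A_i spans dimensions i and i+1; a message routes adaptively in A_i until
# its offset in dimension i is zero, then moves on to A_{i+1}.
# Virtual-channel assignment and fault handling are deliberately omitted.

def candidate_dims(offsets):
    """Dimensions a minimal planar-adaptive step may use for these remaining offsets."""
    n = len(offsets)
    for i, off in enumerate(offsets):
        if off != 0:
            if i == n - 1:          # only the last dimension is left
                return [i]
            return [d for d in (i, i + 1) if offsets[d] != 0]
    return []                       # already at the destination

def route(offsets):
    """Walk to the destination, preferring the candidate with the larger remaining
    offset (an arbitrary selection policy chosen for illustration)."""
    offsets = list(offsets)
    steps = []
    while (dims := candidate_dims(offsets)):
        d = max(dims, key=lambda dim: abs(offsets[dim]))
        offsets[d] -= 1 if offsets[d] > 0 else -1
        steps.append(d)
    return steps

print(candidate_dims((2, -1, 3)))  # [0, 1]
print(route((1, -2, 1)))           # [1, 0, 1, 2]
```

At most two dimensions are ever candidates, regardless of n, which is the property that keeps the deadlock-avoidance machinery small.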
A Cost and Speed Model for k-ary n-cube Wormhole Routers
- Hot Interconnects '93
, 1993
Cited by 125 (1 self)
A great deal of research has been published on the performance of wormhole routers with advanced features such as adaptivity and virtual lanes. In most cases, the effectiveness of such novel routers is evaluated on the basis of the achieved network throughput (channel utilization), ignoring the important effects of implementation complexity. In this paper we describe a parameterized cost model for router performance, characterized by two numbers: router delay and flow control time. Grounding the cost model in a 0.8 micron gate array technology, we use it to compare a number of proposed routing algorithms. Based on these design studies, several insights regarding the implementation complexity of adaptive routing are clear. First, header update and selection is expensive in adaptive routers, suggesting that absolute addressing should be reconsidered. Second, virtual channels are expensive in terms of latency and cycle time, so decisions to include them to support adaptivity or even virtual lanes should not be taken lightly. Third, the requirements of larger crossbars and more complex arbitration cause some increase in the complexity of adaptive routers, but the rate of increase is small. Finally, the complexity of adaptive routers significantly increases their setup delay and flow control cycle times, implying that claims of performance advantages in channel utilization and low-load latency must be carefully balanced against losses in achievable implementation speed.
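To see why slower router delay and flow control time matter, a first-order zero-load latency estimate can be written in terms of those two numbers. The formula below is an assumption for illustration (header pays the router delay at each hop, then the flits stream one per flow-control cycle), not the paper's exact model, and the timings are hypothetical rather than results from the paper.

```python
from dataclasses import dataclass

# First-order zero-load latency model (an assumption for illustration; not the
# paper's exact equations). The two parameters mirror the cost model's two
# numbers: router delay and flow control time.

@dataclass(frozen=True)
class RouterTiming:
    router_delay_ns: float   # time for the header to be routed through one router
    flow_control_ns: float   # time to advance one flit across one channel

def zero_load_latency_ns(t: RouterTiming, hops: int, flits: int) -> float:
    """Header pays the router delay at every hop; the flits then stream behind it,
    one per flow-control cycle."""
    return hops * t.router_delay_ns + flits * t.flow_control_ns

# Hypothetical timings, chosen only to show the trade-off the abstract describes.
simple_router = RouterTiming(router_delay_ns=10.0, flow_control_ns=5.0)
adaptive_router = RouterTiming(router_delay_ns=15.0, flow_control_ns=7.0)

for name, timing in [("simple", simple_router), ("adaptive", adaptive_router)]:
    print(name, zero_load_latency_ns(timing, hops=8, flits=20))
# simple 180.0
# adaptive 260.0
```

At zero load the slower router always loses; any advantage from adaptivity has to come from reduced contention under load, which this simple formula does not capture.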
A Necessary and Sufficient Condition for Deadlock-Free Routing in Cut-Through and Store-and-Forward Networks
, 1995
Cited by 111 (15 self)
This paper develops the theoretical background for the design of deadlock-free adaptive routing algorithms for virtual cut-through and store-and-forward switching. This theory is valid for networks using either central buffers or edge buffers. Some basic definitions and three theorems are proposed, developing conditions to verify that an adaptive algorithm is deadlock-free, even when there are cyclic dependencies between routing resources. Moreover, we propose a necessary and sufficient condition for deadlock-free routing. Also, a design methodology is proposed. It supplies fully adaptive, minimal and non-minimal routing algorithms, guaranteeing that they are deadlock-free. The theory proposed in this paper extends the necessary and sufficient condition for wormhole switching previously proposed by us. The resulting routing algorithms are more flexible than the ones for wormhole switching. Also, the design methodology is much easier to apply because it automatically supplies deadlock-fr...
The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus
, 1996
Cited by 111 (4 self)
This paper describes the interconnection network used in the Cray T3E multiprocessor. The network is a bidirectional 3D torus with fully adaptive routing, optimized virtual channel assignments, integrated barrier synchronization support and considerable fault tolerance. The routers are built with LSI’s 500K ASIC technology with custom transmitters/receivers driving low-voltage differential signals at 375 MHz, for a link data payload capacity of approximately 500 MB/s.
Software Overhead in Messaging Layers: Where Does the Time Go?
- In Proceedings of the Sixth Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI)
, 1994
Cited by 68 (10 self)
Despite improvements in network interfaces and software messaging layers, software communication overhead still dominates the hardware routing cost in most systems. In this study, we identify the sources of this overhead by analyzing software costs of typical communication protocols built atop the active messages layer on the CM-5. We show that up to 50-70% of the software messaging costs are a direct consequence of the gap between specific network features such as arbitrary delivery order, finite buffering, and limited fault-handling, and the user communication requirements of in-order delivery, end-to-end flow control, and reliable transmission. However, virtually all of these costs can be eliminated if routing networks provide higher-level services such as in-order delivery, end-to-end flow control, and packet-level fault-tolerance. We conclude that significant cost reductions require changing the constraints on messaging layers: we propose designing networks and network interfaces...
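One of the costs named here, restoring in-order delivery on top of a network that delivers packets in arbitrary order, can be sketched as a receiver-side reorder buffer. This is a hypothetical helper, not code from the CM-5 active messages layer; it assumes each packet carries a per-connection sequence number starting at 0.

```python
class ReorderBuffer:
    """Hypothetical receiver-side helper: restores sender order for packets that
    the network may deliver in arbitrary order, and drops duplicates."""

    def __init__(self):
        self.next_seq = 0   # sequence number the application expects next
        self.pending = {}   # out-of-order packets held back: seq -> payload

    def receive(self, seq, payload):
        """Accept one packet; return the payloads that can now be delivered in order."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate of something already seen
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

rb = ReorderBuffer()
print(rb.receive(1, "b"))   # []
print(rb.receive(0, "a"))   # ['a', 'b']
print(rb.receive(0, "a"))   # []
```

The `pending` dictionary here is unbounded; with the finite buffering the abstract mentions, a real messaging layer would also need end-to-end flow control to keep senders from overrunning it, which is part of the software cost being measured.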
Compressionless Routing: A Framework for Adaptive and Fault-tolerant Routing
, 1997
Cited by 57 (5 self)
Compressionless Routing (CR) is a new adaptive routing framework that unifies efficient deadlock-free adaptive routing and fault tolerance. CR exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. Fault-tolerant Compressionless Routing (FCR) extends CR to support end-to-end fault-tolerant delivery. Detailed routing algorithms, implementation complexity, and performance simulation results for CR and FCR are presented. These results show that the hardware required for CR and FCR networks is modest. Further, CR and FCR networks can achieve superior performance to alternatives such as dimension-order routing. Compressionless Routing has several key advantages: deadlock-free adaptive routing in toroidal networks with no virtual channels, simple router designs, order-preserving message transmission, applicability to a wide variety of network topologies, and elimination of the need for buffer allocation messages. Fault-tolerant Compressionless Routing has several additional advantages: data integrity in the presence of transient faults (nonstop fault tolerance), tolerance of permanent faults, and elimination of the need for software buffering and retry for reliability. The advantages of CR and FCR not only simplify hardware support for adaptive routing and fault tolerance, they also can simplify software communication layers.
A Comparison of Adaptive Wormhole Routing Algorithms
, 1993
Cited by 57 (2 self)
Improvement of message latency and network utilization in torus interconnection networks by increasing adaptivity in wormhole routing algorithms is studied. A recently proposed partially adaptive algorithm and four new fully adaptive routing algorithms are compared with the well-known e-cube algorithm for uniform, hotspot, and local traffic patterns. Our simulations indicate that the partially adaptive north-last algorithm, which causes unbalanced traffic in the network, performs worse than the nonadaptive e-cube routing algorithm for all three traffic patterns. Another result of our study is that performance does not necessarily improve with full adaptivity. In particular, a commonly discussed fully adaptive routing algorithm, which uses 2^n virtual channels per physical channel of a k-ary n-cube, performs worse than e-cube for uniform and hotspot traffic patterns. The other three fully adaptive algorithms, which give priority to messages based on distances traveled, perform ...
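The difference between the deterministic e-cube algorithm and the partially adaptive north-last algorithm can be shown by the set of output directions each allows at a single hop. This is a hedged sketch for a 2D mesh only (the paper studies tori); it assumes north is +y and uses minimal routing, and the function names are hypothetical.

```python
# Hypothetical sketch on a 2D mesh: the output directions each algorithm allows
# at one hop. Directions are unit vectors (dx, dy); north is +y.

def _sign(v):
    return (v > 0) - (v < 0)

def ecube_dirs(cur, dst):
    """e-cube (XY order): finish the x offset before moving in y; never a choice."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx:
        return [(_sign(dx), 0)]
    if dy:
        return [(0, _sign(dy))]
    return []

def north_last_dirs(cur, dst):
    """North-last (minimal): turns out of north are forbidden, so north must come last."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dy > 0:                         # destination lies north: no choice
        return [(_sign(dx), 0)] if dx else [(0, 1)]
    dirs = []                          # otherwise choose adaptively among minimal moves
    if dx:
        dirs.append((_sign(dx), 0))
    if dy < 0:
        dirs.append((0, -1))
    return dirs

print(ecube_dirs((0, 0), (2, -3)), north_last_dirs((0, 0), (2, -3)))
# [(1, 0)] [(1, 0), (0, -1)]
print(north_last_dirs((0, 0), (2, 3)))
# [(1, 0)]
```

Only messages whose destination lies to the south get a choice under north-last; northbound messages are routed as deterministically as under e-cube, so the adaptivity is unevenly distributed across the network.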
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme
, 1994
Cited by 56 (22 self)
This paper proposes a novel concept of multidestination worm mechanism which allows a message to be propagated in a wormhole network conforming to the underlying base routing scheme (e-cube, planar, turn, or fully adaptive). Using this model, any source has the potential to deliver a message to multiple destinations in any valid path in the system conforming to the base routing scheme while encountering only a single communication start-up. The flexibility of sending unicast messages exists under this model as a subset operation. Two schemes are developed and evaluated under this model to perform fast multicasting on 2D/3D meshes/tori. Not only do these schemes demonstrate superiority over Umesh (unicast-based multicast) [25] and Hamiltonian-Path-based [24] schemes, they indicate a unique and interesting result: the cost of multicast operations can be reduced or kept near-constant in e-cube systems as the degree of multicast (number of destinations/src) increases. The proposed sc...
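For the simplest base scheme, deterministic e-cube (XY) routing on a 2D mesh, "conforming to the base routing scheme" can be read as: every destination must lie, in order, on the XY path from the source to the last destination. The check below is a hypothetical sketch under that simplified reading; it does not cover the planar, turn-model, or fully adaptive cases the paper also handles.

```python
# Hypothetical sketch for a 2D mesh with deterministic XY (e-cube style) routing:
# a single multidestination worm conforms to the base routing if every
# destination lies, in order, on the XY path from the source to the last one.

def xy_path_nodes(src, dst):
    """Nodes visited by XY routing, including both endpoints."""
    x, y = src
    nodes = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        nodes.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        nodes.append((x, y))
    return nodes

def conforms_to_xy(src, dests):
    """True if one worm following XY routing can drop a copy at each destination in turn."""
    if not dests:
        return False
    position = {node: i for i, node in enumerate(xy_path_nodes(src, dests[-1]))}
    idx = [position.get(d) for d in dests]
    return None not in idx and all(a < b for a, b in zip(idx, idx[1:]))

print(conforms_to_xy((0, 0), [(2, 0), (3, 0), (3, 2)]))  # True
print(conforms_to_xy((0, 0), [(0, 2), (3, 2)]))          # False
```

In the second case (0, 2) is off the XY path to (3, 2), so reaching it would require a Y-to-X turn that XY routing forbids; such a destination set needs more than one worm.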
Energy-Aware Mapping for Tile-based NoC Architectures under Performance Constraints
, 2003
Cited by 52 (3 self)
In this paper, we present an algorithm which automatically maps the IPs/cores onto a generic regular Network on Chip (NoC) architecture such that the total communication energy is minimized. At the same time, the performance of the mapped system is guaranteed to satisfy the specified constraints through bandwidth reservation. As the main contribution, we first formulate the problem of energy-aware mapping, in a topological sense, and then propose an efficient branch-and-bound algorithm to solve it. Experimental results show that the proposed algorithm is very fast and robust, and significant energy savings can be achieved. For instance, for a complex video/audio SoC design, on average, 60.4% energy savings have been observed compared to an ad-hoc implementation.
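The objective can be illustrated on a toy instance. This is a hedged sketch, not the paper's algorithm: it assumes a simple bit-energy model in which a bit pays `e_router` per router and `e_link` per link it crosses (d + 1 routers and d links at Manhattan distance d under minimal routing), replaces branch-and-bound with exhaustive search over placements, and ignores the bandwidth constraints. Core names and traffic volumes are hypothetical.

```python
from itertools import permutations

# Hypothetical sketch: energy-aware mapping of cores onto mesh tiles.
# Assumed energy model: each bit pays e_router per router and e_link per link it
# crosses; with minimal routing over Manhattan distance d, that is d + 1 routers
# and d links. Exhaustive search stands in for branch-and-bound, and the
# bandwidth (performance) constraints are ignored.

def comm_energy(mapping, traffic, e_router, e_link):
    """Total energy for traffic {(src_core, dst_core): bits} under a core -> tile mapping."""
    total = 0.0
    for (a, b), bits in traffic.items():
        (x1, y1), (x2, y2) = mapping[a], mapping[b]
        d = abs(x1 - x2) + abs(y1 - y2)
        total += bits * ((d + 1) * e_router + d * e_link)
    return total

def best_mapping(cores, tiles, traffic, e_router=1.0, e_link=1.0):
    """Try every placement of cores on distinct tiles and keep the cheapest."""
    best = None
    for placement in permutations(tiles, len(cores)):
        mapping = dict(zip(cores, placement))
        energy = comm_energy(mapping, traffic, e_router, e_link)
        if best is None or energy < best[0]:
            best = (energy, mapping)
    return best

tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]                      # a 2 x 2 mesh
traffic = {("A", "B"): 100, ("C", "D"): 100, ("A", "D"): 1}   # hypothetical volumes
energy, mapping = best_mapping(["A", "B", "C", "D"], tiles, traffic)
print(energy)  # 603.0
```

Exhaustive search grows factorially with the number of tiles, which is why a pruned search such as branch-and-bound is needed for realistic SoC sizes.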

