A Power Model for Routers: Modeling Alpha 21364 and InfiniBand Routers
- In IEEE Micro, 2002
- Cited by 31 (1 self)
As interconnection networks proliferate to many new applications, a low-latency high-throughput fabric is no longer sufficient. Applications are becoming power-constrained. In this paper, we propose an architectural-level power model for interconnection network routers that will allow researchers and designers to easily factor in power when exploring architectural trade-offs. We applied our model to two commercial routers -- the integrated Alpha 21364 router and the IBM 8-port 12X InfiniBand router, and show that the different micro-architectures lead to vastly different power consumption and distribution estimates.
Near-optimal worst-case throughput routing for two-dimensional mesh networks
- In International Symposium on Computer Architecture, 2005
- Cited by 25 (0 self)
Minimizing latency and maximizing throughput are important goals in the design of routing algorithms for interconnection networks. Ideally, we would like a routing algorithm to (a) route packets using the minimal number of hops to reduce latency and preserve communication locality, (b) deliver good worst-case and average-case throughput and (c) enable low-complexity (and hence, low-latency) router implementation. In this paper, we focus on routing algorithms for an important class of interconnection networks: two-dimensional (2D) mesh networks. Existing routing algorithms for mesh networks fail to satisfy one or more of the design goals mentioned above. Variously, the routing algorithms suffer from poor worst-case throughput (ROMM [13], DOR [23]), poor latency due to increased packet hops (VALIANT [31]) or increased latency due to hardware complexity (minimal-adaptive [7, 30]). The major contribution of this paper is the design of an oblivious routing algorithm, O1TURN, with provable near-optimal worst-case throughput, good average-case throughput, low design complexity and minimal number of network hops for 2D-mesh networks, thus satisfying all the stated design goals. O1TURN offers optimal worst-case throughput when the network radix (k in a kxk network) is even. When the network radix is odd, O1TURN is within a 1/k^2 factor of optimal worst-case throughput. O1TURN achieves superior or comparable average-case throughput with global traffic as well as local traffic. For example, O1TURN achieves 18.8%, 0.7% and 13.6% higher average-case throughput than DOR, ROMM and VALIANT routing, respectively, when averaged over one million random traffic patterns on an 8x8 network. Finally, we demonstrate that O1TURN is well suited for a partitioned router implementation that is of similar delay complexity to a simple dimension-ordered router. Our implementation incurs a marginal increase in switch arbitration delay that is completely hidden in pipelined routers as it is not on the clock-critical path.
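The abstract above does not spell out how O1TURN selects routes. The sketch below assumes the commonly cited description of the algorithm, in which each packet picks either the XY or the YX dimension-order route with equal probability; that assumption, and the hypothetical names `dor_path` and `o1turn_path`, are not taken from this listing. It only illustrates why such a route is oblivious (independent of network state) and minimal (no extra hops).

```python
import random


def dor_path(src, dst, order):
    """Dimension-order route on a 2D mesh.

    src and dst are (x, y) tuples; order is "xy" or "yx" and fixes
    which dimension is fully traversed first.
    """
    x, y = src
    path = [(x, y)]
    for dim in order:
        if dim == "x":
            step = 1 if dst[0] > x else -1
            while x != dst[0]:
                x += step
                path.append((x, y))
        else:
            step = 1 if dst[1] > y else -1
            while y != dst[1]:
                y += step
                path.append((x, y))
    return path


def o1turn_path(src, dst, rng=random):
    """Assumed O1TURN behaviour: choose XY or YX with equal probability."""
    order = rng.choice(["xy", "yx"])
    return dor_path(src, dst, order)


if __name__ == "__main__":
    print(o1turn_path((0, 0), (2, 1)))
```

Either choice makes at most one turn and uses exactly the Manhattan distance in hops, which matches the "minimal number of network hops" goal stated in the abstract.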
Thermal Modeling, Characterization and Management of On-chip Networks
- In Proceedings of the International Symposium on Microarchitecture (MICRO), 2004
- Cited by 21 (2 self)
Due to the wire delay constraints in deep submicron technology and increasing demand for on-chip bandwidth, networks are becoming the pervasive interconnect fabric to connect processing elements on chip. With ever-increasing power density and cooling costs, the thermal impact of on-chip networks needs to be urgently addressed.
Flattened butterfly: A cost-efficient topology for high-radix networks
- In Proc. of the Intl. Symp. on Computer Architecture, 2007
- Cited by 16 (1 self)
Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the degree or radix of interconnection networks and their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable-performance Clos network. The advantage over the Clos is achieved by eliminating redundant hops when they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly. In this case, global adaptive routing is used to switch the flattened butterfly from minimal to non-minimal routing, using redundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly. We show that load-balancing adversarial traffic requires non-minimal globally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms. We also compare the cost of the flattened butterfly to folded-Clos, hypercube, and butterfly networks with identical capacity and show that the flattened butterfly is more cost-efficient than folded-Clos and hypercube topologies.
Power-Aware Communication Optimization for Networks-On-Chips With Voltage Scalable Links
- In Proc. CODES+ISSS'04, 2004
- Cited by 13 (1 self)
Networks-on-Chip (NoC) is emerging as a practical development platform for future systems-on-chip products. We propose an energy-efficient static algorithm which optimizes the energy consumption of task communications in NoCs with voltage scalable links. In order to find optimal link speeds, the proposed algorithm (based on a genetic formulation) globally explores the design space of NoC-based systems, including task assignment, tile mapping, routing path allocation, task scheduling and link speed assignment. Experimental results show that the proposed design technique can reduce energy consumption by 28% on average compared with existing techniques.
Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-enabled Clusters
- Cited by 13 (5 self)
The power-aware scheduling problem has recently become an issue in cluster systems, not only because of the operational cost of electricity but also because of system reliability. As recent commodity processors support multiple operating points under various supply voltage levels, Dynamic Voltage Scaling (DVS) scheduling algorithms can reduce power consumption by selecting appropriate voltage levels. In this paper, we provide power-aware scheduling algorithms for bag-of-tasks applications with deadline constraints on DVS-enabled cluster systems in order to minimize power consumption as well as to meet the deadlines specified by application users. A bag-of-tasks application should finish all its sub-tasks before the deadline, so the DVS scheduling scheme should consider the deadline as well. We provide DVS scheduling algorithms for both time-shared and space-shared resource sharing policies. The simulation results show that the proposed algorithms substantially reduce power consumption compared to static voltage schemes.
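The interaction between operating points and deadlines described above can be sketched for a single task. This is not the paper's scheduling algorithm: the function name `pick_operating_point`, the example operating points, and the assumption that dynamic power scales as frequency times voltage squared are all illustrative choices made here.

```python
def pick_operating_point(cycles, deadline_s, points):
    """Return the feasible (frequency_hz, voltage_v) pair with the lowest
    relative power, or None if no point meets the deadline.

    Assumptions (not from the paper): the task needs a fixed number of
    CPU cycles, and relative power is modelled as f * V^2.
    """
    feasible = [(f, v) for f, v in points if cycles / f <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p[0] * p[1] ** 2)


if __name__ == "__main__":
    # Hypothetical operating points for a DVS-capable processor.
    points = [(1e9, 1.0), (2e9, 1.2), (3e9, 1.4)]
    print(pick_operating_point(3e9, 2.0, points))
```

In this sketch a tighter deadline forces a faster, higher-voltage point, while a deadline that no point can meet yields `None` so the caller can reject the task.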
A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks
- Proc. 11th IEEE Int. Symp. High-Perf. Computer Arch. (HPCA-11), 2005
- Cited by 13 (4 self)
In this paper, we propose a new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase. Instead of eliminating congestion, our strategy avoids performance degradation beyond the saturation point by eliminating the HOL blocking produced by congestion trees. This is achieved in a scalable manner by using separate queues for congested flows. These are dynamically allocated only when congestion arises, and deallocated when congestion subsides. Performance evaluation results show that our strategy responds to congestion immediately and completely eliminates the performance degradation produced by HOL blocking while using only a small number of additional queues.
Design-Space Exploration of Power-Aware On/Off Interconnection Networks
- Cited by 11 (1 self)
With power a major limiting factor in the design of scalable interconnected systems, power-aware networks will become inherent components of single-chip and multi-chip systems. As communication links consume significant power regardless of utilization, we propose and investigate power-aware networks whose links are turned on and off in response to bursts and dips in traffic. We explore the design space of such on/off networks, outlining a 5-step design methodology along with solutions at each step that can form the building blocks of numerous designs. Two specific designs targeting links with substantially different on/off times are then presented and evaluated. Our simulations show that up to 54.4% power savings can be achieved along with at most 7.5% increase in latency.
A Feasibility Study for Power Management in LAN Switches
- In Proceedings of the 12th IEEE International Conference on Network Protocols, 2004
- Cited by 9 (1 self)
We examine the feasibility of introducing power management schemes in network devices in the LAN. Specifically, we investigate the possibility of putting various components on LAN switches to sleep during periods of low traffic activity. Traffic collected in our LAN indicates that there are significant periods of inactivity on specific switch interfaces. Using an abstract sleep model devised for LAN switches, we examine the potential energy savings possible for different times of day and different interfaces (e.g., interfaces connecting hosts to switches, interfaces connecting switches, or interfaces connecting switches and routers). Algorithms developed for sleeping, based on periodic protocol behavior as well as traffic estimation, are shown to be capable of conserving significant amounts of energy. Our results show that sleeping is indeed feasible in the LAN and, in some cases, has very little impact on other protocols. However, we note that in order to maximize energy savings while minimizing sleep-related losses, we need hardware that supports sleeping.
Compiler-Directed Dynamic Voltage and Frequency Scaling for CPU Power and Energy Reduction
- 2003
- Cited by 8 (2 self)
Dissertation by Chung-Hsing Hsu. Dissertation Director: Ulrich Kremer. The high power consumption of a processor is becoming a critical problem for both battery-powered devices and high-performance computers. It reduces circuit reliability, complicates the cooling technology, shortens the battery lifetime, and increases the production and operation costs of a CPU. One effective technique, called dynamic voltage scaling (DVS), achieves CPU power reduction through lowering the CPU supply voltage and clock frequency at runtime. It is effective because the CPU power is proportional to the clock frequency and to the square of the supply voltage. However, the CPU power savings come at the cost of degraded performance due to the slower clock frequency. Furthermore, the longer the CPU runs, the more power other computer components (e.g., disk and screen) will consume; not to mention that a user may not be willing to sacrifice any performance. Therefore, DVS should only be applied when it will not noticeably affect performance.
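The proportionality stated in this abstract (CPU power proportional to clock frequency and to the square of supply voltage) can be illustrated with a small sketch. The function names and scale factors are hypothetical, and the energy function additionally assumes a fully CPU-bound task whose run time grows as the inverse of the frequency scale. It models only the stated relationship, not the dissertation's compiler technique.

```python
def relative_cpu_power(freq_scale, volt_scale):
    """CPU power relative to baseline, assuming P is proportional to f * V^2."""
    return freq_scale * volt_scale ** 2


def relative_cpu_energy(freq_scale, volt_scale):
    """CPU energy relative to baseline for a fixed amount of work.

    Assumes run time scales as 1 / freq_scale (CPU-bound task), so
    energy = power * time ends up proportional to V^2.
    """
    return relative_cpu_power(freq_scale, volt_scale) / freq_scale


if __name__ == "__main__":
    # Hypothetical operating point: half the frequency, half the voltage.
    print(relative_cpu_power(0.5, 0.5))
    print(relative_cpu_energy(0.5, 0.5))
```

Under these assumptions the task takes twice as long at half the frequency, which is the performance cost the abstract warns about, and the other components it mentions would draw power for that longer interval.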

