Mercury and Freon: Temperature Emulation and Management for Server Systems
Cited by 47 (6 self)
Power densities have been increasing rapidly at all levels of server systems. To counter the high temperatures resulting from these densities, systems researchers have recently started work on software-based thermal management. Unfortunately, research in this new area has been hindered by the limitations imposed by simulators and real measurements. In this paper, we introduce Mercury, a software suite that avoids these limitations by accurately emulating temperatures based on simple layout, hardware, and component-utilization data. Most importantly, Mercury runs the entire software stack natively, enables repeatable experiments, and allows the study of thermal emergencies without harming hardware reliability. We validate Mercury using real measurements and a widely used commercial simulator. We use Mercury to develop Freon, a system that manages thermal emergencies in a server cluster without unnecessary performance degradation. Mercury will soon become available from
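The abstract does not give Mercury's equations, so the following is only a minimal sketch of the general idea of emulating temperature from utilization data: a single lumped thermal node (one thermal resistance and one capacitance to the surrounding air) driven by an assumed linear power-versus-utilization model. The names `ThermalNode` and `emulate` and all parameter values are hypothetical; Mercury itself models layout and multiple components, which this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class ThermalNode:
    """Hypothetical lumped thermal node; every parameter is an illustrative assumption."""
    p_idle: float  # power at 0% utilization (W)
    p_max: float   # power at 100% utilization (W)
    r_th: float    # thermal resistance from the component to the air (K/W)
    c_th: float    # thermal capacitance (J/K)
    temp: float    # current temperature (deg C)

    def power(self, util: float) -> float:
        # Assumed linear power model in utilization, clamped to [0, 1].
        util = min(max(util, 0.0), 1.0)
        return self.p_idle + util * (self.p_max - self.p_idle)

    def step(self, util: float, air_temp: float, dt: float) -> float:
        # One explicit Euler step of C * dT/dt = P(util) - (T - T_air) / R.
        heat_in = self.power(util)
        heat_out = (self.temp - air_temp) / self.r_th
        self.temp += dt * (heat_in - heat_out) / self.c_th
        return self.temp


def emulate(node: ThermalNode, utils, air_temp: float, dt: float = 1.0):
    """Replay a utilization trace (one sample every dt seconds) and return the temperatures."""
    return [node.step(u, air_temp, dt) for u in utils]
```

For example, `emulate(ThermalNode(30, 90, 0.5, 100, 25), [1.0] * 2000, air_temp=25)` climbs toward the steady state 25 + 90 * 0.5 = 70 deg C. The explicit Euler step stays stable and non-oscillating only while dt is below R * C (50 s here).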
C-Oracle: Predictive Thermal Management for Data Centers
In Symposium on High-Performance Computer Architecture, 2008
Cited by 17 (2 self)
Thermal management has become a critical requirement for today’s power-dense server clusters, due to the negative impact of high temperatures on the reliability of computer hardware. Recognizing this fact, researchers have started to design software-based thermal management policies that leverage high-level information to control system-wide temperatures effectively. Unfortunately, designing these policies is currently a challenge, since it is difficult to predict the exact temperature and performance that would result from trying to react to a thermal emergency. Reactions that are excessively severe may cause unnecessary performance degradation and/or generate emergencies in other parts of the system, whereas reactions that are excessively mild may take relatively long to become effective (if at all), compromising the reliability of the system. To address this challenge, in this paper we propose C-Oracle, a software infrastructure for Internet services that dynamically predicts the temperature and performance impact of different thermal management reactions into the future, allowing the thermal management policy to select the best reaction at each point in time. C-Oracle makes predictions based on simple models of temperature, component utilization, and policy behavior that can be solved efficiently. We experimentally evaluate C-Oracle for thermal management policies based on load redistribution and dynamic voltage/frequency scaling in both single-tier and multi-tier services. Our results show that, regardless of management policy or service organization, C-Oracle enables non-trivial decisions that effectively manage thermal emergencies, while avoiding unnecessary performance degradation.
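A minimal sketch of the predict-then-select loop the abstract describes, assuming each candidate reaction can be summarized by a predicted power draw and a retained-throughput fraction, and using a closed-form first-order RC temperature model as the predictor. `Reaction`, `predict_temp`, `choose_reaction`, and all numbers are hypothetical; C-Oracle's actual models of temperature, utilization, and policy behavior are richer.

```python
import math
from typing import NamedTuple


class Reaction(NamedTuple):
    """A hypothetical candidate reaction to a thermal emergency."""
    name: str
    power: float       # predicted power draw after the reaction (W)
    throughput: float  # predicted fraction of nominal throughput retained (0..1)


def predict_temp(t0, power, air_temp, r_th, c_th, horizon):
    """Closed-form first-order RC response: temperature after `horizon` seconds
    at constant power, starting from t0."""
    t_ss = air_temp + power * r_th  # steady-state temperature
    return t_ss + (t0 - t_ss) * math.exp(-horizon / (r_th * c_th))


def choose_reaction(reactions, t0, air_temp, r_th, c_th, horizon, limit):
    """Among reactions predicted to be at or below `limit` after `horizon`
    seconds, pick the one that retains the most throughput; if none is,
    fall back to the reaction with the lowest predicted temperature."""
    def predicted(r):
        return predict_temp(t0, r.power, air_temp, r_th, c_th, horizon)

    feasible = [r for r in reactions if predicted(r) <= limit]
    if feasible:
        return max(feasible, key=lambda r: r.throughput)
    return min(reactions, key=predicted)
```

With reactions `none` (90 W, 1.0), `dvfs-mild` (70 W, 0.85), and `dvfs-severe` (45 W, 0.6), a 72 deg C start, 30 deg C inlet air, R = 0.5 K/W, C = 100 J/K, a 120 s horizon, and a 68 deg C limit, doing nothing is predicted to reach about 74.7 deg C, so the sketch picks the milder DVFS step: both DVFS options stay under the limit and the milder one keeps more throughput.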
Power-aware dynamic placement of hpc applications
In ICS, ACM
Cited by 11 (3 self)
High Performance Computing applications and platforms have typically been designed without regard to power consumption. With increased awareness of energy cost, power management is now an issue even for compute-intensive server clusters. In this work, we investigate the use of power management techniques for high performance applications on modern power-efficient servers with virtualization support. We consider power management techniques such as dynamic consolidation and usage of the dynamic power range enabled by low power states on servers. We identify application performance isolation and virtualization overhead with multiple virtual machines as the key bottlenecks for server consolidation. We perform a comprehensive experimental study to identify the scenarios where applications are isolated from each other. We also establish that the power consumed by HPC applications may be application-dependent, non-linear, and have a large dynamic range. We show that for HPC applications, working set size is a key parameter to consider when placing applications on virtualized servers. We use the insights obtained from our experimental study to present a framework and methodology for power-aware placement of HPC applications.
Multi-mode Energy Management for Multi-tier Server Clusters
Cited by 11 (0 self)
This paper presents an energy management policy for reconfigurable clusters running a multi-tier application, exploiting DVS together with multiple sleep states. We develop a theoretical analysis of the corresponding power optimization problem and design an algorithm around the solution. Moreover, we rigorously investigate selection of the optimal number of spare servers for each power state, a problem that has only been approached in an ad-hoc manner in current policies. To validate our results and policies, we implement them on an actual multi-tier server cluster where nodes support all power management techniques considered. Experimental results using realistic dynamic workloads based on the TPC-W benchmark show that exploiting multiple sleep states results in significant additional cluster-wide energy savings of up to 23% with little or no performance degradation.
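The abstract does not spell out the paper's analysis, so here is only a generic break-even sketch of the trade-off behind multiple sleep states: a deeper state draws less power but costs more energy to enter and leave and takes longer to wake. Given an expected idle period and a wake-up latency budget, it picks the state with the lowest total energy. `PowerState`, `best_state`, the state names, wattages, and latencies are hypothetical, and the sketch does not reproduce the paper's per-state spare-server optimization.

```python
from typing import NamedTuple


class PowerState(NamedTuple):
    """A hypothetical server power state."""
    name: str
    power: float              # power drawn while in this state (W)
    wake_latency: float       # time needed to return to active service (s)
    transition_energy: float  # extra energy to enter and leave the state (J)


def best_state(states, idle_time, max_latency):
    """Return the state with the lowest energy over an expected idle period of
    `idle_time` seconds, considering only states that can wake within
    `max_latency` seconds. Assumes at least one state (e.g. active idle)
    meets the latency budget; otherwise min() raises ValueError."""
    candidates = [s for s in states if s.wake_latency <= max_latency]
    return min(candidates, key=lambda s: s.transition_energy + s.power * idle_time)


# Illustrative numbers only.
STATES = [
    PowerState("active-idle", power=150.0, wake_latency=0.0, transition_energy=0.0),
    PowerState("suspend", power=10.0, wake_latency=5.0, transition_energy=500.0),
    PowerState("off", power=2.0, wake_latency=60.0, transition_energy=6000.0),
]
```

Short idle periods favor staying in active idle, longer ones favor suspend, and very long ones favor powering off unless the latency budget rules it out; a cluster-level policy could then decide how many spare servers to park in each state.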
Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining Response Time
Cited by 10 (0 self)
Server power and cooling power amount to a significant fraction of modern data centers’ recurring costs. While data centers provision enough servers to guarantee response times under the maximum load, they operate under much lower load most of the time (e.g., 30-70% of the maximum load). Previous server-power proposals exploit this under-utilization to reduce server idle power by keeping only as many servers active as necessary and putting the rest into low-power standby modes. However, these proposals incur higher cooling power due to hot spots created by concentrating the data center load on fewer active servers, or degrade response times due to standby-to-active transition delays, or both. Other proposals optimize cooling power but incur considerable idle power. To address the first issue, power, we propose PowerTrade, which trades off idle power against cooling power, thereby reducing total power. To address the second issue, response time, we propose SurgeGuard, which overprovisions the number of active servers beyond what the current load requires so as to absorb future load increases. SurgeGuard is a two-tier scheme that uses well-known overprovisioning at coarse time granularities (e.g., one hour) to absorb common, smooth load increases, and a novel fine-grain replenishment of the overprovisioned reserves at fine time granularities (e.g., five minutes) to handle uncommon, abrupt load surges. Using real-world traces, we show that combining PowerTrade and SurgeGuard reduces total power by 30% compared to previous low-power schemes while maintaining response times within 1.7%.
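A rough sketch of SurgeGuard's two-tier structure as the abstract describes it: a coarse step (e.g., hourly) that sizes the active set from a predicted peak plus a margin, and a fine step (e.g., every five minutes) that only adds servers to restore a reserve after a surge. The function names, the fractional `margin`, and the `reserve_servers` count are assumptions for illustration, not the paper's formulas, and PowerTrade's trade-off between idle and cooling power is not modeled.

```python
import math


def coarse_active(predicted_peak, per_server_capacity, margin):
    """Coarse step (e.g. hourly): enough servers for the predicted peak load
    plus a fractional overprovisioning margin."""
    return math.ceil(predicted_peak * (1 + margin) / per_server_capacity)


def replenish(active, current_load, per_server_capacity, reserve_servers):
    """Fine step (e.g. every five minutes): if a surge has eaten into the
    reserve, wake servers until `reserve_servers` spare servers exist beyond
    those needed for the current load. This step never turns servers off;
    scaling down is left to the next coarse step."""
    needed = math.ceil(current_load / per_server_capacity) + reserve_servers
    return max(active, needed)
```

For example, with a predicted hourly peak of 900 requests/s, 100 requests/s per server, and a 20% margin, the coarse step keeps 11 servers active; if a surge pushes the load to 1050 requests/s, the fine step wakes two more so that a two-server reserve is restored.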
Server Workload Analysis for Power Minimization using Consolidation
Cited by 9 (1 self)
Server consolidation has emerged as a promising technique to reduce the energy costs of a data center. In this work, we present the first detailed analysis of an enterprise server workload from the perspective of finding characteristics for consolidation. We observe significant potential for power savings if consolidation is performed using off-peak values for application demand. However, these savings come with associated risks, particularly when the correlation between applications is not considered. We also investigate the stability of utilization trends for low-risk consolidation. Using the insights from the workload analysis, we design two new consolidation methods that achieve significant power savings while containing the performance risk of consolidation. We present an implementation of the methodologies in a consolidation planning tool and provide a comprehensive evaluation of the proposed methodologies.
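The abstract stresses sizing by off-peak demand and accounting for correlation between applications. One minimal way to illustrate both, assuming time-aligned utilization traces per application, is first-fit decreasing bin packing in which a server accepts an application only if a chosen percentile of the summed trace stays within capacity; summing traces point-wise makes correlated peaks add up while anti-correlated ones cancel. `percentile`, `consolidate`, and the nearest-rank percentile are hypothetical choices and are not the paper's two methods.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def consolidate(traces, capacity, p=90):
    """First-fit decreasing placement of applications onto servers.

    `traces` maps application name -> time-aligned utilization samples.
    A server accepts an application only if the p-th percentile of the
    point-wise sum of its traces stays within `capacity`, so correlated
    peaks add up while anti-correlated ones can share a server."""
    order = sorted(traces, key=lambda name: percentile(traces[name], p), reverse=True)
    servers = []  # each entry: (list of app names, combined trace)
    for name in order:
        trace = traces[name]
        for apps, combined in servers:
            merged = [a + b for a, b in zip(combined, trace)]
            if percentile(merged, p) <= capacity:
                apps.append(name)
                combined[:] = merged  # update the server's combined trace in place
                break
        else:  # no existing server fits: open a new one
            servers.append(([name], list(trace)))
    return [apps for apps, _ in servers]
```

Two applications that peak at alternate times can share a server, while two that peak together cannot. Lowering p below 100 sizes servers for off-peak demand and saves more servers, at the cost of letting the combined load exceed capacity in the top (100 - p)% of samples, a simple stand-in for the consolidation risk the abstract mentions.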
Risk-Aware Limited Lookahead Control for Dynamic Resource Provisioning in Enterprise Computing Systems
Cluster Computing, 2007
Cited by 6 (1 self)
Utility or on-demand computing, a provisioning model where a service provider makes computing infrastructure available to customers as needed, is becoming increasingly common in enterprise computing systems. Realizing this model requires making dynamic, and sometimes risky, resource provisioning and allocation decisions in an uncertain operating environment to maximize revenue while reducing operating cost. This paper develops an optimization framework wherein the resource provisioning problem is posed as one of sequential decision making under uncertainty and solved using a limited lookahead control scheme. The proposed approach accounts for the switching costs incurred during resource provisioning and explicitly encodes risk in the optimization problem. Simulations using workload traces from the Soccer World Cup 1998 web site show that a computing system managed by our controller generates up to 20% more profit than a system without dynamic control while incurring low control overhead.
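A minimal sketch of limited lookahead (receding-horizon) control for provisioning: enumerate every sequence of server counts over a short forecast horizon, score each by revenue minus operating and switching costs, apply only the first decision, and repeat at the next step. The linear revenue and cost model, the parameter names, and the brute-force search are assumptions; the paper's explicit risk encoding and its handling of forecast uncertainty are not modeled here.

```python
import itertools


def step_profit(n, load, capacity, revenue_per_req, cost_per_server):
    """Revenue for the requests n servers can serve, minus their operating cost."""
    served = min(load, n * capacity)
    return served * revenue_per_req - n * cost_per_server


def lookahead_decision(n_prev, forecast, choices, capacity,
                       revenue_per_req, cost_per_server, switch_cost):
    """Enumerate every sequence of server counts over the forecast horizon,
    score it (profit minus switching cost), and return the first action of
    the best sequence (receding-horizon control)."""
    best_value, best_first = float("-inf"), n_prev
    for plan in itertools.product(choices, repeat=len(forecast)):
        value, prev = 0.0, n_prev
        for n, load in zip(plan, forecast):
            value += step_profit(n, load, capacity, revenue_per_req, cost_per_server)
            value -= switch_cost * abs(n - prev)  # cost of adding or removing servers
            prev = n
        if value > best_value:
            best_value, best_first = value, plan[0]
    return best_first
```

Starting from 2 servers with a forecast of 250 requests per step for two steps, 100 requests per server, revenue 1 per request, and cost 20 per server per step, a switching cost of 5 makes adding a third server worthwhile, whereas a switching cost of 100 keeps the controller at 2. The search grows exponentially with the horizon, which is one reason such controllers keep the lookahead short.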
Cost-aware scheduling for heterogeneous enterprise machines (CASH'EM)
In First International Workshop on Green Computing (GreenCom), 2007
Cited by 4 (0 self)
Keywords: power scheduling, data centers, autonomic computing
Data centers contain heterogeneous sets of machines. Some machines are faster, and some (often the same ones) consume more energy and cost more to operate. The data center coordinator must decide how to allocate these machines to multiple applications of potentially many customers, each of which has different requirements. Given a stream of customer requests for machines, how does the data center provider decide which machines to give to whom and when? We propose new algorithms for a cost-aware provider to maximize its profit as it makes admission and scheduling decisions for the customer requests. We show that it matters which machines are assigned to each customer, especially when the data center is undersaturated. (Most data centers are.) Our new algorithms do best when they try to anticipate the “riskiness” of their decisions, that is, the likelihood that even higher-value requests will arrive later. We also show that turning unused machines off, rather than leaving them idle, even using simple heuristics like “turn off a machine that has been idle for ten minutes,” can save a lot of money. Finally, we show that having heterogeneity in the data center is, in fact, beneficial. We demonstrate that the same set of customers can be satisfied at a lower cost and a higher profit in a heterogeneous data center than in a data center composed solely of the newest, fastest machines.
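The idle-timeout heuristic quoted in the abstract is simple enough to sketch directly. This assumes the coordinator keeps a hypothetical `last_busy` map from machine name to the last time (in seconds) it was running work, updated continuously while a machine is busy; `machines_to_power_off` is likewise a hypothetical name, and the ten-minute threshold is the one the abstract mentions.

```python
IDLE_TIMEOUT_S = 10 * 60  # "idle for ten minutes", as quoted in the abstract


def machines_to_power_off(last_busy, now, powered_on, timeout=IDLE_TIMEOUT_S):
    """Return, in sorted order, the powered-on machines whose last busy time
    (in seconds) is at least `timeout` seconds before `now`."""
    return sorted(m for m in powered_on if now - last_busy[m] >= timeout)
```

Run periodically, this turns off machines that have sat idle past the timeout while leaving recently used machines on, so short gaps between requests do not trigger costly power cycles.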
Enhancing Energy Efficiency in Multi-tier Web Server Clusters via Prioritization
Cited by 3 (1 self)
This paper investigates the design issues and energy-saving benefits of service prioritization in multi-tier web server clusters. In many services, classes of clients can naturally be assigned different priorities based on their performance requirements. We show that if the whole multi-tier system is effectively prioritized, additional power and energy savings can be realized, while retaining an existing cluster-wide energy management technique, by exploiting the different performance requirements of separate service classes. We find a simple prioritization scheme to be highly effective without requiring intrusive modifications to the system. To quantify its benefits, we perform an extensive experimental evaluation on a real testbed. The scheme significantly improves both total system power savings and energy efficiency, while also improving throughput and enabling the system to meet per-class performance requirements.
Thermal Faults Modeling using a RC model with an Application to Web Farms
In Proceedings of RTS, 2007
Cited by 3 (0 self)
Today’s CPUs consume a significant amount of power and generate a large amount of heat, requiring an active cooling system to support reliable operation. When the cooling system fails, these CPUs can reduce their clock speed to prevent damage due to overheating. Unfortunately, when these CPUs are used in a real-time system, clock control based on frequency throttling can cause missed deadlines. In this paper, we first develop a system-wide thermal model that can account for various thermal fault types, such as failure of a CPU fan, faults in the case fan, and air-conditioning malfunctions. We then validate the thermal model through experiments and measurements on AMD Linux machines. Our soft real-time, power-aware load-distribution algorithm for data centers incorporates the thermal model to minimize the number of missed deadlines that thermal faults can cause. We implemented the algorithm in a web server farm simulator to test the efficacy of thermal-aware load balancing. Our results show that the new algorithm helps keep CPU temperatures within the desired thermal envelope, even in the presence of thermal faults. When thermal faults occur, our algorithm improves QoS at the expense of higher energy consumption.
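A minimal sketch of how a steady-state RC model can drive thermal-aware load distribution. It assumes a fan failure shows up as a higher effective thermal resistance and an air-conditioning failure as a higher inlet temperature, computes from T_ss = T_amb + R * P(u) the highest utilization each server can sustain under a temperature limit, and splits load in proportion to that headroom. `Server`, `max_safe_util`, `distribute`, and the proportional split are hypothetical; the paper's model and its soft real-time algorithm are not reproduced here.

```python
from typing import NamedTuple


class Server(NamedTuple):
    """Hypothetical server description; all fields are illustrative assumptions."""
    name: str
    r_th: float     # effective thermal resistance (K/W); assumed to rise when a fan fails
    ambient: float  # inlet air temperature (deg C); assumed to rise if air conditioning fails
    p_idle: float   # power at 0% utilization (W)
    p_max: float    # power at 100% utilization (W)


def max_safe_util(s, t_limit):
    """Highest utilization whose steady-state temperature stays within t_limit,
    using T_ss = ambient + R * (p_idle + u * (p_max - p_idle))."""
    allowed_power = (t_limit - s.ambient) / s.r_th
    u = (allowed_power - s.p_idle) / (s.p_max - s.p_idle)
    return min(max(u, 0.0), 1.0)


def distribute(servers, demand, t_limit):
    """Split `demand` (in units of one fully utilized server) across servers in
    proportion to each server's thermally safe utilization. Returns the
    per-server shares and any demand above the total safe capacity."""
    safe = {s.name: max_safe_util(s, t_limit) for s in servers}
    total = sum(safe.values())
    if total == 0:
        return {name: 0.0 for name in safe}, demand
    served = min(demand, total)
    return {name: served * u / total for name, u in safe.items()}, demand - served
```

With a 70 deg C limit, a healthy server (R = 0.5 K/W) can run at full utilization while one with a failed fan (R = 1.0 K/W) is limited to 25%, so one server's worth of demand splits 0.8 / 0.2. Because each share is at most the server's safe utilization, no server is pushed past its limit; demand that does not fit is reported as unserved, which in a soft real-time setting would correspond to requests at risk of missing deadlines.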

