Results 1 - 10 of 83
Mercury and Freon: Temperature Emulation and Management for Server Systems
"... Power densities have been increasing rapidly at all levels of server systems. To counter the high temperatures resulting from these densities, systems researchers have recently started work on software-based thermal management. Unfortunately, research in this new area has been hindered by the limita ..."
Cited by 97 (9 self)
Abstract:
Power densities have been increasing rapidly at all levels of server systems. To counter the high temperatures resulting from these densities, systems researchers have recently started work on software-based thermal management. Unfortunately, research in this new area has been hindered by the limitations imposed by simulators and real measurements. In this paper, we introduce Mercury, a software suite that avoids these limitations by accurately emulating temperatures based on simple layout, hardware, and component-utilization data. Most importantly, Mercury runs the entire software stack natively, enables repeatable experiments, and allows the study of thermal emergencies without harming hardware reliability. We validate Mercury using real measurements and a widely used commercial simulator. We use Mercury to develop Freon, a system that manages thermal emergencies in a server cluster without unnecessary performance degradation. Mercury will soon become available.
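To make the emulation idea concrete, here is a minimal sketch of a generic lumped (RC) thermal model that turns component utilization into a temperature estimate. It is not Mercury's actual model; every constant and name below is an illustrative assumption.

    # Hypothetical sketch: estimate component temperature from utilization
    # using a lumped RC thermal model. All constants are illustrative only.
    AMBIENT_C = 22.0        # ambient temperature (Celsius), assumed
    P_IDLE_W = 60.0         # idle power draw (Watts), assumed
    P_MAX_W = 200.0         # power draw at 100% utilization, assumed
    R_THERMAL = 0.25        # thermal resistance (C per Watt), assumed
    C_THERMAL = 400.0       # thermal capacitance (Joules per C), assumed

    def step_temperature(temp_c, utilization, dt_s):
        """Advance the temperature estimate by dt_s seconds."""
        power_w = P_IDLE_W + utilization * (P_MAX_W - P_IDLE_W)
        # Heat in from dissipated power; heat out to ambient through R_THERMAL.
        dtemp_dt = (power_w - (temp_c - AMBIENT_C) / R_THERMAL) / C_THERMAL
        return temp_c + dtemp_dt * dt_s

    if __name__ == "__main__":
        temp = AMBIENT_C
        for _ in range(600):                      # ten minutes at 80% load
            temp = step_temperature(temp, 0.8, 1.0)
        print(f"estimated temperature after 10 min: {temp:.1f} C")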
NapSAC: Design and Implementation of a Power-Proportional Web Cluster
- In GreenNet, 2010
"... Energy consumption is a major and costly problem in data centers. A large fraction of this energy goes to powering idle machines that are not doing any useful work. We identify two causes of this inefficiency: low server utilization and a lack of power-proportionality. To address this problem we pre ..."
Cited by 53 (3 self)
Abstract:
Energy consumption is a major and costly problem in data centers. A large fraction of this energy goes to powering idle machines that are not doing any useful work. We identify two causes of this inefficiency: low server utilization and a lack of power-proportionality. To address this problem we present a design for a power-proportional cluster consisting of a power-aware cluster manager and a set of heterogeneous machines. Our design makes use of currently available energy-efficient hardware, mechanisms for transitioning in and out of low-power sleep states, and dynamic provisioning and scheduling to continually adjust to the workload and minimize power consumption. With our design we are able to reduce energy consumption while maintaining acceptable response times for a web service workload based on Wikipedia. With our dynamic provisioning algorithms, we demonstrate via simulation a 63% savings in power usage relative to a typically provisioned datacenter where all machines are left on and awake at all times. Our results show that we are able to achieve close to 90% of the savings a theoretically optimal provisioning scheme would achieve. We have also built a prototype cluster which runs Wikipedia to demonstrate the use of our design in a real environment.
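The provisioning decision can be illustrated with a small sketch that keeps only as many servers awake as the predicted request rate requires, plus headroom to cover wake-up latency. The capacity figure, headroom, and function names are assumptions for illustration, not NapSAC's actual algorithm.

    # Hypothetical sketch of power-aware provisioning: keep only as many
    # servers awake as the predicted load requires, plus spare capacity.
    import math

    REQS_PER_SERVER = 500.0   # sustainable requests/s per server, assumed
    HEADROOM = 0.2            # 20% spare capacity, assumed

    def servers_needed(predicted_load_rps):
        """Return how many servers should be active for the predicted load."""
        target = predicted_load_rps * (1.0 + HEADROOM) / REQS_PER_SERVER
        return max(1, math.ceil(target))

    def plan(active, predicted_load_rps):
        """Decide how many servers to wake up or put to sleep."""
        needed = servers_needed(predicted_load_rps)
        if needed > active:
            return ("wake", needed - active)
        if needed < active:
            return ("sleep", active - needed)
        return ("hold", 0)

    if __name__ == "__main__":
        print(plan(active=10, predicted_load_rps=3200))   # e.g. ('sleep', 2)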
Energy efficient resource management in virtualized cloud data centers
- In Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRID '10), IEEE Computer Society, 2010
"... Rapid growth of the demand for computational power by scientific, business and web-applications has led to the creation of large-scale data centers consuming enormous amounts of electrical power. We propose an energy efficient resource management system for virtualized Cloud data centers that reduce ..."
Cited by 51 (0 self)
Abstract:
Rapid growth of the demand for computational power by scientific, business and web applications has led to the creation of large-scale data centers consuming enormous amounts of electrical power. We propose an energy-efficient resource management system for virtualized Cloud data centers that reduces operational costs and provides the required Quality of Service (QoS). Energy savings are achieved by continuous consolidation of VMs according to the current utilization of resources, the virtual network topologies established between VMs, and the thermal state of computing nodes. We present the first results of a simulation-driven evaluation of heuristics for dynamic reallocation of VMs using live migration according to current requirements for CPU performance. The results show that the proposed technique brings substantial energy savings while ensuring reliable QoS. This justifies further investigation and development of the proposed resource management system.
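One plausible form of such a heuristic is a threshold-based consolidation sketch: overloaded hosts shed VMs, and underloaded hosts are drained so they can be put to sleep. The thresholds and the VM-selection rule below are illustrative assumptions, not the evaluated heuristics themselves.

    # Hypothetical sketch of threshold-based VM consolidation via migration.
    UPPER = 0.85   # utilization above this triggers migration off the host
    LOWER = 0.30   # utilization below this triggers draining the host

    def plan_migrations(hosts):
        """hosts: dict host -> list of per-VM CPU utilization shares.
        Returns a list of (host, vm_index, reason) migration candidates."""
        migrations = []
        for host, vm_loads in hosts.items():
            util = sum(vm_loads)
            if util > UPPER:
                # Migrate the smallest VMs until the host drops below UPPER.
                order = sorted(range(len(vm_loads)), key=lambda i: vm_loads[i])
                for i in order:
                    if util <= UPPER:
                        break
                    migrations.append((host, i, "overload"))
                    util -= vm_loads[i]
            elif 0 < util < LOWER:
                # Drain the host entirely so it can be switched to sleep.
                migrations.extend((host, i, "consolidate")
                                  for i in range(len(vm_loads)))
        return migrations

    if __name__ == "__main__":
        cluster = {"host-a": [0.5, 0.3, 0.2], "host-b": [0.1, 0.05]}
        print(plan_migrations(cluster))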
Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining Response Time
"... Server power and cooling power amount to a significant fraction of modern data centers ’ recurring costs. While data centers provision enough servers to guarantee response times under the maximum loading, data centers operate under much less loading most of the times (e.g., 30-70 % of the maximum lo ..."
Cited by 48 (0 self)
Abstract:
Server power and cooling power amount to a significant fraction of modern data centers' recurring costs. While data centers provision enough servers to guarantee response times under the maximum loading, data centers operate under much less loading most of the time (e.g., 30-70% of the maximum loading). Previous server-power proposals exploit this under-utilization to reduce server idle power by keeping active only as many servers as necessary and putting the rest into low-power standby modes. However, these proposals incur higher cooling power due to hot spots created by concentrating the data center loading on fewer active servers, or degrade response times due to standby-to-active transition delays, or both. Other proposals optimize the cooling power but incur considerable idle power. To address the first issue of power, we propose PowerTrade, which trades off idle power and cooling power for each other, thereby reducing the total power. To address the second issue of response time, we propose SurgeGuard to over-provision the number of active servers beyond that needed by the current loading so as to absorb future increases in the loading. SurgeGuard is a two-tier scheme which uses well-known over-provisioning at coarse time granularities (e.g., one hour) to absorb the common, smooth increases in the loading, and a novel fine-grain replenishment of the over-provisioned reserves at fine time granularities (e.g., five minutes) to handle the uncommon, abrupt loading surges. Using real-world traces, we show that combining PowerTrade and SurgeGuard reduces total power by 30% compared to previous low-power schemes while maintaining response times within 1.7%.
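A rough sketch of the two-tier idea, with a coarse hourly reserve and a fine-grained top-up, might look as follows; the percentages and interval semantics are assumptions for illustration, not SurgeGuard's parameters.

    # Hypothetical sketch of two-tier over-provisioning: a coarse tier sets
    # an hourly reserve above the forecast load, and a fine tier tops the
    # reserve back up every few minutes as surges consume it.
    COARSE_RESERVE = 0.15   # keep 15% spare servers above the hourly forecast
    FINE_RESERVE = 0.05     # replenish whenever spare capacity drops below 5%

    def coarse_target(forecast_servers):
        """Hourly decision: servers to keep active for the next hour."""
        return int(forecast_servers * (1.0 + COARSE_RESERVE)) + 1

    def fine_adjust(active, busy):
        """Five-minute decision: wake servers if the reserve is nearly used up."""
        spare = active - busy
        minimum_spare = max(1, int(busy * FINE_RESERVE))
        if spare < minimum_spare:
            return minimum_spare - spare   # servers to wake now
        return 0

    if __name__ == "__main__":
        active = coarse_target(forecast_servers=100)          # 116
        print(active, fine_adjust(active=active, busy=113))   # wake 2 more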
Weatherman: Automated, online, and predictive thermal mapping and management for data centers
- In International Conference on Autonomic Computing, 2006
"... Abstract — Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to cooling system failures. A key unresolved obstacle to the practical implementation of thermal load management is t ..."
Cited by 47 (1 self)
Abstract:
Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to cooling system failures. A key unresolved obstacle to the practical implementation of thermal load management is the ability to predict the effects of workload distribution and cooling configurations on temperatures within a data center enclosure. The interactions between workload, cooling, and temperature are dependent on complex factors that are unique to each data center, including physical room layout, hardware power consumption, and cooling capacity; this dictates an approach that formulates management policies for each data center based on these properties. We propose and evaluate a simple, flexible method to infer a detailed model of thermal behavior within a data center from a stream of instrumentation data. This data, taken during normal data center operation, includes continuous readings taken from external temperature sensors, server instrumentation, and computer room air conditioning units. Experimental results from a representative data center show that automatic thermal mapping can accurately predict the heat distribution resulting from a given workload distribution and cooling configuration, thereby removing the need for static or manual configuration of thermal load management systems. We also demonstrate how our approach adapts to preserve accuracy across changes to cluster attributes that affect thermal behavior, such as cooling settings, workload distribution, and power consumption.
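As an illustration of learning a thermal map from instrumentation data, the sketch below fits a plain linear least-squares model from utilization and cooling readings to per-server temperatures. The model class Weatherman actually trains may differ, and the data here is synthetic.

    # Hypothetical sketch: fit a linear thermal map from (utilization,
    # cooling setting) features to observed per-server inlet temperatures,
    # then predict temperatures for a new configuration.
    import numpy as np

    def fit_thermal_map(features, temps):
        """features: N x F readings; temps: N x S observed temperatures."""
        X = np.hstack([features, np.ones((features.shape[0], 1))])  # bias term
        coeffs, *_ = np.linalg.lstsq(X, temps, rcond=None)
        return coeffs

    def predict_temps(coeffs, feature_row):
        x = np.append(feature_row, 1.0)
        return x @ coeffs

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = rng.uniform(0, 1, size=(200, 4))    # 3 server loads + 1 CRAC knob
        true_w = rng.uniform(1, 5, size=(5, 3))     # synthetic ground truth
        obs = np.hstack([feats, np.ones((200, 1))]) @ true_w
        model = fit_thermal_map(feats, obs)
        print(predict_temps(model, np.array([0.9, 0.2, 0.4, 0.5])))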
C-Oracle: Predictive Thermal Management for Data Centers
- In Symposium on High-Performance Computer Architecture, 2008
"... Thermal management has become a critical requirement for today’s power-dense server clusters, due to the negative impact of high temperatures on the reliability of computer hardware. Recognizing this fact, researchers have started to design software-based thermal management policies that leverage hi ..."
Cited by 44 (5 self)
Abstract:
Thermal management has become a critical requirement for today’s power-dense server clusters, due to the negative impact of high temperatures on the reliability of computer hardware. Recognizing this fact, researchers have started to design software-based thermal management policies that leverage high-level information to control system-wide temperatures effectively. Unfortunately, designing these policies is currently a challenge, since it is difficult to predict the exact temperature and performance that would result from trying to react to a thermal emergency. Reactions that are excessively severe may cause unnecessary performance degradation and/or generate emergencies in other parts of the system, whereas reactions that are excessively mild may take relatively long to become effective (if at all), compromising the reliability of the system. To address this challenge, in this paper we propose C-Oracle, a software infrastructure for Internet services that dynamically predicts the temperature and performance impact of different thermal management reactions into the future, allowing the thermal management policy to select the best reaction at each point in time. C-Oracle makes predictions based on simple models of temperature, component utilization, and policy behavior that can be solved efficiently. We experimentally evaluate C-Oracle for thermal management policies based on load redistribution and dynamic voltage/frequency scaling in both single-tier and multi-tier services. Our results show that, regardless of management policy or service organization, C-Oracle enables non-trivial decisions that effectively manage thermal emergencies, while avoiding unnecessary performance degradation.
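The selection step can be pictured with a small sketch that scores each candidate reaction using its predicted peak temperature and throughput loss, then picks the cheapest safe one. The candidate list, temperature limit, and scoring below are invented for illustration and are not C-Oracle's models.

    # Hypothetical sketch of prediction-driven reaction selection.
    TEMP_LIMIT_C = 70.0

    # Each candidate: (name, predicted peak temperature, predicted throughput loss)
    def choose_reaction(candidates):
        safe = [c for c in candidates if c[1] <= TEMP_LIMIT_C]
        if not safe:
            # Nothing keeps us under the limit; fall back to the coolest option.
            return min(candidates, key=lambda c: c[1])
        return min(safe, key=lambda c: c[2])   # least performance degradation

    if __name__ == "__main__":
        options = [
            ("do nothing", 78.0, 0.00),
            ("shift 20% load away", 69.0, 0.05),
            ("drop CPU frequency", 66.0, 0.15),
        ]
        print(choose_reaction(options))   # ('shift 20% load away', 69.0, 0.05)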
Understanding the Performance-Temperature Interactions in Disk I/O of Server Workloads
- In Proceedings of HPCA, 2006
"... This paper describes the first infrastructure for integrated studies of the performance and thermal behavior of storage systems. Using microbenchmarks running on this infrastructure, we first gain insight into how I/O characteristics can affect the temperature of disk drives. We use this analysis to ..."
Cited by 38 (8 self)
Abstract:
This paper describes the first infrastructure for integrated studies of the performance and thermal behavior of storage systems. Using microbenchmarks running on this infrastructure, we first gain insight into how I/O characteristics can affect the temperature of disk drives. We use this analysis to identify the most promising, yet simple, "knobs" for temperature optimization of high-speed disks, which can be implemented on existing disks. We then analyze the thermal profiles of real workloads that use such disk drives in their storage systems, pointing out which knobs are most useful for dynamic thermal management when pushing the performance envelope.
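One possible "knob" of this kind is simply capping the dispatch rate of disk requests when the drive runs hot, as in the hypothetical sketch below. The abstract does not specify the paper's actual knobs or thresholds, so everything in the sketch is an assumption.

    # Hypothetical sketch: throttle disk request dispatch when the drive's
    # temperature estimate nears a limit. Values are illustrative only.
    DISK_LIMIT_C = 55.0
    FULL_RATE_IOPS = 300
    THROTTLED_IOPS = 150

    def allowed_iops(disk_temp_c):
        """Return the dispatch budget for the next control interval."""
        if disk_temp_c >= DISK_LIMIT_C:
            return THROTTLED_IOPS
        return FULL_RATE_IOPS

    if __name__ == "__main__":
        for temp in (45.0, 54.9, 56.2):
            print(temp, allowed_iops(temp))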
Dynamic Thermal Management for Distributed Systems
- In Proceedings of the First Workshop on Temperature-Aware Computer Systems (TACS '04), 2004
"... In modern data centers, the impact on the thermal properties by increased scale and power densities is enormous and poses new challenges on the designers of both computing as well as cooling systems. Controltheoretic techniques have proven to manage the heat dissipation and the temperature to avoid ..."
Cited by 36 (3 self)
Abstract:
In modern data centers, increased scale and power densities have an enormous impact on thermal properties and pose new challenges for the designers of both computing and cooling systems. Control-theoretic techniques have proven able to manage heat dissipation and temperature to avoid thermal emergencies, but they are not aware of the task currently executing or its specific service requirements. In this work we investigate an approach to dynamic thermal management that respects the demands of individual applications, users or services. We show that energy consumption and temperature can be determined at a fine-grained level and without the need for measurement, using information from event monitors embedded in modern processors. We extend the well-known abstraction of resource containers into an infrastructure for transparent energy and temperature management in distributed systems. In a cluster-based server, the processing of a request can be throttled to meet the thermal requirements of the system, even if machine boundaries are crossed, e.g. by remote procedure calls in a client/server relationship. With this facility, energy consumption can be accounted for and the resulting heat generation controlled precisely without the need for expensive hardware. Experiments on a Pentium 4 architecture show that energy and temperature are accurately determined and that thermal limits for the individual CPU and the whole distributed system are not exceeded. Use cases and important implications of our approach are discussed.
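The counter-based accounting can be sketched as charging each resource container energy proportional to the hardware events its requests caused. The event weights, class, and names below are illustrative assumptions, not the paper's calibration.

    # Hypothetical sketch: estimate energy from processor event counts and
    # charge it to a resource container with a fixed energy budget.
    EVENT_WEIGHTS_NJ = {            # nanojoules per event, assumed
        "cycles": 0.6,
        "l2_misses": 18.0,
        "memory_accesses": 9.0,
    }

    def estimate_energy_j(event_counts):
        """event_counts: dict event name -> count since the last reading."""
        nj = sum(EVENT_WEIGHTS_NJ[e] * n for e, n in event_counts.items())
        return nj * 1e-9

    class ResourceContainer:
        def __init__(self, name, energy_budget_j):
            self.name = name
            self.budget_j = energy_budget_j
            self.used_j = 0.0

        def charge(self, event_counts):
            self.used_j += estimate_energy_j(event_counts)

        def should_throttle(self):
            return self.used_j > self.budget_j

    if __name__ == "__main__":
        rc = ResourceContainer("search-service", energy_budget_j=5.0)
        rc.charge({"cycles": 4_000_000_000, "l2_misses": 60_000_000})
        print(rc.used_j, rc.should_throttle())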
GreenHDFS: Towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster
- In Proceedings of the 2010 International Conference on Power Aware Computing and Systems, 2010
"... Hadoop Distributed File System (HDFS) presents unique chal-lenges to the existing energy-conservation techniques and makes it hard to scale-down servers. We propose an energy-conserving, hybrid, logical multi-zoned variant of HDFS for managing data-processing intensive, commodity Hadoop cluster. Gre ..."
Cited by 24 (1 self)
Abstract:
The Hadoop Distributed File System (HDFS) presents unique challenges to existing energy-conservation techniques and makes it hard to scale down servers. We propose GreenHDFS, an energy-conserving, hybrid, logically multi-zoned variant of HDFS for managing data-processing-intensive, commodity Hadoop clusters. GreenHDFS's data-classification-driven data placement allows scale-down by guaranteeing substantially long periods (several days) of idleness in a subset of servers in the datacenter designated as the Cold Zone. These servers are then transitioned to high-energy-saving, inactive power modes. This is done without impacting the performance of the Hot Zone, as studies have shown that the servers in data-intensive compute clusters are under-utilized and, hence, opportunities exist for better consolidation of the workload on the Hot Zone. Analysis of the traces of a Yahoo! Hadoop cluster showed significant heterogeneity in the data's access patterns, which can be used to guide energy-aware data placement policies. Trace-driven simulation results with three-month-long real-life HDFS traces from a Hadoop cluster at Yahoo! show a 26% energy consumption reduction by doing only Cold Zone power management. An analytical cost model projects savings of $14.6 million in 3-year total cost of ownership (TCO), and simulation results extrapolate savings of $2.4 million annually when the GreenHDFS technique is applied across all Hadoop clusters (amounting to 38,000 servers) at Yahoo!.
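A minimal sketch of the zone-classification step, assuming a simple last-access dormancy threshold (not GreenHDFS's actual policy), follows.

    # Hypothetical sketch: files untouched for longer than a dormancy
    # threshold are assigned to the Cold Zone, whose servers can then be
    # put into a low-power state. Threshold and rule are assumptions.
    import time

    DORMANCY_DAYS = 30     # assumed threshold for "cold" data

    def classify(files, now=None):
        """files: dict path -> last access time (epoch seconds).
        Returns dict path -> 'hot' or 'cold'."""
        now = now or time.time()
        cutoff = DORMANCY_DAYS * 86400
        return {path: ("cold" if now - last >= cutoff else "hot")
                for path, last in files.items()}

    if __name__ == "__main__":
        now = time.time()
        sample = {"/logs/2010-01.gz": now - 90 * 86400,
                  "/jobs/today/output": now - 3600}
        print(classify(sample, now))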
Data center evolution: A tutorial on state of the art, issues, and challenges
- Computer Networks, 2009
"... Abstract Data centers form a key part of the infrastructure upon which a variety of information technology services are built. As data centers continue to grow in size and complexity, it is desirable to understand aspects of their design that are worthy of carrying forward, as well as existing or u ..."
Cited by 22 (1 self)
Abstract:
Data centers form a key part of the infrastructure upon which a variety of information technology services are built. As data centers continue to grow in size and complexity, it is desirable to understand aspects of their design that are worthy of carrying forward, as well as existing or upcoming shortcomings and challenges that would have to be addressed. We envision the data center evolving from owned physical entities to potentially outsourced, virtualized and geographically distributed infrastructures that still attempt to provide the same level of control and isolation that owned infrastructures do. We define a layered model for such data centers and provide a detailed treatment of state of the art and emerging challenges in storage, networking, management and power/thermal aspects.