Results 1 - 10 of 47
Energy-Efficient, Thermal-Aware Task Scheduling for Homogeneous, High Performance Computing Data Centers: A Cyber-Physical Approach
Cited by 76 (5 self)
High Performance Computing data centers have been rapidly growing, both in number and size. Thermal management of data centers can address dominant problems associated with cooling, such as the recirculation of hot air from the equipment outlets to their inlets and the appearance of hot spots. In this paper, we show through formalization that minimizing the peak inlet temperature allows for the lowest cooling power needs. Using a low-complexity, linear heat recirculation model, we define the problem of minimizing the peak inlet temperature within a data center through task assignment (MPIT-TA), consequently leading to minimal cooling requirements. We also provide two methods to solve the formulation: XInt-GA, which uses a genetic algorithm, and XInt-SQP, which uses sequential quadratic programming. Results from small-scale data center simulations show that solving the formulation leads to an inlet temperature distribution that, compared to other approaches, is 2 °C to 5 °C lower and achieves about 20%-30% cooling energy savings at common data center utilization rates. Moreover, our algorithms consistently outperform MinHR, a recirculation-reducing task placement algorithm in the literature.
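The linear heat recirculation model this abstract relies on is commonly written as T_in = T_sup + D·p, where D is a cross-interference matrix and p the per-server power vector. As a hedged illustration only (the matrix values, task powers, and brute-force search below are our own stand-ins, not the paper's XInt-GA or XInt-SQP), a toy three-server version of the MPIT-TA idea looks like:

```python
import numpy as np
from itertools import product

# Illustrative cross-interference matrix: D[i][j] is the inlet-temperature
# rise at server i per watt dissipated at server j (values invented).
T_SUPPLY = 15.0                      # CRAC supply air temperature (deg C)
D = np.array([[0.02, 0.01, 0.00],
              [0.01, 0.02, 0.01],
              [0.00, 0.01, 0.02]])

def peak_inlet_temp(power):
    """Peak inlet temperature under a given per-server power vector."""
    return float(np.max(T_SUPPLY + D @ power))

idle = np.array([100.0, 100.0, 100.0])   # idle draw per server (W)

def with_tasks(assignment):
    # assignment[k] = server index receiving task k; each task draws 200 W.
    return idle + np.bincount(assignment, weights=[200.0] * len(assignment),
                              minlength=3)

# Brute-force stand-in for the GA/SQP solvers: try every placement of two
# tasks and keep the one with the coolest hottest inlet.
best = min(product(range(3), repeat=2),
           key=lambda a: peak_inlet_temp(with_tasks(a)))
print(best, round(peak_inlet_temp(with_tasks(best)), 2))
```

Spreading the tasks to the two servers whose heat interferes least wins here, which is exactly the intuition behind minimizing peak inlet temperature.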
A survey of Autonomic Computing -- degrees, models and applications
Cited by 65 (1 self)
Autonomic Computing is a concept that brings together many fields of computing with the purpose of creating computing systems that self-manage. In its early days it was criticised as being a “hype topic” or a rebadging of some Multi-Agent Systems work. In this survey, we hope to show that it was indeed not ‘hype’ and that, though it draws on much work already carried out by the Computer Science and Control communities, its innovation is strong and lies in its robust application to the specific self-management of computing systems. To this end, we first provide an introduction to the motivation and concepts of autonomic computing and describe some research that has been seen as seminal in influencing a large proportion of early work. Taking the components of an established reference model in turn, we discuss the works that have provided significant contributions to each area. We then look at larger-scale systems that compose autonomic systems, illustrating the hierarchical nature of their architectures. Autonomicity is not a well-defined subject, and different systems adhere to different degrees of autonomicity; we therefore cross-slice the body of work in terms of these degrees. From this we list the key applications of autonomic computing, discuss the research work that is missing, and suggest what we believe the community should be considering.
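The "established reference model" such surveys walk through is commonly identified with IBM's MAPE-K loop (Monitor, Analyse, Plan, Execute over shared Knowledge). A minimal sketch, with the managed resource, thresholds, and scaling action entirely invented for illustration:

```python
# Toy autonomic manager running a MAPE-K loop over a shared knowledge base.
# The utilisation target and scale actions are hypothetical examples.
class AutonomicManager:
    def __init__(self, target_util=0.6):
        self.knowledge = {"target_util": target_util, "servers": 1}

    def monitor(self, sensor_reading):
        self.knowledge["util"] = sensor_reading

    def analyse(self):
        # Symptom: utilisation too far from the target.
        return abs(self.knowledge["util"] - self.knowledge["target_util"]) > 0.2

    def plan(self):
        return ("scale_up" if self.knowledge["util"] > self.knowledge["target_util"]
                else "scale_down")

    def execute(self, action):
        if action == "scale_up":
            self.knowledge["servers"] += 1
        elif self.knowledge["servers"] > 1:
            self.knowledge["servers"] -= 1
        return self.knowledge["servers"]

    def loop(self, sensor_reading):
        self.monitor(sensor_reading)
        if self.analyse():
            return self.execute(self.plan())
        return self.knowledge["servers"]

mgr = AutonomicManager()
print(mgr.loop(0.95), mgr.loop(0.55))   # react to overload, then settle
```

The point of the reference model is that each phase is separable and reads from a shared knowledge base, which is what lets the surveyed systems be compared component by component.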
Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining Response Time
Cited by 48 (0 self)
Server power and cooling power amount to a significant fraction of modern data centers' recurring costs. While data centers provision enough servers to guarantee response times under the maximum loading, they operate under much lighter loading most of the time (e.g., 30-70% of the maximum). Previous server-power proposals exploit this under-utilization to reduce server idle power by keeping active only as many servers as necessary and putting the rest into low-power standby modes. However, these proposals incur higher cooling power due to hot spots created by concentrating the data center loading on fewer active servers, or degrade response times due to standby-to-active transition delays, or both. Other proposals optimize the cooling power but incur considerable idle power. To address the first issue, of power, we propose PowerTrade, which trades off idle power and cooling power against each other, thereby reducing the total power. To address the second issue, of response time, we propose SurgeGuard, which over-provisions the number of active servers beyond that needed by the current loading so as to absorb future increases in the loading. SurgeGuard is a two-tier scheme which uses well-known over-provisioning at coarse time granularities (e.g., one hour) to absorb the common, smooth increases in the loading, and a novel fine-grain replenishment of the over-provisioned reserves at fine time granularities (e.g., five minutes) to handle the uncommon, abrupt loading surges. Using real-world traces, we show that combining PowerTrade and SurgeGuard reduces total power by 30% compared to previous low-power schemes while maintaining response times within 1.7%.
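The two-tier reserve idea behind SurgeGuard can be sketched as follows (the names, per-server capacity, and reserve size here are our own illustrative assumptions, not numbers from the paper): a coarse tier provisions servers for the hour's forecast load plus a fixed reserve, and a fine tier tops the reserve back up every interval as surges eat into it.

```python
import math

RESERVE = 4          # servers kept active-but-idle as headroom (hypothetical)
PER_SERVER = 100.0   # requests/s one active server can absorb (hypothetical)

def coarse_provision(hourly_forecast):
    # Hour-granularity tier: enough servers for the forecast, plus reserve.
    return math.ceil(hourly_forecast / PER_SERVER) + RESERVE

def fine_replenish(active, current_load):
    # Five-minute tier: if a surge has consumed the reserve, restore it.
    needed = math.ceil(current_load / PER_SERVER)
    return max(active, needed + RESERVE)

active = coarse_provision(hourly_forecast=850.0)      # 9 + 4 = 13 servers
active = fine_replenish(active, current_load=1300.0)  # abrupt surge absorbed
print(active)
```

The reserve is what lets abrupt surges be served by already-active machines instead of waiting out a standby-to-active transition delay.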
Energy-Efficient Management of Data Center Resources for Cloud Computing: A Vision, Architectural Elements, and Open Challenges
Cited by 46 (3 self)
Cloud computing is offering utility-oriented IT services to users worldwide. Based on a pay-as-you-go model, it enables hosting of pervasive applications from consumer, scientific, and business domains. However, data centers hosting Cloud applications consume huge amounts of energy, contributing to high operational costs and a large carbon footprint. Therefore, we need Green Cloud computing solutions that can not only save energy but also reduce operational costs. This paper presents the vision, challenges, and architectural elements for energy-efficient management of Cloud computing environments. We focus on the development of dynamic resource provisioning and allocation algorithms that consider the synergy between the various data center infrastructures (i.e., hardware, power units, cooling, and software) and work holistically to boost data center energy efficiency and performance. In particular, this paper proposes (a) architectural principles for energy-efficient management of Clouds; (b) energy-efficient resource allocation policies and scheduling algorithms that consider quality-of-service expectations and devices' power usage characteristics; and (c) a novel software technology for energy-efficient management of Clouds. We have validated our approach by conducting a set of rigorous performance evaluation studies using the CloudSim toolkit. The results demonstrate that the Cloud computing model has immense potential, offering significant performance gains with regard to response time and cost saving.
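A common shape for the energy-efficient allocation policies this line of work proposes is a power-aware best-fit heuristic: place each VM where it adds the least power, so load consolidates and empty hosts can be switched off. The sketch below is a generic textbook heuristic under a linear host power model, not the paper's exact algorithm; all numbers are invented.

```python
def host_power_delta(util, load, idle_w=100.0, dyn_w=150.0):
    # Linear host power model: switching an idle host on also costs
    # its idle power, which is what drives consolidation.
    if util + load > 1.0:
        return float("inf")                  # host would be overloaded
    return (idle_w if util == 0.0 else 0.0) + dyn_w * load

def place_vms(vm_loads, n_hosts):
    utils = [0.0] * n_hosts
    placement = []
    for load in sorted(vm_loads, reverse=True):     # biggest VMs first
        deltas = [host_power_delta(u, load) for u in utils]
        best = deltas.index(min(deltas))            # least added power
        utils[best] += load
        placement.append(best)
    return placement, utils

placement, utils = place_vms([0.5, 0.4, 0.3, 0.2], n_hosts=3)
print(placement, utils)   # VMs pack onto two hosts; the third can sleep
```

Because an idle host costs power the moment it is switched on, the heuristic naturally fills already-active hosts first, leaving the rest available for deep low-power states.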
C-Oracle: Predictive Thermal Management for Data Centers
- In Symposium on High-Performance Computer Architecture, 2008
Cited by 44 (5 self)
Thermal management has become a critical requirement for today’s power-dense server clusters, due to the negative impact of high temperatures on the reliability of computer hardware. Recognizing this fact, researchers have started to design software-based thermal management policies that leverage high-level information to control system-wide temperatures effectively. Unfortunately, designing these policies is currently a challenge, since it is difficult to predict the exact temperature and performance that would result from trying to react to a thermal emergency. Reactions that are excessively severe may cause unnecessary performance degradation and/or generate emergencies in other parts of the system, whereas reactions that are excessively mild may take relatively long to become effective (if at all), compromising the reliability of the system. To address this challenge, in this paper we propose C-Oracle, a software infrastructure for Internet services that dynamically predicts the temperature and performance impact of different thermal management reactions into the future, allowing the thermal management policy to select the best reaction at each point in time. C-Oracle makes predictions based on simple models of temperature, component utilization, and policy behavior that can be solved efficiently. We experimentally evaluate C-Oracle for thermal management policies based on load redistribution and dynamic voltage/frequency scaling in both single-tier and multi-tier services. Our results show that, regardless of management policy or service organization, C-Oracle enables non-trivial decisions that effectively manage thermal emergencies, while avoiding unnecessary performance degradation.
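The predict-before-reacting idea can be sketched in a few lines (the temperature model, reaction menu, and all constants below are our own toy assumptions, not C-Oracle's actual models): roll a simple thermal model forward for each candidate reaction and pick the least severe one predicted to clear the emergency.

```python
def predict_temp(temp, util, steps, ambient=25.0, heat_per_util=30.0, k=0.5):
    # Crude discrete Newton-style model: heat in proportional to
    # utilisation, heat out proportional to the gap above ambient.
    for _ in range(steps):
        temp += heat_per_util * util * 0.1 - k * (temp - ambient) * 0.1
    return temp

REACTIONS = {                # candidate reaction -> resulting utilisation
    "throttle_half": 0.5,    # e.g. DVFS down (hypothetical)
    "shed_quarter": 0.75,    # e.g. redistribute part of the load
    "do_nothing": 1.0,
}
LIMIT = 60.0                 # thermal emergency threshold (deg C)

def choose_reaction(temp_now):
    safe = {name: u for name, u in REACTIONS.items()
            if predict_temp(temp_now, u, steps=50) < LIMIT}
    # Least-severe safe reaction = the one keeping the most utilisation.
    return max(safe, key=safe.get)

print(choose_reaction(65.0))
```

Overly mild reactions fail the prediction check, while the maximization over safe reactions avoids the overly severe ones, which is the trade-off the abstract describes.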
Sensor-based fast thermal evaluation model for energy efficient high-performance datacenters
- In Int’l Conf. Intelligent Sensing & Info. Proc. (ICISIP 2006)
Cited by 39 (9 self)
In this work, we propose an abstract heat flow model which uses temperature information from onboard and ambient sensors, characterizes hot air recirculation based on this information, and accelerates the thermal evaluation process for high performance datacenters. This is critical to minimizing energy costs, optimizing computing resources, and maximizing the computation capability of datacenters. Given a workload and a thermal profile, obtained from various distributed sensors, we predict the resulting temperature distribution in a fast and accurate manner, taking into account the recirculation characterization of the datacenter topology. Simulation results confirm our hypothesis that heat recirculation can be characterized as cross interference in our abstract heat flow model. Moreover, fast thermal evaluation based on cross interference can be used in online thermal management to predict temperature distribution in real time.
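If recirculation really behaves as linear cross interference, the coefficients can be recovered from sensor snapshots by least squares, which is one way to read the calibration step this abstract describes. A hedged sketch with synthetic "sensor" data (the matrix, supply temperature, and snapshot counts are all invented):

```python
import numpy as np

# Ground-truth cross-interference matrix used only to synthesize sensor
# readings; in practice the readings would come from real sensors.
rng = np.random.default_rng(0)
D_true = np.array([[0.02, 0.01],
                   [0.01, 0.03]])
T_SUP = 15.0

P = rng.uniform(100.0, 400.0, size=(20, 2))   # 20 power snapshots (W)
T = T_SUP + P @ D_true.T                      # matching inlet temperatures

# Calibrate: solve T - T_SUP ~= P @ D.T for D by least squares.
D_fit = np.linalg.lstsq(P, T - T_SUP, rcond=None)[0].T

def fast_eval(power):
    # Fast thermal evaluation: predict inlet temps without running CFD.
    return T_SUP + D_fit @ power

print(np.round(D_fit, 3))
print(np.round(fast_eval(np.array([300.0, 200.0])), 2))
```

Once calibrated, each prediction is a single matrix-vector product, which is what makes the model fast enough for online thermal management.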
Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers
- Computer Networks (Elsevier), Special Issue on Resource Management in Heterogeneous Data Centers
Cited by 38 (9 self)
Job scheduling in data centers can be considered from a cyber-physical point of view, as it affects the data center’s computing performance (i.e., the cyber aspect) and energy efficiency (the physical aspect). Driven by the growing need to green contemporary data centers, this paper uses recent technological advances in data center virtualization and proposes cyber-physical, spatio-temporal (i.e., start time and servers assigned), thermal-aware job scheduling algorithms that minimize the energy consumption of the data center under performance constraints (i.e., deadlines). Savings are possible by being able to temporally “spread” the workload, assign it to energy-efficient computing equipment, and further reduce the heat recirculation and therefore the load on the cooling systems. This paper provides three categories of thermal-aware, energy-saving scheduling techniques: a) FCFS-Backfill-XInt and FCFS-Backfill-LRH, thermal-aware job placement enhancements to the popular first-come first-serve with back-filling (FCFS-backfill) scheduling policy; b) EDF-LRH, an online earliest-deadline-first scheduling algorithm with thermal-aware placement; and c) an offline genetic algorithm for SCheduling to minimize thermal cross-INTerference (SCINT), which is suited for batch scheduling of backlogs. Simulation results, based on real job logs from the ASU Fulton HPC data center, show that the thermal-aware enhancements to FCFS-backfill achieve up to 25% savings compared to FCFS-backfill with first-fit placement, depending on the intensity of the incoming workload, while SCINT achieves up to 60% savings. The performance of EDF-LRH nears that of the offline SCINT for low loads, and it degrades to the performance of FCFS-backfill for high loads. However, EDF-LRH requires milliseconds
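The overall shape of an online EDF scheduler with thermal-aware placement can be sketched briefly (a hedged illustration only: the server names, the per-server "least recirculated heat" weights, and the one-job-per-server simplification are ours, not the paper's EDF-LRH):

```python
import heapq

# Lower weight = server contributes less recirculated heat (invented values).
LRH_WEIGHT = {"s1": 0.9, "s2": 0.4, "s3": 0.6}

def edf_lrh(jobs):
    """jobs: (deadline, name) pairs; returns (job, server) assignments."""
    heap = list(jobs)
    heapq.heapify(heap)            # heap order = earliest deadline first
    free = set(LRH_WEIGHT)
    schedule = []
    while heap and free:
        _deadline, name = heapq.heappop(heap)
        server = min(free, key=LRH_WEIGHT.get)   # lowest-recirculation server
        free.remove(server)
        schedule.append((name, server))
    return schedule

print(edf_lrh([(30, "render"), (10, "backup"), (20, "index")]))
```

The temporal decision (who runs next) comes from deadlines, while the spatial decision (where it runs) comes from the heat-recirculation ranking, which is the cyber-physical split the abstract emphasizes.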
Understanding and abstracting total data center power
- In Workshop on Energy-Efficient Design, 2009
Cited by 29 (2 self)
The alarming growth of data center power consumption has led to a surge in research activity on data center energy efficiency. Though myriad, most existing energy-efficiency efforts focus narrowly on a single data center subsystem. Sophisticated power management increases dynamic power ranges and can lead to complex interactions among IT, power, and cooling systems. However, reasoning about total data center power is difficult because of the diversity and complexity of this infrastructure. In this paper, we develop an analytic framework for modeling total data center power. We collect power models from a variety of sources for each critical data center component. These component-wise models are suitable for integration into a detailed data center simulator that tracks subsystem utilization and interaction at fine granularity. We outline the design for such a simulator. Furthermore, to provide insight into average data center behavior and enable rapid back-of-the-envelope reasoning, we develop abstract models that replace key simulation steps with simple parametric models. To our knowledge, our effort is the first attempt at a comprehensive framework for modeling total data center power.
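A back-of-the-envelope model of the kind this abstract advocates sums component-wise parametric terms into one figure. In the sketch below, the PDU/UPS loss fractions are illustrative assumptions; the CRAC COP curve is the commonly cited 0.0068·T² + 0.0008·T + 0.458 fit for chilled-water units:

```python
def cop(t_supply_c):
    # Coefficient of performance of the cooling unit vs. supply temperature.
    return 0.0068 * t_supply_c**2 + 0.0008 * t_supply_c + 0.458

def total_power(p_it_kw, t_supply_c=20.0, pdu_loss=0.05, ups_loss=0.08):
    p_dist = p_it_kw * (pdu_loss + ups_loss)        # power-delivery losses
    p_cool = (p_it_kw + p_dist) / cop(t_supply_c)   # heat removed / COP
    return p_it_kw + p_dist + p_cool

p = total_power(500.0)
print(round(p, 1), "kW total;", round(p / 500.0, 2), "PUE-style ratio")
```

Even this crude model captures the key interaction the paper points at: raising the supply temperature improves COP, so IT, power, and cooling subsystems cannot be reasoned about in isolation.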
Thermal-aware task scheduling for data centers through minimizing heat recirculation
- In IEEE Cluster, 2007
Cited by 28 (5 self)
The thermal environment of data centers plays a significant role in affecting the energy efficiency and the reliability of data center operation. A dominant problem associated with cooling data centers is the recirculation of hot air from the equipment outlets to their inlets, causing the appearance of hot spots and an uneven inlet temperature distribution. Heat is generated due to the execution of tasks, and it varies according to the power profile of a task. We look into the prospect of assigning the incoming tasks around the data center in such a way as to make the inlet temperatures as even as possible; this allows for considerable cooling power savings. Based on our previous research work on characterizing heat recirculation in terms of cross-interference coefficients, we propose a task scheduling algorithm for homogeneous data centers, called XInt, that minimizes the inlet temperatures and leads to minimal heat recirculation and minimal cooling energy cost for data center operation. We verify, through both theoretical formalization and simulation, that minimizing heat recirculation results in the best cooling energy efficiency. XInt leads to an inlet temperature distribution that is 2 °C to 5 °C lower than other approaches, and achieves about 20%-30% energy savings at moderate data center utilization rates. XInt also consistently achieves the best energy efficiency compared to another recirculation-minimizing algorithm, MinHR.
Thermal Aware Server Provisioning And Workload Distribution For Internet Data Centers
Cited by 23 (7 self)
With the increasing popularity of Internet-based information retrieval and cloud computing, saving energy in Internet data centers (a.k.a. hosting centers, server farms) is of increasing importance. Current research approaches are based on dynamically adjusting the active server set in order to turn off a portion of the servers and save energy without compromising the quality of service; the workload is then distributed, conventionally equally (i.e., balanced), across the active servers. Although there is ample work that demonstrates energy savings through dynamic server provisioning, there is little work on thermal-aware server provisioning. This paper provides a formulation of thermal-aware active server set provisioning (TASP), in a nonlinear minimax binary integer programming form, and a series of heuristic approaches to solving it, namely MiniMax, bb-sLRH, CP-sLRH, and sLRH. Furthermore, it introduces thermal-aware workload distribution (TAWD) among the active servers. The proposed heuristics are evaluated using a thermal model of the ASU HPCI data center, while the request traffic is based on real web traces of the 1998 FIFA World Cup as well as the SPECweb2009 suite. The TASP heuristics are found to outperform a power-aware-only server set selection scheme (CPSP) by up to 9.3% for the simulated scenario. The order of achieved energy efficiency is: MiniMax (9.3% savings), CP-sLRH (9.2%), bb-sLRH (8.6%), sLRH (5.8%), compared to CPSP.
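At toy scale, the minimax server-set selection the TASP formulation expresses can be brute-forced: choose which k servers stay active so the hottest inlet is as cool as possible, with the load balanced across them. This is a hedged stand-in for the paper's heuristics, with the cross-interference matrix, power figures, and four-server size all invented:

```python
import numpy as np
from itertools import combinations

T_SUP = 15.0                       # supply air temperature (deg C)
D = np.array([[0.03, 0.01, 0.01, 0.00],   # invented cross-interference
              [0.01, 0.02, 0.01, 0.01],   # coefficients (deg C per W)
              [0.01, 0.01, 0.02, 0.01],
              [0.00, 0.01, 0.01, 0.03]])
ACTIVE_W, TOTAL_LOAD_W = 80.0, 300.0

def peak_inlet(active):
    # Balanced distribution of the total load across the active set.
    p = np.zeros(4)
    p[list(active)] = ACTIVE_W + TOTAL_LOAD_W / len(active)
    return float(np.max(T_SUP + D @ p))

# Minimax over all active sets of size 2: coolest hottest inlet wins.
best = min(combinations(range(4), 2), key=peak_inlet)
print(best, round(peak_inlet(best), 2))
```

The winning set keeps active the pair of servers whose heat interferes least, which is why thermal-aware provisioning can beat a power-aware-only choice of the same set size.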