Literature Review: A Dynamic Control System for Energy Efficient Cooling of Data Centres
BibTeX
@MISC{White_literaturereview,
author = {Mark White},
title = {Literature Review: A Dynamic Control System for Energy Efficient Cooling of Data Centres},
year = {}
}
Abstract
Overview

This literature review analyses four academic papers. The main objectives of the review are to:

1. Trace the evolution of recent (2000-2011) data centre energy efficiency research and development
2. Document the most relevant and significant literature relating to energy efficient cooling for high-density data centres
3. Identify opportunities for further research

Consumer demand is increasing for higher volumes of data storage and faster data transmission rates. As a result of the recent economic downturn, data centre operating budgets are being more closely monitored. Crucially, with reduced capital budgets, most data centres do not have the option of purchasing additional space. To deal with demand, many have already outsourced a percentage of their operation, with many more considering similar moves in the months ahead. For the existing infrastructure in the data centre to remain viable, space has become a priority.

The primary solution adopted by the industry in recent times has been to increase the density of IT equipment. Increasing density is like adding more rooms to a house without extending the property: the house can now accommodate more people, but each person has less space. In the data centre there are now more servers (known as blades), resulting in more compute/storage capability per square foot. Despite the space-saving advantages, and virtualisation techniques which further reduce the number of servers required to host applications, the primary disadvantage of increased density is that each blade requires significantly more power than its predecessor. A standard rack with 65-70 blades operating at high loads might require 20-30 kW of power, compared with previous rack consumptions of 2-5 kW (a rough sense of scale is sketched after this section). This additional power generates additional heat. Heat in the rack, and the resultant heat in the room, must be removed to maintain the equipment at a safe operating temperature and humidity.

In relation to energy efficiency opportunities, the hardware components which constitute the main subsystems of a typical data centre are:

1. Power supply & distribution to IT equipment
2. Servers, storage devices & network equipment
3. Cooling

Examining these subsystems in a little more detail will help to narrow the focus of this review.

Power Supply & Distribution to IT Equipment

The power provided to the IT equipment in the racks is typically routed through an Uninterruptible Power Supply (UPS) which feeds Power Distribution Units (PDUs) located in or near the rack. Through the use of better components, circuit design and right-sizing strategies, manufacturers such as APC and Liebert have turned their attention to maximising efficiency across the full load spectrum without sacrificing redundancy. Some opportunities may exist in efforts to re-balance the load across the three phases supplying power to the racks, but efficiencies in the power supply & distribution system are outside the scope of this research.

Servers, Storage Devices & Network Equipment

Manufacturers such as IBM and Intel are designing increasingly efficient blades, with features such as chip-level thermal strategies, multicore processors and power management leading the way. Energy efficiencies in this subsystem are beyond the scope of most data centres. However, there may be opportunities to harness these technological developments in designs for future cooling systems. Enterprise operators such as Google and Facebook have recently designed and installed their own servers which have demonstrated increased efficiencies, but these servers are specifically "fit-for-purpose" and may not be sufficiently generic to be applicable to other data centre configurations.
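As a rough sense of scale for the density figures above (mid-range values chosen purely for illustration; the derived numbers are not quoted from the review):

\[
\frac{25\,\mathrm{kW}}{70\ \mathrm{blades}} \approx 360\ \mathrm{W\ per\ blade},
\qquad
\frac{25\,\mathrm{kW}}{3.5\,\mathrm{kW}} \approx 7\times
\]

That is, a fully loaded high-density rack dissipates roughly seven times the heat of its predecessor in the same floor footprint, which is the heat the cooling system must now remove.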
Cooling

There are a variety of standard systems for cooling data centres, but all typically involve Air Handling Units (AHUs) or Computer Room Air Handlers (CRAHs). The majority of modern data centres have aligned their racks in an alternating hot aisle / cold aisle configuration, with cold air from the AHU(s) entering the cold aisle through perforated or grated tiles above a sub-floor plenum. Hot air is exhausted from the rear of the racks and removed from the room by the same AHU(s). Depending on the configuration of the data centre, the heat removal system might consume 50% of a typical data centre's energy. This review focuses attention on possible efficiency gains specific to this subsystem.

Industry is currently embracing a number of opportunities, involving temperature and airflow analysis, for increased efficiency in the cooling system, most notably:

1. aisle containment strategies
2. increasing the temperature rise (ΔT) across the rack
3. raising the operating temperature of the AHU(s)
4. repositioning AHU temperature and humidity sensors
5. thermal management by balancing the IT load layout [1, 2]
6. "free cooling": eliminating the high-consumption chiller from the system through the use of strategies such as air-side and water-side economisers

In addition to temperature maintenance, the AHUs also vary the humidity of the air entering the room according to set-points. Low humidity (dry air) may cause static, which has the potential to create short circuits in the electronics. High levels of moisture in the air may lead to faster component degradation. Although this is less of a concern as a result of field experience and recent studies performed by Intel and others, humidity ranges have been defined for the industry and should be observed to maximise the lifetime of the IT equipment. Maintaining humidity ranges may increase equipment replacement intervals and, as a result, have a net positive outcome on capital expenditure budgets.

Industry Standards & Guidelines

Standards

Energy efficiency standards are widely recognised across the data centre industry. Power Usage Effectiveness 2 (PUE2) [3a] is now the de facto indicator of a data centre's efficiency. It is defined as the ratio of all electricity used by the data centre to the electricity used just by the IT equipment (written out after this subsection). In contrast to the original PUE [3b], which was rated in kilowatts of power (kW), PUE2 must be taken from the highest measured kilowatt-hour (kWh) reading. In three of the four PUE categories now defined, the readings must span a 12-month period, eliminating the effect of seasonal fluctuations in ambient temperature.

Surprisingly, efforts to improve efficiency have not been implemented to the extent one would expect: 73% of respondents to the most recent Uptime Institute survey …

Guidelines

Data centre guidelines are intermittently published by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). These guidelines [6a, 6b, 6c] suggest "allowable" and "recommended" temperature and humidity ranges within which it is safe to operate IT equipment. The most recent edition of the guidelines [6c] suggests operating temperatures of 18-27 °C; the maximum for humidity is 60% Relative Humidity (RH). One of the more interesting objectives of the recent guidelines is to copper-fasten the rack inlet as the position where temperature and humidity should be measured.
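As flagged under Standards above, the PUE definition can be written out compactly; the worked figure below is an illustrative assumption, not a measurement from the review:

\[
\mathrm{PUE} = \frac{E_{\mathrm{total\ facility}}}{E_{\mathrm{IT\ equipment}}}
\]

For PUE2, both energies are kilowatt-hour readings taken over a 12-month window. For example, a facility consuming 4,000 MWh in a year, of which 2,000 MWh is used by the IT equipment, has a PUE of 2.0; a perfectly efficient facility, in which all power reaches the IT load, would approach 1.0.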
The majority of data centres currently measure at the return inlet to the AHU, despite the inlet to the racks being a more accurate position at which to measure temperature and humidity metrics. The purpose of the cooling system is to maintain the equipment in the racks at safe operating temperatures, so it is the temperature (and humidity) at these racks that should be monitored, rather than the air returning to the AHU. The closer to the server the measurements are taken, the more "sensitive-to-reality" dependent cooling actuations will be. An even more accurate measurement position than the rack inlet is discussed in Paper 4 of this review, where it is suggested that temperature, humidity and air flow metrics should be collected from the servers themselves. With recent developments in chip design this is becoming increasingly possible, but it has not yet been implemented in the wider data centre environment.

EPA Report to US Congress

In response to Public Law 109-431, the U.S. EPA ENERGY STAR Program released a report to the U.S. Congress on 2nd August 2007 [7]. The report included an assessment of opportunities for energy efficiency improvements for data centres in the United States. Prior to the report, global energy efficiency efforts were fragmented across the data centre industry. The process of preparing the report brought all the major industry players together and formed the baseline for most of the subsequent research and development which has taken place.

Energy Efficiency Opportunities

In an effort to identify a range of energy efficiency opportunities, three main improvement scenarios were formulated by the EPA report:

1. Improved Operation: maximises the efficiency of the existing data centre infrastructure by utilising improvements such as "free cooling" and raising temperature/humidity set-points. Minimal capital cost ("the low-hanging fruit") is incurred by the operator
2. Best Practice: adopts the practices and technologies used in the most energy-efficient facilities
3. State-of-the-Art: uses all available energy efficiency practices and technologies

The report also suggested a number of improvements directly relating to the cooling system. It was clear that the increase in power density would need a proportionate response from the cooling system responsible for removing the associated heat. It was predicted in the paper that: "…and at some point in the not so distant future, hardware manufacturers are going to have to consider a return to water cooling or other methods of removing heat from their boxes". The different responsibilities of the equipment manufacturer and the data centre operator were also presented: "Removing heat from the data centre is not the responsibility of the hardware manufacturer," the paper states. Manufacturers must exhaust the heat from their own products, but it is then up to the data centre operator to remove that heat from both the rack and the wider room environment.

The crucial element in the operational equation of the CRAC, regardless of the system deployed, is the set-point. The set-point is manually set by data centre staff and generally requires considerable analysis of the data centre environment before any adjustment is made. Typically, the set-point is configured (when the CRAC is initially installed) according to some prediction of the future cooling demand; a minimal sketch of a rack-inlet-driven control cycle follows this section.
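To make the measurement-position and set-point points concrete, the following is a minimal sketch of a control cycle that actuates on rack-inlet temperature rather than AHU return air. It is illustrative only: the sensor and actuator helpers (read_rack_inlet_temps, set_crac_supply_setpoint) are hypothetical placeholders, and the 25 °C target is simply an assumed value inside ASHRAE's 18-27 °C recommended window.

TARGET_INLET_C = 25.0   # assumed target, inside ASHRAE's recommended 18-27 C range
STEP_C = 0.5            # conservative set-point adjustment per control cycle

def read_rack_inlet_temps():
    """Hypothetical placeholder: return a list of rack-inlet temperatures (deg C)."""
    raise NotImplementedError

def set_crac_supply_setpoint(setpoint_c):
    """Hypothetical placeholder: push a new supply-air set-point to the CRAC."""
    raise NotImplementedError

def control_cycle(setpoint_c):
    # Act on the hottest rack inlet, not the AHU return air: the closer
    # the measurement is to the servers, the more "sensitive-to-reality"
    # the resulting actuation.
    hottest = max(read_rack_inlet_temps())
    if hottest > TARGET_INLET_C:
        setpoint_c -= STEP_C          # supply colder air
    elif hottest < TARGET_INLET_C - 1.0:
        setpoint_c += STEP_C          # relax cooling and save energy
    set_crac_supply_setpoint(setpoint_c)
    return setpoint_c

A real controller would also modulate fan speed and respect humidity limits; the point of the sketch is only where the feedback signal comes from.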
Due to a number of factors (including the cost of consultancy), it is all too common that no regular analysis of the room's thermal dynamics is performed, if any is performed at all. This is despite the installation of additional IT equipment (and increased workload on the existing infrastructure) throughout the lifecycle of the data centre. Clearly, a very static situation exists in this case. It is evident that, as a typical data centre matures and the thermodynamics of the environment change with higher CPU loads and additional IT equipment, the cooling system should have a dynamic control system configuring it for continuous maximum efficiency. Boucher et al. propose that such a control system should be built on the three available actuators (2.1 CRAC supply temperature, 2.2 CRAC fan speed and 2.3 vent tile opening), together with knowledge of each variable's effect on the data centre environment.

Solution: the paper focused on how each of the actuator variables (2.1, 2.2 and 2.3 above) can affect the thermal dynamics of the data centre. Included in the findings of the study were:

- CRAC supply temperatures have an approximately linear relationship with rack inlet temperatures. An anomaly was identified where the magnitude of the rack inlet response to a change in CRAC supply temperature was not of the same order; further study was suggested.
- Under-provisioned flow from the CRAC fans affects the Supply Heat Index (SHI*), but over-provisioning has a negligible effect. SHI is a non-dimensional measure of the local magnitude of hot and cold air mixing: slower air flow rates cause an increase in SHI (more mixing), whereas faster air flow rates have little or no effect.

*SHI is also referred to as the Heat Density Factor (HDF). The metric is based on the principle of a thermal multiplier θ_i formulated by Sharma et al.; both SHI and the COP metric discussed below are written out after this section.

Increasing density involves replacing low-density racks with high-density blade servers, and it has been the chosen alternative to purchasing (or renting) additional space for most data centres in recent years. New enterprise and colocation data centres also implement the strategy to maximise the available space. Densification leads to increased power dissipation and a corresponding heat flux within the data centre environment. A typical cooling system performs two types of work:

1. Thermodynamic: removes the heat dissipated by the IT equipment
2. Airflow: moves the air through the data centre and its systems

The metric chosen by Shah et al. for evaluation in this case is the "grand" Coefficient of Performance (COP_G), a development of the original COP metric suggested by Patel et al. In order to calculate the COP_G of the model used for the test case, each component of the cooling system needed to be evaluated separately before applying each result to the overall system. Difficulties arose where system-level data was either simply unavailable or, due to high heterogeneity, impossible to infer. However, the model was generic enough that it could be applied to the variety of cooling systems currently used by "real world" data centres.

The assumption that increased density leads to less efficiency in the cooling system is incorrect. If elements of the cooling system were previously running at low loads, they would typically have been operating at sub-optimal efficiency levels. Increasing the load on a cooling system may in fact increase its overall efficiency through improved operational efficiencies in one or more of its subsystems.
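For reference, the two metrics above can be written out. This is a sketch based on the forms published by Sharma et al. and Patel et al., not equations reproduced from the review itself:

\[
\theta_i = \frac{T^{i}_{\mathrm{in}} - T_{\mathrm{ref}}}{T^{i}_{\mathrm{out}} - T_{\mathrm{ref}}},
\qquad
\mathrm{SHI} = \frac{\sum_j \sum_i \left( T^{i,j}_{\mathrm{in}} - T_{\mathrm{ref}} \right)}{\sum_j \sum_i \left( T^{i,j}_{\mathrm{out}} - T_{\mathrm{ref}} \right)},
\qquad
\mathrm{COP_G} = \frac{Q_{\mathrm{IT}}}{\sum_k W_{\mathrm{cooling},k}}
\]

Here T_ref is the CRAC supply temperature, and T_in and T_out are rack inlet and outlet temperatures, summed over racks i in rows j. SHI near 0 indicates little recirculation; values approaching 1 indicate severe hot/cold mixing. In COP_G, Q_IT is the heat dissipated by the IT equipment and the denominator sums the power drawn by every component of the cooling ensemble (fans, compressors, pumps, chillers), so a higher COP_G means more heat removed per watt spent on cooling.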
In the test case, 94 existing low-density racks were replaced with high-density Hewlett-Packard (HP) blades, and the heat load increased from 1.9 MW to 4.7 MW. The new load was still within the acceptable range for the existing cooling infrastructure, so no modifications to the ensemble were required. Upon analysis of the results, COP_G was found to have increased by 15%. This was achieved, in part, through improved efficiencies in the compressor system of the CRACs. While it is acknowledged that there is a crossover point at which compressors become less efficient, the increase in heat flux of the test model raised the work of the compressor to a point somewhere below this crossover. The improvement in compressor efficiency was attributed to the higher-density HP blade servers operating at a higher ΔT (and therefore reduced flow rates) across the rack; the underlying relation between heat load, ΔT and airflow is sketched after this section. The burden on the cooling ensemble was reduced, resulting in a higher COP_G.

With the largest individual power consumption (about 40% in this case) typically coming from the CRAC, which contains the compressor, it makes sense to direct an intelligent analysis of potential operational efficiencies at that particular part of the system. The paper states: "The continuously changing nature of the heat load distribution in the room makes optimization of the layout challenging; therefore, to compensate for recirculation effects, the CRAC units may be required to operate at higher speeds and lower supply temperature than necessary. Utilization of a dynamically coupled thermal solution, which modulates the CRAC operating points based on sensed heat load, can help reduce this load."

In this paper, Shah et al. present a model for evaluating the cooling ensemble using COP_G, filling the gap in knowledge through detailed experimentation with measurements across the entire system. They conclude that energy efficiencies are possible via an increased COP in one or more of the cooling infrastructure components. Where thermal management strategies capable of handling the increased density are in place, there is significant motivation to increase density without any adverse impact on energy efficiency.
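The ΔT/airflow trade-off above follows from the basic sensible-heat relation; the numbers below are illustrative assumptions, not measurements from the study:

\[
Q = \dot{m} \, c_p \, \Delta T
\]

For a fixed rack heat load Q, the required air mass flow ṁ is inversely proportional to ΔT. For example, a 25 kW rack cooled by air (c_p ≈ 1005 J/(kg·K)) needs ṁ ≈ 25000 / (1005 × 10) ≈ 2.5 kg/s at ΔT = 10 K, but only ≈ 1.24 kg/s at ΔT = 20 K, which is why a higher ΔT across the rack reduces the airflow work the cooling ensemble must perform.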