Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
2003
Cited by 349 (22 self)
Abstract: This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements.
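The core-selection step can be sketched as a small policy function. This is a minimal sketch under stated assumptions, not the paper's mechanism: it assumes system software already has per-core throughput and power measurements for the running application, and it picks the core with the lowest energy per instruction that still meets a throughput floor. The `Core` type, `pick_core`, the core names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    ips: float      # instructions/s this application achieves on the core (assumed measured)
    power_w: float  # average power drawn while running it, in watts (assumed measured)


def pick_core(cores: list[Core], min_ips: float) -> Core:
    """Pick the core with the lowest energy per instruction that meets min_ips.

    Falls back to the fastest core if no core meets the performance target.
    """
    feasible = [c for c in cores if c.ips >= min_ips]
    if not feasible:
        return max(cores, key=lambda c: c.ips)
    # Energy per instruction = power / throughput (joules per instruction).
    return min(feasible, key=lambda c: c.power_w / c.ips)


# Hypothetical per-application measurements for three core types.
cores = [
    Core("big", ips=4.0e9, power_w=20.0),
    Core("mid", ips=2.5e9, power_w=6.0),
    Core("little", ips=1.0e9, power_w=1.5),
]
print(pick_core(cores, min_ips=2.0e9).name)  # prints "mid"
```

Lowering the throughput floor to 0.5e9 instructions/s would select "little" instead; raising it above what any core delivers falls back to "big".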
An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget
In Proc. of MICRO, 2006
Heat-and-run: leveraging SMT and CMP to manage power density through the operating system
In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004
Cited by 154 (0 self)
Abstract: Power density in high-performance processors continues to increase with technology generations as scaling of current, clock speed, and device density outpaces the downscaling of supply voltage and the thermal ability of packages to dissipate heat. Power density is characterized by localized chip hot spots that can reach critical temperatures and cause failure. Previous architectural approaches to power density have used global clock gating, fetch toggling, dynamic frequency scaling, or resource duplication to either prevent heating or relieve overheated resources in a superscalar processor. Previous approaches also evaluate design technologies where power density is not a major problem and most applications do not overheat the processor. Future processors, however, are likely to be chip multiprocessors (CMPs) with simultaneously multithreaded (SMT) cores. SMT CMPs pose unique challenges and opportunities for power density. SMT and CMP increase throughput and thus on-chip heat, but they also provide natural granularities for managing power density. This paper is the first work to leverage SMT and CMP to address power density. We propose heat-and-run SMT thread assignment, which increases processor-resource utilization before cooling becomes necessary by co-scheduling threads that use complementary resources. We also propose heat-and-run CMP thread migration, which migrates threads away from overheated cores and assigns them to free SMT contexts on alternate cores, leveraging the availability of SMT contexts on alternate CMP cores to maintain throughput while allowing overheated cores to cool. We show that our proposal achieves on average 9%, and up to 34%, higher throughput than a previous superscalar technique running the same number of threads.
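The migration half of heat-and-run can be sketched as moving threads from cores above a temperature threshold into free SMT contexts on cooler cores. This is a minimal sketch, assuming a fixed trigger temperature and "coolest core first" placement; the paper's actual trigger and placement rules are not given in this abstract, the SMT co-scheduling half is not modeled, and `migrate_from_hot_cores` and all values are hypothetical.

```python
def migrate_from_hot_cores(assignment, temps, contexts_per_core, threshold_c):
    """Move threads off overheated cores into free SMT contexts on cooler cores.

    assignment: dict core_id -> list of thread ids (not modified; a new dict is returned)
    temps: dict core_id -> current temperature in °C
    Each displaced thread goes to the coolest core at or below threshold_c that still
    has a free SMT context; threads with nowhere to go stay on their current core.
    """
    new = {core: list(threads) for core, threads in assignment.items()}
    hot = [c for c in new if temps[c] > threshold_c]
    cool = sorted((c for c in new if temps[c] <= threshold_c), key=lambda c: temps[c])
    for h in hot:
        stay = []
        for thread in new[h]:
            target = next((c for c in cool if len(new[c]) < contexts_per_core), None)
            if target is None:
                stay.append(thread)  # no free context anywhere: keep it in place
            else:
                new[target].append(thread)
        new[h] = stay
    return new


# Hypothetical 4-core CMP with 2 SMT contexts per core and an 85 °C trigger.
before = {0: ["a", "b"], 1: ["c"], 2: [], 3: ["d"]}
temps = {0: 88.0, 1: 70.0, 2: 65.0, 3: 80.0}
after = migrate_from_hot_cores(before, temps, contexts_per_core=2, threshold_c=85.0)
print(after)  # prints {0: [], 1: ['c'], 2: ['a', 'b'], 3: ['d']}
```

Filling the coolest core first is the simplest choice; a real policy would also account for the heat the migrated threads bring to their new core.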
Techniques for multicore thermal management: Classification and new exploration
In ISCA, 2006
Cited by 146 (3 self)
Abstract: Power density continues to increase exponentially with each new technology generation, posing a major challenge for thermal management in modern processors. Much past work has examined microarchitectural policies for reducing total chip power, but these techniques alone are insufficient if not aimed at mitigating individual hotspots. The industry's current trend has been toward multicore architectures, which provide additional opportunities for dynamic thermal management. This paper explores various thermal management techniques that exploit the distributed nature of multicore processors. We classify these techniques in terms of core throttling policy, whether that policy is applied locally to a core or to the processor as a whole, and process migration policies. We use Turandot and a HotSpot-based thermal simulator to simulate a variety of workloads under thermal duress on a 4-core PowerPC™ processor. Using benchmarks from the SPEC 2000 suite, we characterize workloads in terms of instruction throughput as well as their effective duty cycles. Among a variety of options, we find that distributed control-theoretic DVFS alone improves throughput by 2.5× under our test conditions. Our final design involves a PI-based core thermal controller and an outer control loop to decide process migrations. This policy avoids all thermal emergencies and yields an average speedup of 2.6× over the baseline across all workloads.
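The inner loop of the final design, a PI-based core thermal controller, can be sketched as follows. This is a minimal sketch, assuming the controller output is a normalized DVFS frequency scale, that power scales roughly with f³ under DVFS, and a toy first-order thermal model; the gains, setpoint, thermal model, and names are hypothetical, and the outer migration loop is not shown.

```python
class PIThermalController:
    """Discrete-time PI controller that maps a core's temperature to a DVFS
    frequency scale in [f_min, 1.0]. Gains and limits are illustrative."""

    def __init__(self, setpoint_c, kp, ki, f_min=0.5):
        self.setpoint_c = setpoint_c
        self.kp = kp
        self.ki = ki
        self.f_min = f_min
        self.integral = 0.0

    def step(self, temp_c, dt):
        error = self.setpoint_c - temp_c  # positive while the core is below the setpoint
        self.integral += error * dt
        u = 1.0 + self.kp * error + self.ki * self.integral
        f = min(1.0, max(self.f_min, u))
        if f != u:  # anti-windup: stop integrating while the output is saturated
            self.integral -= error * dt
        return f


def simulate(steps=1000, dt=0.1):
    """Toy first-order thermal model (assumption): the core relaxes toward
    45 °C + 50 °C * f**3 with a 1 s time constant, so full speed would settle at 95 °C."""
    ctrl = PIThermalController(setpoint_c=80.0, kp=0.01, ki=0.02)
    temp_c, f = 45.0, 1.0
    for _ in range(steps):
        f = ctrl.step(temp_c, dt)
        steady_c = 45.0 + 50.0 * f ** 3
        temp_c += (dt / 1.0) * (steady_c - temp_c)
    return temp_c, f


temp_c, f = simulate()
print(f"settled at {temp_c:.1f} °C with frequency scale {f:.3f}")
```

With these made-up constants the integral term drives the core to the 80 °C setpoint, where the frequency scale must satisfy 50·f³ = 35, i.e. f = 0.7^(1/3) ≈ 0.89.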
Energy aware consolidation for Cloud computing
In Proceedings of the 2008 Conference on Power Aware Computing and Systems, 2008
Cited by 113 (2 self)
Abstract: Consolidation of applications in cloud computing environments presents a significant opportunity for energy optimization. As a first step toward enabling energy-efficient consolidation, we study the inter-relationships between energy consumption, resource utilization, and performance of consolidated workloads. The study reveals the energy-performance trade-offs for consolidation and shows that optimal operating points exist. We model the consolidation problem as a modified bin-packing problem and illustrate it with an example. Finally, we outline the challenges in finding effective solutions to the consolidation problem.
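Consolidation framed as a modified bin-packing problem can be sketched with two resource dimensions and per-server utilization targets below 100%, standing in for the energy-optimal operating points the study reports. This is a minimal sketch that uses the standard first-fit-decreasing heuristic, not the paper's own method; `consolidate`, the caps, and the workloads are hypothetical.

```python
def consolidate(apps, cpu_cap=0.7, disk_cap=0.5):
    """First-fit decreasing over two resource dimensions.

    apps: list of (name, cpu_util, disk_util) tuples with utilizations in [0, 1].
    cpu_cap, disk_cap: per-server utilization targets, an assumed stand-in for
    the energy-optimal operating point (not values from the paper).
    Returns a list of servers, each a list of application names.
    """
    servers = []  # each entry: [cpu_used, disk_used, [names]]
    # Place the most demanding applications first (by their larger dimension).
    for name, cpu, disk in sorted(apps, key=lambda a: max(a[1], a[2]), reverse=True):
        if cpu > cpu_cap or disk > disk_cap:
            raise ValueError(f"{name} exceeds the per-server target on its own")
        for s in servers:
            if s[0] + cpu <= cpu_cap and s[1] + disk <= disk_cap:
                s[0] += cpu
                s[1] += disk
                s[2].append(name)
                break
        else:  # no existing server has room: power on another one
            servers.append([cpu, disk, [name]])
    return [s[2] for s in servers]


apps = [("web", 0.30, 0.10), ("db", 0.20, 0.35), ("batch", 0.45, 0.05), ("cache", 0.10, 0.20)]
print(consolidate(apps))  # prints [['batch', 'db'], ['web', 'cache']]
```

Capping each dimension below full utilization reflects the study's observation that packing servers as tightly as possible is not necessarily the most energy-efficient point.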
Temperature and supply voltage aware performance and power modeling at microarchitecture level
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2005
Cited by 75 (5 self)
Abstract: Performance and power are two primary design issues for systems ranging from server computers to handhelds. Performance is affected by both temperature and supply voltage because of the temperature and voltage dependence of circuit delay. Furthermore, as semiconductor technology scales down, leakage power's exponential dependence on temperature and supply voltage becomes significant. Therefore, future design studies call for temperature- and voltage-aware performance and power modeling. In this paper, we study microarchitecture-level temperature- and voltage-aware performance and power modeling. We present a leakage power model with temperature and voltage scaling, and show that leakage and total energy vary by 38% and 24%, respectively, between 65 °C and 110 °C. We study thermal runaway induced by the interdependence between temperature and leakage power, and demonstrate that without temperature-aware modeling, underestimation of leakage power may lead to the failure of thermal controls, and overestimation of leakage power may result in excessive performance penalties of up to 5.24%. All of these studies underscore the necessity of temperature-aware power modeling. Furthermore, we study optimal voltage scaling for best performance with dynamic power and thermal management under different packaging options. We show that dynamic power and thermal management allows designs to target the common-case thermal scenario among benchmarks and improves performance by 6.59% compared to designs targeted at the worst-case thermal scenario without dynamic power and thermal management. Additionally, we show that the optimal supply voltage for the best performance may not be the largest one allowed by the given packaging platform, and that advanced cooling techniques can improve throughput significantly.
Index Terms: Floorplan, leakage power, microarchitecture, temperature, thermal management.
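The temperature/leakage interdependence behind thermal runaway can be illustrated with a fixed-point iteration on a lumped thermal model. This is a minimal sketch, assuming a single thermal resistance to ambient, leakage that grows exponentially with temperature (the voltage dependence is omitted), and a hard temperature limit that the sketch treats as runaway; `steady_state_temp` and every parameter value are hypothetical, not the paper's model.

```python
import math


def steady_state_temp(p_dyn_w, r_th, t_amb_c=45.0, p_leak0_w=5.0, beta=0.02,
                      t_limit_c=150.0, iters=200):
    """Solve T = t_amb + r_th * (p_dyn + p_leak(T)) by fixed-point iteration.

    r_th is a thermal resistance in °C/W. p_leak(T) = p_leak0_w * exp(beta * (T - t_amb_c))
    is an assumed exponential leakage model (beta = 0 makes leakage temperature-independent).
    Returns the converged temperature in °C, or None if the iterate exceeds
    t_limit_c, which this sketch treats as thermal runaway.
    """
    temp_c = t_amb_c
    for _ in range(iters):
        leak_w = p_leak0_w * math.exp(beta * (temp_c - t_amb_c))
        new_temp_c = t_amb_c + r_th * (p_dyn_w + leak_w)
        if new_temp_c > t_limit_c:
            return None
        if abs(new_temp_c - temp_c) < 1e-6:
            return new_temp_c
        temp_c = new_temp_c
    return temp_c


print(steady_state_temp(40.0, r_th=0.5))            # converges near 69 °C
print(steady_state_temp(40.0, r_th=2.0))            # None: the leakage feedback runs away
print(steady_state_temp(40.0, r_th=2.0, beta=0.0))  # 135.0: looks safe without the feedback
```

The last two calls mirror the abstract's point in miniature: a leakage estimate that ignores temperature reports a stable 135 °C, while the temperature-aware model has no stable operating point at all for the same package.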
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors
2008
Cited by 72 (6 self)
Abstract: Within-die process variation causes individual cores in a Chip Multiprocessor (CMP) to differ substantially in both static power consumed and maximum frequency supported. In this environment, ignoring variation effects when scheduling applications or when managing power with Dynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for application scheduling and power management. One such power management algorithm, called LinOpt, uses linear programming to find the best voltage and frequency levels for each of the cores in the CMP, maximizing throughput at a given power budget. In a 20-core CMP, the combination of variation-aware application scheduling and LinOpt increases the average throughput by 12–17% and reduces the average ED² by 30–38%, all relative to using variation-aware scheduling together with a simple extension to Intel's Foxton power management algorithm.
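A linear program of the kind LinOpt is described as solving can be sketched for a simplified case. This is a minimal sketch, assuming throughput is linear in frequency (IPC × f) and each core's power is its own static power plus a per-core linear dynamic term, so process variation appears as different coefficients per core. Under those assumptions, with a single power-budget constraint and box bounds on frequency, the LP is a continuous knapsack that a greedy pass solves exactly; the real LinOpt formulation may differ, and `allocate_frequencies` and all values are hypothetical.

```python
def allocate_frequencies(cores, budget_w):
    """Maximize sum(ipc_i * f_i) subject to
         sum(static_i + k_i * f_i) <= budget_w  and  f_min_i <= f_i <= f_max_i.

    cores: list of dicts with keys 'ipc', 'static' (W), 'k' (W per GHz, > 0),
    'f_min' and 'f_max' (GHz). Returns per-core frequencies in GHz, or None
    if running every core at f_min already exceeds the budget.
    """
    freqs = [c["f_min"] for c in cores]
    used_w = sum(c["static"] + c["k"] * c["f_min"] for c in cores)
    if used_w > budget_w:
        return None
    remaining_w = budget_w - used_w
    # Spend leftover power on the cores that buy the most throughput per watt first.
    order = sorted(range(len(cores)), key=lambda i: cores[i]["ipc"] / cores[i]["k"], reverse=True)
    for i in order:
        c = cores[i]
        step = min(c["f_max"] - c["f_min"], remaining_w / c["k"])
        freqs[i] += step
        remaining_w -= step * c["k"]
        if remaining_w <= 0:
            break
    return freqs


# Two hypothetical cores: core 0 is leakier and costs more power per GHz.
cores = [
    {"ipc": 1.0, "static": 2.0, "k": 4.0, "f_min": 1.0, "f_max": 3.0},
    {"ipc": 1.0, "static": 1.0, "k": 2.0, "f_min": 1.0, "f_max": 3.0},
]
print(allocate_frequencies(cores, budget_w=15.0))  # prints [1.5, 3.0]
```

Once more constraints are added (per-core thermal limits, discrete voltage levels coupled to frequency), the greedy shortcut no longer applies and a general LP solver would be needed.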
Thermal-Effective Clustered Microarchitectures
In Proc. of the First Workshop on Temperature-Aware Computer Systems at ISCA, 2004
Paceline: Improving single-thread performance in nanoscale CMPs through core overclocking
In International Conference on Parallel Architecture and Compilation Techniques, 2007
Cited by 37 (4 self)
Abstract: Under current worst-case design practices, manufacturers specify conservative values for processor frequencies in order to guarantee correctness. To recover some of the lost performance and improve single-thread performance, this paper presents the Paceline leader-checker microarchitecture. In Paceline, a leader core runs the thread at a higher-than-rated frequency, while passing execution hints and prefetches to a safely clocked checker core in the same chip multiprocessor. The checker redundantly executes the thread faster than it would without the leader, while checking the results to guarantee correctness. Leader and checker cores periodically swap functionality. As a result, the thread's performance improves substantially without significantly increasing the power density or the hardware design complexity of the chip. By overclocking the leader by 30%, we estimate that Paceline improves SPECint and SPECfp performance by a geometric mean of 21% and 9%, respectively. Moreover, Paceline also provides tolerance to transient faults such as soft errors.
Three-dimensional chip-multiprocessor run-time thermal management
IEEE Transactions on CAD, 2008
Cited by 32 (0 self)
Abstract: Three-dimensional integration has the potential to improve the communication latency and integration density of chip-level multiprocessors (CMPs). However, the stacked high-power-density layers of 3-D CMPs increase the importance and difficulty of thermal management. In this paper, we investigate the 3-D CMP run-time thermal management problem and describe efficient management techniques. This paper makes the following main contributions: 1) it identifies and describes the critical concepts required for optimal thermal management, namely the methods by which heterogeneity in both workload power characteristics and processor core thermal characteristics should be exploited; and 2) it proposes an efficient, proactive, continuously engaged hardware and operating system thermal management technique governed by optimal thermal management policies. The proposed technique is evaluated using multiprogrammed and multithreaded benchmarks in an integrated power, performance, and temperature full-system simulation environment. We find that proactive power-thermal budgeting allows a 30% improvement in instruction throughput compared to a proactive thermal management approach that bases decisions only upon local information. The software components of the proposed thermal management technique have been implemented in the Linux 2.6.8 kernel. This source code will be publicly released. The analysis and technique developed in this paper provide a general solution for future 3-D …
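Exploiting heterogeneity in both workload power and core thermal characteristics can be illustrated with a simple budget-and-match step: cores that shed heat more easily get larger power budgets, and the most power-hungry threads go to them. This is a minimal sketch, not the paper's policy, assuming each core has a known thermal resistance to ambient (lower for cores closer to the heat sink), at most one thread per core, and no heat flow between cores, which is a strong simplification for stacked layers. `thermal_budgets`, `assign_threads`, and all values are hypothetical.

```python
def thermal_budgets(r_th, t_max_c=85.0, t_amb_c=45.0):
    """Power (W) each core could dissipate alone before reaching t_max_c.

    r_th: per-core thermal resistance to ambient in °C/W. Ignores heat flowing
    between cores, which matters a great deal in a real 3-D stack.
    """
    return [(t_max_c - t_amb_c) / r for r in r_th]


def assign_threads(thread_power_w, r_th, t_max_c=85.0, t_amb_c=45.0):
    """Give the most power-hungry threads to the cores with the largest budgets.

    Assumes at most one thread per core. Returns (placement, over_budget):
    placement[t] is the core for thread t; over_budget lists threads that would
    still need throttling (e.g. DVFS) to stay below t_max_c.
    """
    budgets = thermal_budgets(r_th, t_max_c, t_amb_c)
    cores = sorted(range(len(r_th)), key=lambda c: budgets[c], reverse=True)
    threads = sorted(range(len(thread_power_w)), key=lambda t: thread_power_w[t], reverse=True)
    if len(threads) > len(cores):
        raise ValueError("this sketch assumes at most one thread per core")
    placement = [None] * len(thread_power_w)
    over_budget = []
    for t, c in zip(threads, cores):
        placement[t] = c
        if thread_power_w[t] > budgets[c]:
            over_budget.append(t)
    return placement, over_budget


# Hypothetical 3-core example: core 1 sits next to the heat sink, core 2 is farthest away.
print(assign_threads([30.0, 8.0, 12.0], r_th=[2.0, 1.0, 4.0]))  # prints ([1, 2, 0], [])
```

Any thread reported in `over_budget` would have to be throttled locally; a proactive scheme of the kind the abstract describes would instead try to rebalance placement before that happens.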