Techniques for multicore thermal management: Classification and new exploration
- In ISCA 2006
- Cited by 48 (3 self)
Power density continues to increase exponentially with each new technology generation, posing a major challenge for thermal management in modern processors. Much past work has examined microarchitectural policies for reducing total chip power, but these techniques alone are insufficient if not aimed at mitigating individual hotspots. The industry’s current trend has been toward multicore architectures, which provide additional opportunities for dynamic thermal management. This paper explores various thermal management techniques that exploit the distributed nature of multicore processors. We classify these techniques in terms of core throttling policy, whether that policy is applied locally to a core or to the processor as a whole, and process migration policies. We use Turandot and a HotSpot-based thermal simulator to simulate a variety of workloads under thermal duress on a 4-core PowerPC™ processor. Using benchmarks from the SPEC 2000 suite, we characterize workloads in terms of instruction throughput as well as their effective duty cycles. Among a variety of options, we find that distributed control-theoretic DVFS alone improves throughput by 2.5X under our test conditions. Our final design involves a PI-based core thermal controller and an outer control loop to decide process migrations. This policy avoids all thermal emergencies and yields an average of 2.6X speedup over the baseline across all workloads.
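The PI-based core thermal controller mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' design: the class name `PIThermalController`, the gains, the setpoint, and the frequency-scale limits are all hypothetical values chosen for illustration, and the anti-windup rule is one common choice among several.

```python
from dataclasses import dataclass


@dataclass
class PIThermalController:
    """Hypothetical per-core PI controller: temperature in, frequency scale out."""
    setpoint_c: float          # target temperature in Celsius (assumed value)
    kp: float = 0.05           # proportional gain (illustrative, not from the paper)
    ki: float = 0.01           # integral gain (illustrative, not from the paper)
    min_scale: float = 0.2     # lowest allowed frequency scale factor (assumed)
    max_scale: float = 1.0     # full speed
    integral: float = 0.0      # accumulated temperature error

    def update(self, temp_c: float) -> float:
        """Return the frequency scale factor for the next control interval."""
        error = self.setpoint_c - temp_c            # positive means thermal headroom
        candidate = self.integral + error
        raw = self.max_scale + self.kp * error + self.ki * candidate
        # Anti-windup: accumulate the integral only while the output is unsaturated.
        if self.min_scale <= raw <= self.max_scale:
            self.integral = candidate
        return min(self.max_scale, max(self.min_scale, raw))


# "Distributed" control in this sketch means one independent controller per core.
controllers = [PIThermalController(setpoint_c=80.0) for _ in range(4)]
scales = [c.update(t) for c, t in zip(controllers, [70.0, 85.0, 90.0, 79.0])]
```

A core running below the setpoint stays at full speed, while a core above it is slowed in proportion to both the current and the accumulated error. An outer loop that decides process migrations, as the abstract describes, is outside the scope of this sketch.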
Multi-terminal network
- Operations Research, 1961
- Cited by 10 (0 self)
In recent years, microprocessor energy consumption has been surging, and efforts to reduce power and energy have received a lot of attention. At the same time, virtual execution environments (VEEs), such as Java virtual machines, have grown in popularity. Hence, it is important to evaluate the impact of virtual execution environments on microprocessor energy consumption. This paper characterizes the energy and power impact of two important components of VEEs, just-in-time (JIT) optimization and garbage collection. We find that by reducing instruction counts, JIT optimization significantly reduces energy consumption, while garbage collection incurs runtime overhead that consumes more energy. Importantly, both JIT optimization and garbage collection decrease the average power dissipated by a program. Detailed analysis reveals that both JIT optimizer and JIT optimized code
PICSEL: Measuring User-Perceived Performance to Control Dynamic Frequency Scaling
- Cited by 9 (3 self)
The ultimate goal of a computer system is to satisfy its users. The success of architectural or system-level optimizations depends largely on having accurate metrics for user satisfaction. We propose to derive such metrics from information that is “close to flesh” and apparent to the user rather than from information that is “close to metal” and hidden from the user. We describe and evaluate PICSEL, a dynamic voltage and frequency scaling (DVFS) technique that uses measurements of variations in the rate of change of a computer’s video output to estimate user-perceived performance. Our adaptive algorithms, one conservative and one aggressive, use these estimates to dramatically reduce operating frequencies and voltages for graphically intensive applications while maintaining performance at a satisfactory level for the user. We evaluate PICSEL through user studies conducted on a Pentium M laptop running Windows XP. Experiments performed with 20 users executing three applications indicate that the measured laptop power can be reduced by up to 12.1%, averaged across all of our users and applications, compared to the default Windows XP DVFS policy. User studies revealed that the difference in overall user satisfaction between the more aggressive version of PICSEL and Windows DVFS was statistically insignificant, whereas the conservative version of PICSEL actually improved user satisfaction when compared to Windows DVFS.
Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management
- In: Proceedings of the 39th International Symposium on Microarchitecture (MICRO-39), 2006
- Cited by 9 (0 self)
Computer architecture has experienced a major paradigm shift from focusing only on raw performance to considering power-performance efficiency as the defining factor of emerging systems. Along with this shift has come increased interest in workload characterization. This interest fuels two closely related areas of research. First, various studies explore the properties of workload variations and develop methods to identify and track different execution behavior, commonly referred to as “phase analysis”. Second, a large complementary body of research studies dynamic, on-the-fly system management techniques that can adaptively respond to these differences in application behavior. Both of these lines of work have produced very interesting and widely useful results. Thus far, however, there exists only a weak link between these conceptually related areas, especially for real-system studies. Our work aims to strengthen this link by demonstrating a real-system implementation of a runtime phase predictor that works cooperatively with on-the-fly dynamic management. We describe a fully functional deployed system that performs accurate phase predictions on running applications. The key insight of our approach is to draw from prior branch predictor designs to create a phase history table that guides predictions. To demonstrate the value of our approach, we implement a prototype system that uses it to guide dynamic voltage and frequency scaling. Our runtime phase prediction methodology achieves prediction accuracies above 90% for many of the benchmarks tested. For highly variable applications, our approach can reduce mispredictions by more than 6X over commonly used statistical approaches. Dynamic frequency and voltage scaling, when guided by our runtime phase predictor, achieves energy-delay product improvements as high as 34% for benchmarks with non-negligible variability, on average 7% better than previous methods and 18% better than a baseline unmanaged system.
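The idea of a phase history table borrowed from branch predictor designs can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the class name `PhaseHistoryPredictor`, the history length, the use of small integers as phase IDs, and the last-value fallback are all hypothetical choices.

```python
from collections import deque


class PhaseHistoryPredictor:
    """Hypothetical predictor: a table indexed by the recent phase history."""

    def __init__(self, history_len=4):
        self.history = deque(maxlen=history_len)   # most recent phase IDs
        self.table = {}                            # history tuple -> next phase seen

    def predict(self):
        """Predict the next phase, falling back to the last observed phase."""
        if not self.history:
            return None
        return self.table.get(tuple(self.history), self.history[-1])

    def update(self, observed_phase):
        """Record which phase actually followed the current history."""
        if len(self.history) == self.history.maxlen:
            self.table[tuple(self.history)] = observed_phase
        self.history.append(observed_phase)


# Usage: a repeating phase pattern that a last-value predictor would often miss.
predictor = PhaseHistoryPredictor(history_len=4)
pattern = [1, 1, 2, 3]
for phase in pattern * 2:          # warm-up: learn every rotation of the pattern
    predictor.update(phase)
hits = 0
for phase in pattern * 2:
    hits += predictor.predict() == phase
    predictor.update(phase)
```

In a DVFS setting, the predicted phase would select the frequency for the next interval; that mapping is not shown here because the abstract does not specify it.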
Reducing exit stub memory consumption in code caches
- In High Performance Embedded Architectures and Compilers, 2007
- Cited by 6 (3 self)
The interest in translation-based virtual execution environments (VEEs) is growing with the recognition of their importance in a variety of applications. However, due to constrained memory and energy resources, developing a VEE for an embedded system presents a number of challenges. In this paper we focus on the VEE’s memory overhead, and in particular, the code cache. Both code traces and exit stubs are stored in a code cache. Exit stubs keep track of the branches off a trace, and we show they consume up to 66.7% of the code cache. We present four techniques for reducing the space occupied by exit stubs, two of which assume unbounded code caches and the absence of code cache invalidations, and two without these restrictions. These techniques reduce space by 43.5% and also improve performance by 1.5%. After applying our techniques, the percentage of space consumed by exit stubs in the resulting code cache was reduced to 41.4%.
Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors
- In Proc. of SIGMETRICS, 2009
- Cited by 6 (2 self)
Temperature-induced reliability issues are among the major challenges for multicore architectures. Thermal hot spots and thermal cycles combine to degrade reliability. This research presents new reliability-aware job scheduling and power management approaches for chip multiprocessors. Accurate evaluation of these policies requires a novel simulation framework that can capture architecture-level effects over tens of seconds or longer, while also capturing thermal interactions among cores resulting from dynamic scheduling policies. Using this framework and a set of new thermal management policies, this work shows that techniques that offer similar performance, energy, and even peak temperature can differ significantly in their effects on the expected processor lifetime.
Learning and leveraging the relationship between architecture-level measurements and individual user satisfaction
- In Proceedings of the 35th International Symposium on Computer Architecture, 2008
- Cited by 6 (4 self)
The ultimate goal of computer design is to satisfy the end user. In particular computing domains, such as interactive applications, there exists a variation in user expectations and user satisfaction relative to the performance of existing computer systems. In this work, we leverage this variation to develop more efficient architectures that are customized to end users. We first investigate the relationship between microarchitectural parameters and user satisfaction. Specifically, we analyze the relationship between hardware performance counter (HPC) readings and individual satisfaction levels reported by users for representative applications. Our results show that the satisfaction of the user is strongly correlated to the performance of the underlying hardware. More importantly, the results show that user satisfaction is highly user-dependent. To take advantage of these observations, we develop a framework called Individualized Dynamic Voltage and Frequency Scaling (iDVFS). We study a group of users to characterize the relationship between the HPCs and individual user satisfaction levels. Based on this analysis, we use artificial neural networks to model the function from HPCs to user satisfaction for individual users. This model is then used online to predict user satisfaction and set the frequency level accordingly. A second set of user studies demonstrates that iDVFS reduces the CPU power consumption by over 25% in representative applications as compared to the Windows XP DVFS algorithm.
User- and Process-Driven Dynamic Voltage and Frequency Scaling
- Cited by 5 (2 self)
We describe and evaluate two new, independently applicable power reduction techniques for power management on processors that support dynamic voltage and frequency scaling (DVFS): user-driven frequency scaling (UDFS) and process-driven voltage scaling (PDVS). In PDVS, a CPU-customized profile is derived offline that encodes the minimum voltage needed to achieve stability at each combination of CPU frequency and temperature. On a typical processor, PDVS reduces the voltage below the worst-case minimum operating voltages given in datasheets. UDFS, on the other hand, dynamically adapts CPU frequency to the individual user and the workload through direct user feedback. Our UDFS algorithms dramatically reduce typical operating frequencies and voltages while maintaining performance at a satisfactory level for each user. We evaluate our techniques independently and together through user studies conducted on a Pentium M laptop running Windows applications. We measure the overall system power and temperature reduction achieved by our methods. Combining PDVS and the best UDFS scheme reduces measured system power by 49.9% (27.8% PDVS, 22.1% UDFS), averaged across all our users and applications, compared to Windows XP DVFS. The average temperature of the CPU is decreased by 13.2°C. User trace-driven simulation to evaluate the CPU only indicates average CPU dynamic power savings of 57.3% (32.4% PDVS, 24.9% UDFS), with a maximum reduction of 83.4%. In a multitasking environment, the same UDFS+PDVS technique reduces the CPU dynamic power by 75.7% on average.
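The PDVS profile described in this abstract, a table of minimum stable voltages per frequency and temperature, can be pictured with a small lookup sketch. Everything concrete here is an assumption for illustration: the `PROFILE` contents, the frequencies, the temperature bins, the voltages, and the function name `min_voltage` are invented and are not measurements from the paper.

```python
import bisect

# Hypothetical offline profile. For each frequency in MHz, a list of
# (upper_temperature_bound_C, min_stable_voltage_V) pairs sorted by temperature.
# All numbers are invented for illustration.
PROFILE = {
    800: [(60, 0.70), (80, 0.72), (100, 0.75)],
    1600: [(60, 0.95), (80, 0.98), (100, 1.02)],
}


def min_voltage(freq_mhz, temp_c):
    """Return the profiled minimum voltage for a frequency and temperature.

    The lookup rounds the temperature up to the next profiled bound, so the
    chosen voltage is never lower than what was validated for a hotter chip.
    """
    rows = PROFILE[freq_mhz]
    bounds = [bound for bound, _ in rows]
    index = bisect.bisect_left(bounds, temp_c)
    if index == len(rows):
        raise ValueError("temperature above the profiled range")
    return rows[index][1]


# Usage: pick the voltage to apply whenever frequency or temperature changes.
voltage = min_voltage(1600, 72.5)
```

Rounding up to the next temperature bound is a conservative design choice made for this sketch; the abstract does not say how the authors interpolate between profiled points.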
Human-driven Optimization
- 2007
- Cited by 4 (3 self)
The optimization problems associated with adaptive and autonomic computing systems are often difficult to pose well and solve efficiently. A key challenge is that for many applications, particularly interactive applications, the user or developer is unlikely or unable to provide either the objective function f, or constraints. It is a key problem encountered broadly in adaptive and autonomic computing. This dissertation argues for using human-driven optimization techniques to solve optimization problems. In particular, it consists of two core ideas. In human-driven specification, we use direct human input from users to pose specific optimization problems, namely to determine the objective function f and expose hidden constraints. Once we have a well-specified problem, we are left with the need to search for a solution in a very large solution space. In human-driven search, we use direct human input to guide the search for a good solution, a valid configuration x that optimizes f(x). My research happens in three contexts. The main context is the Virtuoso system for utility and grid computing based on virtual machines (VMs) interconnected with overlay
A Dynamic Binary Instrumentation Engine for the ARM Architecture
- 2006
- Cited by 4 (1 self)
Dynamic binary instrumentation (DBI) is a powerful technique for analyzing the runtime behavior of software. While numerous DBI frameworks have been developed for general-purpose architectures, work on DBI frameworks for embedded architectures has been fairly limited. In this paper, we describe the design, implementation, and applications of the ARM version of Pin, a dynamic instrumentation system from Intel. In particular, we highlight the design decisions that are geared toward the space and processing limitations of embedded systems. Pin for ARM is publicly available and is shipped with dozens of sample plug-in instrumentation tools. It has been downloaded over 500 times since its release.

