Results 1 - 10
of
40
Power Aware Page Allocation
- In Architectural Support for Programming Languages and Operating Systems
, 2000
"... One of the major challenges of post-PC computing is the need to reduce energy consumption, thereby extending the lifetime of the batteries that p ower these mobile devices. Memory is a particularly important tar get for e orts to improve energy e ciency. Memory technolo gy is becoming available that ..."
Abstract
-
Cited by 121 (9 self)
- Add to MetaCart
One of the major challenges of post-PC computing is the need to reduce energy consumption, thereby extending the lifetime of the batteries that p ower these mobile devices. Memory is a particularly important tar get for e orts to improve energy e ciency. Memory technolo gy is becoming available that o ers power management featur es such as the ability to put individual chips in any one of several di erent power modes. In this paper we explor e the interaction of page plac ement with static and dynamic hardware policies to exploit these emer ginghardwar efeatur es. In p articular, we c onsider p age allo cation p olicies that ancbe employed by an informed operating system to complement the hardware power management strategies. We perform experiments using two complementary simulation envir onments: a tracedriven simulator with workload traces that are representative of mobile computing and an execution-driven simulator with a detaile d processor/memory model and a more memoryintensive set of benchmarks (SPEC2000). Our r esults make a compelling case for a cooperative hardwar e/software approach for exploiting power-aware memory, with down to as little as 45 % of the Energy Delay for the best static policy and 1 % to 20 % of the Ener gyDelay for a traditional fullpower memory. 1.
Every Joule is Precious: The Case for Revisiting Operating System Design for Energy Efficiency
, 2000
"... this paper, we propose the systematic re-examination of all aspects of operating system design and implementation from the point of view of energy efficiency rather than the more traditional OS metric of maximizing performance. In [7], we made the case for energy as a first-class OS-managed resource ..."
Abstract
-
Cited by 69 (8 self)
- Add to MetaCart
this paper, we propose the systematic re-examination of all aspects of operating system design and implementation from the point of view of energy efficiency rather than the more traditional OS metric of maximizing performance. In [7], we made the case for energy as a first-class OS-managed resource. We emphasized the benefits of higher-level control over energy usage policy and the application/OS interactions required to achieve them. This paper explores the implications that this major shift in focus can have upon the services, policies, mechanisms, and internal structure of the OS itself based on our initial experiences with rethinking system design for energy efficiency.
A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization
- In Proceedings of the 26th Annual International Symposium on Computer Architecture
, 1999
"... This paper presents a novel hardware-based approach for identifying, profiling, and monitoring hot spots in order to support runtime optimization of generalpurpose programs. The proposed approach consists of a set of tightly coupled hardware tables and control logic modules that are placed in the re ..."
Abstract
-
Cited by 68 (4 self)
- Add to MetaCart
This paper presents a novel hardware-based approach for identifying, profiling, and monitoring hot spots in order to support runtime optimization of generalpurpose programs. The proposed approach consists of a set of tightly coupled hardware tables and control logic modules that are placed in the retirement stage of a processor pipeline removed from the critical path. The features of the proposed design include rapid detection of program hot spots after changes in execution behavior, runtime-tunable selection criteria for hot spot detection, and negligible overhead during application execution. Experiments using several SPEC95 benchmarks, as well as several large WindowsNT applications, demonstrate the promise of the proposed design. 1 Introduction Optimizing compilers can gain significant performance benefits by performing code transformations based on a program's runtime profile. Traditionally, profiles are collected by running an instrumented version of the executable. However, bec...
Time Varying Behavior of Programs
, 1999
"... Modern architecture research relies heavily on detailed pipeline simulation. Furthermore, programs often times exhibit interesting and important time varying behavior on an extremely large scale. Very little analysis has been conducted to classify the time varying behavior of popular benchmarks usin ..."
Abstract
-
Cited by 68 (11 self)
- Add to MetaCart
Modern architecture research relies heavily on detailed pipeline simulation. Furthermore, programs often times exhibit interesting and important time varying behavior on an extremely large scale. Very little analysis has been conducted to classify the time varying behavior of popular benchmarks using detailed simulation for important architecture features. In this paper we classify the behavior of the SPEC95 benchmark suite over their course of execution correlating the behavior between IPC, branch prediction, value prediction, address prediction, cache performance, and reorder buffer occupancy. Branch prediction, cache performance, value prediction, and address prediction are currently some of the most influential architecture features driving microprocessor research, and we show important interactions and relationships between these features. In addition, we show that many programs have wildly different behavior during different parts of their execution, which makes the section of the program simulated of great importance to the relevance and correctness of a study. We show that the large scale behavior of the programs is cyclic in nature, point out the length of cyclic behavior for these programs, and suggest where to simulate to achieve results representative of the program as a whole.
Characterizing the Memory Behavior of Java Workloads: A Structured View and Opportunities for Optimizations
, 2000
"... This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-theart JVM, and provides structured information about these workloads to help guide systems' design. We be ..."
Abstract
-
Cited by 54 (3 self)
- Add to MetaCart
This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-theart JVM, and provides structured information about these workloads to help guide systems' design. We begin by characterizing the inherent memory behavior of the benchmarks, such as information on the breakup of heap accesses among different categories and on the hotness of references to fields and methods. We then provide detailed information about misses in the data TLB and caches, including the distribution of misses over different kinds of accesses and over different methods. In the process, we make interesting discoveries about TLB behavior and limitations of data prefetching schemes discussed in the literature in dealing with pointer-intensive Java codes. Throughout this paper, we develop a set of recommendations to computer architects and compiler writers on how to optimize computer systems and system software to run Java programs more efficiently. This paper also makes the first attempt to compare the characteristics of SPECjvm98 to those of a server-oriented benchmark, pBOB, and explain why the current set of SPECjvm98 benchmarks may not be adequate for a comprehensive and objective evaluation of JVMs and just-in-time (JIT) compilers. We discover that the fraction of accesses to array elements is quite significant, demonstrate that the number of "hot spots" in the benchmarks is small, and show that field reordering cannot yield significant performance gains. We also show that even a fairly large L2 data cache is not effective for many Java benchmarks. We observe that instructions used to prefetch data into the L2 data cache are often squashed because of high TLB miss ...
Memory Controller Policies for DRAM Power Management
- In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED
, 2001
"... The increasing importance of energy efficiency has produced a multitude of hardware devices with various power management features. This paper investigates memory controller policies for manipulating DRAM power states in cache-based systems. We develop an analytic model that approximates the idle ti ..."
Abstract
-
Cited by 47 (3 self)
- Add to MetaCart
The increasing importance of energy efficiency has produced a multitude of hardware devices with various power management features. This paper investigates memory controller policies for manipulating DRAM power states in cache-based systems. We develop an analytic model that approximates the idle time of DRAM chips using an exponential distribution, and validate our model against trace-driven simulations. Our results show that, for our benchmarks, the simple policy of immediately transitioning a DRAM chip to a lower power state when it becomes idle is superior to more sophisticated policies that try to predict DRAM chip idle time.
Design and Implementation of a Dynamic Optimization Framework for Windows
- In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4
, 2000
"... In this paper we explicitly address the challenges of building a dynamic optimization infrastructure. Our goal is to provide an engineering manual for researchers to motivate and facilitate engagement in dynamic optimization technology. The dynamic optimization framework we discuss has been implemen ..."
Abstract
-
Cited by 41 (5 self)
- Add to MetaCart
In this paper we explicitly address the challenges of building a dynamic optimization infrastructure. Our goal is to provide an engineering manual for researchers to motivate and facilitate engagement in dynamic optimization technology. The dynamic optimization framework we discuss has been implemented for the IA-32 family of architectures in a Windows environment. We describe implementation challenges specific to Windows and issues with multiple threads and cache management. We also give performance results for our implementation and indicate the overheads involved. This framework opens up opportunities for program introspection and performance enhancement as all program information is available at runtime.
Pipeline and Batch Sharing in Grid Workloads
- In Proceedings of High-Performance Distributed Computing (HPDC-12
, 2003
"... We present a study of six batch-pipelined scientific workloads that are candidates for execution on computational grids. Whereas other studies focus on the behavior of single applications, this study characterizes workloads composed of pipelines of sequential processes that use file storage for comm ..."
Abstract
-
Cited by 33 (11 self)
- Add to MetaCart
We present a study of six batch-pipelined scientific workloads that are candidates for execution on computational grids. Whereas other studies focus on the behavior of single applications, this study characterizes workloads composed of pipelines of sequential processes that use file storage for communication and also share significant data across a batch. This study includes measurements of the memory, CPU, and I/O requirements of individual components as well as analyses of I/O sharing within complete batches. We conclude with a discussion of the ramifications of these workloads for end-to-end scalability and overall system design.
Efficient, Transparent and Comprehensive Runtime Code Manipulation
, 2004
"... This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every i ..."
Abstract
-
Cited by 28 (1 self)
- Add to MetaCart
This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamicallygenerated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction — which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools — it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the
Maintaining consistency and bounding capacity of software code caches
- Int’l. Symp. on Code Generation and Optimization
, 2005
"... Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching frequently executed fragments of code provides significant performance boosts, reducing the overhead of translation and emul ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
Software code caches are becoming ubiquitous, in dynamic optimizers, runtime tool platforms, dynamic translators, fast simulators and emulators, and dynamic compilers. Caching frequently executed fragments of code provides significant performance boosts, reducing the overhead of translation and emulation and meeting or exceeding native performance in dynamic optimizers. One disadvantage of caching, memory expansion, can sometimes be ignored when executing a single application. However, as optimizers and translators are applied more and more in production systems, the memory expansion from running multiple applications simultaneously becomes problematic. A second drawback to caching is the added requirement of maintaining consistency between the code cache and the original code. On architectures like IA-32 that do not require explicit application actions when modifying code, detecting code changes is challenging. Again, consistency can be ignored for certain sets of applications, but as caching systems scale up to executing large, modern, complex programs, consistency becomes critical. This paper presents efficient schemes for keeping a software code cache consistent and for dynamically bounding code cache size to match the current working set of the application. These schemes are evaluated in the DynamoRIO runtime code manipulation system, and operate on stock hardware in the presence of multiple threads and dynamic behavior, including dynamically-loaded, generated, and even modified code. 1

