Quantifying the performance of garbage collection vs. explicit memory management
- in: Proc. ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2005
Cited by 31 (5 self)
Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can compare the cost of conservative garbage collection to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown. We introduce a novel experimental methodology that lets us quantify the performance of precise garbage collection versus explicit memory management. Our system allows us to treat unaltered Java programs as if they used explicit memory management by relying
Multi-terminal network
- Operations Research
, 1961
Cited by 10 (0 self)
During recent years, microprocessor energy consumption has been surging and efforts to reduce power and energy have received a lot of attention. At the same time, virtual execution environments (VEEs), such as Java virtual machines, have grown in popularity. Hence, it is important to evaluate the impact of virtual execution environments on microprocessor energy consumption. This paper characterizes the energy and power impact of two important components of VEEs, Just-in-time (JIT) optimization and garbage collection. We find that by reducing instruction counts, JIT optimization significantly reduces energy consumption, while garbage collection incurs runtime overhead that consumes more energy. Importantly, both JIT optimization and garbage collection decrease the average power dissipated by a program. Detailed analysis reveals that both JIT optimizer and JIT optimized code
DSSWattch: Power estimations in Dynamic SimpleScalar
, 2004
Cited by 3 (0 self)
DSSWattch is a powerful tool that allows users of the Dynamic SimpleScalar (DSS) toolset to obtain cycle-accurate power estimates in a detailed out-of-order simulation environment. DSSWattch is an adaptation of Wattch, originally for SimpleScalar on the Alpha and PISA architectures, to DSS for the PowerPC architecture.
Bosschere. Function level parallelism driven by data dependencies
- In Workshop on Design, Architecture and Simulation of Chip MultiProcessors
, 2006
Cited by 2 (0 self)
With the rise of chip multiprocessors (CMPs), the amount of parallel computing power will increase significantly in the near future. However, most programs are sequential in nature and have not been explicitly parallelized, so they cannot exploit these parallel resources. Automatic parallelization of sequential, non-regular codes is very hard, as illustrated by the lack of solutions after more than 30 years of research on the topic. The question remains whether there is parallelism in sequential programs that can be detected automatically and, if so, how much parallelism there is. In this paper, we propose a framework for extracting potential parallelism from programs. Applying this framework to sequential programs can teach us how much parallelism is present in a program, and also tells us what the most appropriate parallel construct for a program is, e.g. a pipeline, master/slave work distribution, etc. Our framework is profile-based, implying that it is not safe. It builds two new graph representations of the profile data: the interprocedural data flow graph and the data sharing graph. These graphs show the data flow between functions and the data structures facilitating this data flow, respectively. We apply our framework to the SPECcpu2000 bzip2 benchmark, achieving a speedup of 3.74 for the compression part and a global speedup of 2.45 on a quad-processor system.
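As a rough reading aid for the two speedup figures above, the following sketch applies Amdahl's law to back out the fraction of sequential run time that the accelerated part would have to occupy. This is an assumption, not something the abstract states: it supposes that only the compression part was sped up (by 3.74) and that the rest of the program ran unchanged. The function names are hypothetical.

```python
def accelerated_fraction(part_speedup: float, global_speedup: float) -> float:
    """Solve Amdahl's law, 1/S_global = (1 - f) + f / S_part, for f.

    f is the share of the original run time spent in the accelerated part.
    Assumes everything outside that part runs at its original speed.
    """
    return (1.0 - 1.0 / global_speedup) / (1.0 - 1.0 / part_speedup)


def amdahl_speedup(fraction: float, part_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `part_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


if __name__ == "__main__":
    # Figures quoted in the abstract: 3.74 for compression, 2.45 overall.
    f = accelerated_fraction(part_speedup=3.74, global_speedup=2.45)
    print(f"implied accelerated fraction: {f:.2f}")
    print(f"round-trip global speedup:    {amdahl_speedup(f, 3.74):.2f}")
```

Under that assumption the accelerated part would account for roughly four fifths of the sequential run time; if other parts of the benchmark were also parallelized, the figure would differ.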
Efficient Adaptation of Multiple Microprocessor Resources for Energy Reduction Using Dynamic Optimization
, 2005
Cited by 2 (0 self)
The Dissertation Committee for Shiwen Hu Certifies that this is the approved version of the following dissertation:
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic
Cited by 1 (1 self)
Exposing more instruction-level parallelism in out-of-order superscalar processors requires increasing the number of dynamic in-flight instructions. However, large instruction windows increase power consumption and latency in the issue logic. We propose a design called Hybrid Dataflow Graph Execution (HeDGE) for conventional Instruction Set Architectures (ISAs). HeDGE explicitly maintains dependences between instructions in the issue window by modifying the issue, register renaming, and wakeup logic. The HeDGE wakeup logic notifies only consumer instructions when data values arrive. Explicit consumer encoding naturally leads to the use of Random Access Memory (RAM) instead of the Content Addressable Memory (CAM) needed for broadcast. HeDGE is distinguished from prior approaches in part because it dynamically inserts forwarding instructions. Although these additional instructions degrade performance by an average of 3% to 17% for SPEC C and Fortran benchmarks and 1.5% to 8% for DaCapo Java benchmarks, they enable energy-efficient execution in large instruction windows. The HeDGE RAM-based instruction window consumes on average 98% less energy than a conventional CAM as modeled in CACTI for 70nm technology. In conventional designs, this structure contributes 7% to 20% of total energy consumption. HeDGE allows us to achieve power and energy gains by using RAMs in the issue logic while maintaining a conventional instruction set.
The Yin and Yang of Power and Performance for Asymmetric Hardware and Managed Software
Cited by 1 (0 self)
On the hardware side, asymmetric multicore processors present software with the challenge and opportunity of optimizing in two dimensions: performance and power. Asymmetric multicore processors (AMP) combine general-purpose big (fast, high power) cores and small (slow, low power) cores to meet power constraints. Realizing their energy efficiency opportunity requires workloads with differentiated performance and power characteristics. On the software side, managed workloads written in languages such as C#, Java, JavaScript, and PHP are ubiquitous. Managed languages abstract over hardware using Virtual Machine (VM) services (garbage collection, interpretation, and/or just-in-time compilation) that together impose substantial energy and performance costs, ranging from 10% to over 80%. We show that these services manifest a differentiated performance and power workload. To differing degrees, they are parallel, asynchronous, communicate infrequently, and are not on the application's critical path. We identify a synergy between AMP and VM services that we exploit to attack the 40% average energy overhead due to VM services. Using measurements and very conservative models, we show that adding small cores tailored for VM services should deliver, at least, improvements in performance of 13%, energy of 7%, and performance per energy of 22%. The yin of VM services is overhead, but it meets the yang of small cores on an AMP. The yin of AMP is exposed hardware complexity, but it meets the yang of abstraction in managed languages. VM services fulfill the AMP requirement for an asynchronous, non-critical, differentiated, parallel, and ubiquitous workload to deliver energy efficiency. Generalizing this approach beyond system software to applications will require substantially more software and hardware investment, but these results show the potential energy efficiency gains are significant.
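The three improvement figures in this abstract can be checked against each other with a short calculation. The sketch below assumes (the abstract does not spell this out) that "performance of 13%" means 1.13 times the baseline performance and "energy of 7%" means 0.93 times the baseline energy, so that performance per energy is their ratio. The function name is hypothetical.

```python
def performance_per_energy_gain(perf_gain: float, energy_reduction: float) -> float:
    """Relative gain in performance per energy.

    perf_gain: fractional performance improvement (0.13 means 1.13x performance).
    energy_reduction: fractional energy saving (0.07 means 0.93x energy).
    Assumed interpretation, not taken from the paper's definitions.
    """
    return (1.0 + perf_gain) / (1.0 - energy_reduction) - 1.0


if __name__ == "__main__":
    gain = performance_per_energy_gain(perf_gain=0.13, energy_reduction=0.07)
    print(f"combined gain: {gain:.1%}")
```

Under this interpretation the combined gain comes out at about 21.5%, which is consistent with the 22% quoted in the abstract after rounding.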
Quantifying and Improving the Performance of Garbage Collection
, 2006
Computer Science To Sarah for reminding me of everything I can do and to Shoshanna for inspiring me to do more. ACKNOWLEDGMENTS I am most grateful to my advisor, Emery Berger, for everything he has done throughout this thesis. I appreciate his guidance, suggestions, and inspiration. I feel especially fortunate for the patience he has shown with me throughout all the twists and turns my life took getting through this dissertation. I must also thank Eliot Moss and Kathryn McKinley for their leadership and support. I will be forever grateful that they took a chance on a student with a less-than-stellar academic record and provided me with a fertile, inspiring research environment. They are both very knowledgeable and I benefited from our discussions in a myriad of ways. They have also served as members of my committee and I appreciate their helpful comments and suggestions. Thanks also to Scott Kaplan, another member of my committee, for his advice and feedback.
Fast and Efficient Partial Code Reordering: Taking Advantage of Dynamic Recompilation
Poor instruction cache locality can degrade performance on modern architectures. For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache. In this paper, we show how to take advantage of dynamic code generation in a Java Virtual Machine (VM) to improve instruction locality at run-time. We develop a dynamic code reordering (DCR) system, a low-overhead, online approach for improving instruction locality. DCR has three optimizations: (1) Interprocedural method separation; (2) Intraprocedural code splitting; and (3) Code padding. DCR uses the dynamic call graph and an edge profile that most VMs already collect to separate hot/cold methods and hot/cold code within a method. It also puts padding between methods to minimize conflict misses between frequent caller/callee pairs. It incrementally performs these optimizations only when the VM is optimizing a method at a higher level. We implement DCR in Jikes RVM and show its overhead is negligible. Extensive simulation and run-time experiments show that a simple code space improves average performance on a Pentium 4 by around 6% on SPEC and DaCapo Java benchmarks. These programs, however, have very small instruction cache footprints that limit opportunities for DCR to improve performance. Consequently, DCR optimizations on average show little effect, sometimes degrading performance and occasionally improving performance by up to 5%. Our work shows that the VM has the potential to dynamically improve instruction locality incrementally by simply piggybacking on hotspot recompilation.
Trace Generation
, 2006
Acknowledgments Special thanks to my advisor, Darko Stefanović, for lots of help, guidance, ideas, and support. Thanks to Steve Blackburn, Matthew Hertz, and Hajime Inoue for giving help and correspondence during this project's development. Thanks to my committee members Patrick Bridges, Hajime Inoue, and Darko Stefanović. Thanks to my friends at Sandia National Labs, especially Rich, for giving support and donating the computational power needed to complete this project. Thanks to Mom and Dad for their support and helping me proof-read this document. This material is based upon work supported by the National Science Foundation (grants CCR-0085792, CCF-0238027, CCF-0540600). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the sponsors. A Platform for Research into Object-Level

