Ubiquitous memory introspection
- In CGO ’07: Proceedings of the International Symposium on Code Generation and Optimization
, 2007
Cited by 8 (3 self)
Modern memory systems play a critical role in the performance of applications, but a detailed understanding of application behavior in the memory system is not trivial to attain. It requires time-consuming simulations and detailed modeling of the memory hierarchy, often using long address traces. It is increasingly possible to access hardware performance counters to count relevant events in the memory system, but the measurements are coarse-grained and better suited to performance summaries than to instruction-level feedback. A low-cost, online, and accurate methodology for deriving fine-grained memory behavior profiles can prove extremely useful for runtime analysis and optimization of programs. This paper presents a new methodology for Ubiquitous Memory Introspection (UMI). It is an online and lightweight methodology that uses fast mini-simulations to analyze short memory access traces recorded from frequently executed code regions. The simulations provide profiling results at varying granularities, down to that of a single instruction or address. UMI naturally complements runtime optimizations and enables new opportunities for online memory-specific optimizations. We present a prototype runtime system implementing UMI. The prototype has an average runtime overhead of 14%. This overhead is only 1% more than that of a state-of-the-art binary instrumentation tool. We used 32 benchmarks, including the full suite of SPEC CPU2000 benchmarks, for evaluation. We show that the mini-simulations accurately reflect the cache performance of two existing memory systems, an Intel Pentium 4 and an AMD Athlon MP (K7). We also demonstrate that UMI predicts delinquent load instructions with an 88% rate of accuracy for applications with a relatively high number of cache misses, and 61% overall. The online profiling results are used at runtime to implement a simple software prefetching strategy that achieves an overall speedup of 64% in the best case.
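The mini-simulation idea in this abstract can be illustrated with a small sketch: replay a short trace of (instruction address, data address) pairs through a toy cache model and flag the instructions that miss most often. This is not the UMI implementation. The function names, the set-associative LRU cache geometry, and the miss-ratio threshold below are all assumptions chosen for illustration.

```python
from collections import OrderedDict, defaultdict


def mini_simulate(trace, num_sets=64, ways=4, line_bytes=64):
    """Replay (pc, address) pairs through a toy set-associative LRU cache.

    The geometry (64 sets, 4 ways, 64-byte lines) is an assumed example,
    not a parameter taken from the paper.
    Returns {pc: (accesses, misses)}.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    counts = defaultdict(lambda: [0, 0])  # pc -> [accesses, misses]
    for pc, addr in trace:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        counts[pc][0] += 1
        if line in cache_set:
            cache_set.move_to_end(line)        # hit: mark most recently used
        else:
            counts[pc][1] += 1                 # miss: fill the line
            cache_set[line] = True
            if len(cache_set) > ways:
                cache_set.popitem(last=False)  # evict least recently used
    return {pc: (a, m) for pc, (a, m) in counts.items()}


def delinquent_loads(stats, miss_ratio_threshold=0.5, min_accesses=8):
    """Return pcs whose miss ratio meets a (hypothetical) threshold."""
    return sorted(
        pc for pc, (accesses, misses) in stats.items()
        if accesses >= min_accesses and misses / accesses >= miss_ratio_threshold
    )


# A short synthetic trace: the load at pc 0x400 streams through memory one
# cache line at a time, while the load at pc 0x404 keeps reading one address.
trace = []
for i in range(32):
    trace.append((0x400, 0x10000 + i * 64))
    trace.append((0x404, 0x8000))

stats = mini_simulate(trace)
flagged = delinquent_loads(stats)
```

In this synthetic trace the streaming load misses on every access and is flagged, while the load that rereads one address misses only once. A real system would record traces from frequently executed code regions and could feed the flagged instructions to an optimization such as software prefetching, as the abstract describes.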
Energy consumption and garbage collection in low-powered computing
, 2002
Cited by 8 (3 self)
We have measured the energy efficiency of different memory management strategies on a high-performance pocket computer. We conducted our study by measuring the energy consumption of eight C programs, each with four different memory management strategies. The memory management strategies are: no deallocation, explicit deallocation, conservative mark-and-sweep garbage collection, and conservative mark-and-sweep incremental garbage collection. Our measurements show that different memory management strategies have very different energy requirements. In the most extreme case, one program consumed 40 times as much energy with incremental garbage collection as with explicit deallocation. We demonstrate that, although overall energy use is strongly correlated with execution time, the processor and peripheral energies separately do not correlate well with execution time.
Effectiveness of Garbage Collection and Explicit Deallocation
, 2000
Cited by 1 (0 self)
This paper complements our findings. In our experiments, we never saw an effect of stack accuracy, and we never saw differences in gc effectiveness for programs compiled from or interpreting languages other than C. GC-0 corresponds to our gc+shg and GC2 to our gc+hg configuration, and thus Bartlett's results give samples for a difference between these.

