
ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors (1997) [103 citations — 3 self]

by Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos

Abstract:

Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also supports paired sampling, which captures the interactions between concurrent instructions and reveals how much useful concurrency exists and how the various pipeline stages are utilized while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software...
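The core idea, attributing events to the individual instruction that was sampled rather than incrementing global counters, can be illustrated with a small software sketch of how sample records might be aggregated into a per-instruction profile. This is a minimal illustration under stated assumptions, not ProfileMe's actual hardware interface or the authors' analysis software: the `SampleRecord` fields, the event and stage names, and the `aggregate` and `overlap_cycles` helpers are all hypothetical. It also assumes that, on average, one instruction is sampled every `mean_interval` fetched instructions, so each sample stands for that many executions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    # Hypothetical per-instruction sample record (illustrative field names only).
    pc: int                           # address of the sampled instruction
    events: frozenset = frozenset()   # events observed, e.g. {"dcache_miss"}
    stage_latencies: tuple = ()       # ((stage_name, cycles), ...)


def aggregate(samples, mean_interval):
    """Build a per-PC profile from instruction samples.

    Assumes one instruction is sampled every `mean_interval` fetched
    instructions on average, so each sample represents that many executions.
    """
    hits = defaultdict(int)
    events = defaultdict(lambda: defaultdict(int))
    cycles = defaultdict(lambda: defaultdict(int))
    for s in samples:
        hits[s.pc] += 1
        for e in s.events:
            events[s.pc][e] += 1
        for stage, c in s.stage_latencies:
            cycles[s.pc][stage] += c

    profile = {}
    for pc, n in hits.items():
        profile[pc] = {
            # Scale the sample count up to an execution-count estimate.
            "est_executions": n * mean_interval,
            # Fraction of this PC's samples that observed each event.
            "event_rate": {e: k / n for e, k in events[pc].items()},
            # Average cycles spent in each pipeline stage by this PC.
            "mean_latency": {st: c / n for st, c in cycles[pc].items()},
        }
    return profile


def overlap_cycles(a, b):
    """Paired-sampling sketch: cycles during which two sampled instructions
    were both in flight, given hypothetical (fetch_cycle, retire_cycle) pairs."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


if __name__ == "__main__":
    samples = [
        SampleRecord(0x400, frozenset({"dcache_miss"}), (("execute", 40),)),
        SampleRecord(0x400, frozenset(), (("execute", 4),)),
        SampleRecord(0x404, frozenset({"branch_mispredict"})),
    ]
    prof = aggregate(samples, mean_interval=1000)
    print(prof[0x400])  # est_executions 2000, dcache_miss rate 0.5, execute 22.0
    print(overlap_cycles((10, 50), (30, 70)))  # 20 cycles of overlap
```

Because every record carries the PC of the instruction it describes, a cache miss or a long execute latency is charged to that instruction directly, which is the attribution the abstract says event counters cannot provide accurately on out-of-order machines. The pairwise overlap function only hints at how paired samples could expose concurrency; a real analysis would need the hardware's actual timestamps and sampling distribution.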
