The strong correlation between code signatures and performance
- In IEEE International Symposium on Performance Analysis of Systems and Software
, 2005
Cited by 38 (10 self)
A recent study [1] examined the use of sampled hardware counters to create sampled code signatures. This approach is attractive because sampled code signatures can be quickly gathered for any application. The conclusion of their study was that there exists a fuzzy correlation between sampled code signatures and performance predictability. The paper raises the question of how much information is lost in the sampling process, and our paper focuses on examining this issue. We first focus on showing that there exists a strong correlation between code signatures and performance. We then examine the relationship between sampled and full code signatures, and how these affect performance predictability. Our results confirm that there is a fuzzy correlation found in recent work for the SPEC programs with sampled code signatures, but that a strong correlation exists with full code signatures. In addition, we propose converting the sampled instruction counts, used in the prior work, into sampled code signatures representing loop and procedure execution frequencies. These sampled loop and procedure code signatures allow phase analysis to more accurately and easily find patterns, and they correlate better with performance.
A Practical Method for Quickly Evaluating Program Optimizations
- In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005)
, 2005
Cited by 25 (9 self)
This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications.
Motivation for variable length intervals and hierarchical phase behavior
- In IEEE International Symposium on Performance Analysis of Systems and Software
, 2005
Cited by 21 (6 self)
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group similar portions of a program’s execution into phases, where the intervals in each phase have homogeneous behavior and similar resource requirements. These prior techniques focus on fixed length intervals (such as a hundred million instructions) to find phase behavior. Fixed length intervals can make a program’s periodic phase behavior difficult to find, because the fixed interval length can be out of sync with the period of the program’s actual phase behavior. In addition, a fixed interval length can only express one level of phase behavior. In this paper, we graphically show that there exists a hierarchy of phase behavior in programs and motivate the need for variable length intervals. We describe the changes applied to SimPoint to support variable length intervals. We finally conclude by providing an initial study into using variable length intervals to guide SimPoint.
Characterizing Microarchitecture Soft Error Vulnerability Phase Behavior
, 2006
Cited by 12 (4 self)
Computer systems increasingly depend on exploiting program dynamic behavior to optimize performance, power and reliability. Prior studies have shown that program execution exhibits phase behavior in both performance and power domains. Reliability-oriented program phase behavior, however, remains largely unexplored. As semiconductor transient faults (soft errors) emerge as a critical challenge to reliable system design, characterizing program phase behavior from a reliability perspective is crucial in order to apply dynamic fault-tolerant mechanisms and to optimize performance/reliability trade-offs. In this paper, we compute run-time program vulnerability to soft errors on four microarchitecture structures (i.e. instruction window, reorder buffer, function units and wakeup table) in a high-performance out-of-order execution superscalar processor. Experimental results on the SPEC2000 benchmarks show a considerable amount of time varying behavior in reliability measurements. Our study shows that a single performance metric, such as IPC, cache miss or branch misprediction, is not a good indicator for program vulnerability. The vulnerabilities of the studied microarchitecture structures are then correlated with program code-structure and run-time events to identify vulnerability phase behavior. We observed that both program code-structure and run-time events appear promising in classifying program reliability phase behavior. Overall, performance counter based schemes achieved an average Coefficient of Variation (COV) of 3.5%, 4.5%, 4.3% and 5.7% on the instruction queue, reorder buffer, function units and the wakeup table, while basic block vectors offer COVs of 4.9%, 5.8%, 5.4% and 6% on the four studied microarchitecture structures respectively. We found that in general, tracking performance metrics performs better than tracking control flow in identifying reliability phase behavior of applications. To our knowledge, this paper is the first to characterize program reliability phase behavior at the microarchitecture level.
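The abstract above reports phase quality as a Coefficient of Variation (standard deviation divided by mean). The abstract does not say how per-phase values are combined, so the following is only a minimal sketch under an assumed convention: each phase's COV is weighted by the fraction of intervals it contains. The function name `phase_cov` and its inputs are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phase_cov(phase_ids, values):
    """Weighted-average coefficient of variation of `values` within phases.

    phase_ids[i] is the phase label assigned to interval i and values[i] is
    the metric measured in that interval (for example a vulnerability
    estimate). Assumption: phases are weighted by their interval count.
    """
    groups = defaultdict(list)
    for pid, value in zip(phase_ids, values):
        groups[pid].append(value)

    total = len(values)
    weighted = 0.0
    for vals in groups.values():
        m = mean(vals)
        # COV = population standard deviation / mean; guard against m == 0.
        cov = pstdev(vals) / m if m else 0.0
        weighted += cov * len(vals) / total
    return weighted
```

A lower result means the intervals grouped into each phase behave more alike, which is the sense in which the abstract compares the two classification schemes.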
Dynamic Phase Analysis for Cycle-Close Trace Generation
- In International Conference on Hardware/Software Codesign and System Synthesis
, 2005
Cited by 10 (0 self)
For embedded system development, several companies provide cross-platform development tools to aid in debugging, prototyping and optimization of programs. These are full system emulation systems that can emulate the final binary to be run on the real board, its operating system and devices. Many of these emulation systems do not provide cycle level information due to the time consuming nature of cycle accurate simulation. In this paper we propose a method to provide Cycle-Close Traces of cycle-level statistics for the complete execution of the program in orders of magnitude less time than performing full cycle accurate simulation, with an average error of 3.2%. Our approach uses dynamic phase analysis to generate targeted cycle-close simulation samples. Detailed simulation results for these samples are used to produce fast cycle-close traces during a program’s execution, so the user can also watch, pause and debug the currently executing code and its corresponding architecture performance characteristics at any point during execution.
Using Machine Learning to Guide Architecture Simulation
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 10 (4 self)
An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may estimate the impact of various design options on the overall machine. This task is complicated by the fact that different programs, and even different parts of the same program, may have distinct behaviors that interact with the hardware in different ways. Researchers use very detailed simulators to estimate processor performance, which models every cycle of an executing program. Unfortunately,
Detecting recurrent phase behavior under real-system variability
- In Proceedings of the IEEE International Symposium on Workload Characterization
, 2005
Cited by 10 (2 self)
As computer systems become ever more complex and power hungry, research on dynamic on-the-fly system management and adaptations receives increasing attention. Such research relies on recognizing and responding to patterns or phases in application execution, which has therefore become an important and widely-studied research area. While application phase analysis has received significant attention, much of this attention thus far has focused on simulation-based studies. In these cycle-level simulations without indeterministic operating system intervention, applications display behavior that is repeatable from phase to phase and from run to run. A natural question, therefore, concerns how these phases appear in real system runs, where interrupts and time variability can influence the timing and behavior of the program. Our work examines the phase behavior of applications running on real systems. The key goals of our work are to reliably discern and recover phase behavior in the face of application variability stemming from real system effects and time sampling. We propose a set of new, “transition-based” phase detection techniques. Our techniques can detect repeatable workload phase information from time-varying, real system measurements with less than 5% false alarm probabilities. In comparison to previous value-based detection methods, our transition-based techniques achieve on average 6X higher recurrent phase detection efficiency under real system variability.
Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management
- In Proceedings of the 39th International Symposium on Microarchitecture (MICRO-39)
, 2006
Cited by 9 (0 self)
Computer architecture has experienced a major paradigm shift from focusing only on raw performance to considering power-performance efficiency as the defining factor of the emerging systems. Along with this shift has come increased interest in workload characterization. This interest fuels two closely related areas of research. First, various studies explore the properties of workload variations and develop methods to identify and track different execution behavior, commonly referred to as “phase analysis”. Second, a large complementary set of research studies dynamic, on-the-fly system management techniques that can adaptively respond to these differences in application behavior. Both of these lines of work have produced very interesting and widely useful results. Thus far, however, there exists only a weak link between these conceptually related areas, especially for real-system studies. Our work aims to strengthen this link by demonstrating a real-system implementation of a runtime phase predictor that works cooperatively with on-the-fly dynamic management. We describe a fully-functional deployed system that performs accurate phase predictions on running applications. The key insight of our approach is to draw from prior branch predictor designs to create a phase history table that guides predictions. To demonstrate the value of our approach, we implement a prototype system that uses it to guide dynamic voltage and frequency scaling. Our runtime phase prediction methodology achieves above 90% prediction accuracies for many of the experimented benchmarks. For highly variable applications, our approach can reduce mispredictions by more than 6X over commonly-used statistical approaches. Dynamic frequency and voltage scaling, when guided by our runtime phase predictor, achieves energy-delay product improvements as high as 34% for benchmarks with non-negligible variability, on average 7% better than previous methods and 18% better than a baseline unmanaged system.
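The abstract above describes a phase history table modeled on branch predictor designs. The paper's actual table organization is not given here, so the following is only an illustrative sketch under stated assumptions: a short global history of recent phase IDs indexes a table holding the phase that last followed that history, with last-value prediction as the fallback. The names `PhaseHistoryPredictor`, `history_len`, and `evaluate` are hypothetical.

```python
from collections import deque


class PhaseHistoryPredictor:
    """Predict the next phase ID from the pattern of recent phase IDs."""

    def __init__(self, history_len=4):
        # Most recent phase IDs, oldest first (assumed global history).
        self.history = deque(maxlen=history_len)
        # Maps a history pattern to the phase that last followed it.
        self.table = {}

    def predict(self):
        if not self.history:
            return None  # nothing observed yet
        # Unseen pattern: fall back to last-value prediction.
        return self.table.get(tuple(self.history), self.history[-1])

    def update(self, observed_phase):
        if self.history:
            self.table[tuple(self.history)] = observed_phase
        self.history.append(observed_phase)


def evaluate(sequence, history_len=2):
    """Return one boolean per interval: was the prediction correct?"""
    predictor = PhaseHistoryPredictor(history_len)
    hits = []
    for phase in sequence:
        hits.append(predictor.predict() == phase)
        predictor.update(phase)
    return hits
```

On a repeating pattern such as `[1, 1, 2] * 10`, `evaluate` mispredicts only during warm-up, whereas a pure last-value predictor would miss on every phase change.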
Detecting phases in parallel applications on shared memory architectures
- In International Parallel and Distributed Processing Symposium
, 2006
Cited by 8 (2 self)
Most programs are repetitive, where similar behavior can be seen at different execution times. Algorithms have been proposed that automatically group similar portions of a program’s execution into phases, where samples of execution in the same phase have homogeneous behavior and similar resource requirements. In this paper, we examine applying these phase analysis algorithms and how to adapt them to parallel applications running on shared memory processors. Our approach relies on a separate representation of each thread’s activity. We first focus on showing its ability to identify similar intervals of execution across threads for a single run. We then show that it is effective at identifying similar behavior of a program when the number of threads is varied between runs. This can be used by developers to examine how different phases scale across different numbers of threads. Finally, we examine using the phase analysis to pick simulation points to guide multithreaded simulation.
Quick and practical run-time evaluation of multiple program optimizations
- Transactions on High Performance Embedded Architectures and Compilation Techniques (HiPEAC)
Cited by 7 (0 self)
This article aims at making iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. For that purpose, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications. Considering 5 representative SpecFP2000 benchmarks, our approach speeds up iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations bring an average speed-up of 1.4.

