SimPoint 3.0: Faster and more flexible program analysis
Journal of Instruction Level Parallelism, 2005
Cited by 38 (2 self)
Abstract
This paper describes the new features available in the SimPoint 3.0 release. The release provides two techniques for drastically reducing the run-time of SimPoint: faster searching to find the best clustering, and efficient clustering of large numbers of intervals. SimPoint 3.0 also provides an option to output only the simulation points that represent the majority of execution, which can reduce simulation time without much increase in error. Finally, this release supports correctly clustering variable-length intervals by taking the weight of each interval into account during clustering. This paper describes SimPoint 3.0’s new features and how to use them, and points out some common pitfalls.
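The two clustering features above (weighting variable-length intervals and keeping only the simulation points that cover most of execution) can be illustrated with a minimal sketch. It is not SimPoint's code: all function names are hypothetical, the deterministic farthest-first seeding stands in for the randomized seeding typical of k-means tools, and the `coverage` cutoff is an assumed stand-in for the option described in the abstract. Each interval's weight (for example, its length in instructions) scales its pull on the cluster centroid and its share of execution.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def weighted_kmeans(points, weights, k, iters=20):
    """Lloyd's k-means in which each interval pulls its centroid in
    proportion to its weight."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every interval joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: weighted mean of each cluster's members.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(len(points[0]))
                ]
    return labels, centroids


def simulation_points(points, weights, labels, centroids, coverage=1.0):
    """Pick one representative per cluster (the interval closest to the
    centroid), weight it by its cluster's share of total interval weight, and
    keep the heaviest clusters until `coverage` of execution is represented."""
    total = sum(weights)
    picks = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        rep = min(members, key=lambda i: dist2(points[i], centroid))
        picks.append((rep, sum(weights[i] for i in members) / total))
    picks.sort(key=lambda pick: pick[1], reverse=True)
    chosen, covered = [], 0.0
    for rep, share in picks:
        if covered >= coverage:
            break
        chosen.append((rep, share))
        covered += share
    return chosen


# Five intervals in 2-D, with weights standing in for interval lengths.
points = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0], [0.0, 20.0]]
weights = [3, 1, 6, 2, 0.5]
labels, centroids = weighted_kmeans(points, weights, k=3)
print(simulation_points(points, weights, labels, centroids, coverage=0.9))
# [(2, 0.64), (0, 0.32)]
```

Here the three clusters hold 64%, 32%, and 4% of the weighted execution, so a 90% coverage cutoff keeps only the first two simulation points.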
Online phase detection algorithms
In the International Symposium on Code Generation and Optimization, 2006
Cited by 27 (3 self)
Abstract
Today’s virtual machines (VMs) dynamically optimize an application as it executes, often employing optimizations that are specialized for the current execution profile. An online phase detector determines when an executing program is in a stable period of execution (a phase) or in transition. A VM using an online phase detector can apply specialized optimizations during a phase or reconsider optimization decisions between phases. Unfortunately, extant approaches to detecting phase behavior rely on offline profiling or hardware support, or are targeted toward a particular optimization. In this work, we focus on the enabling technology of online phase detection. More specifically, we contribute (a) a novel framework for online phase detection, (b) multiple instantiations of the framework that produce novel online phase detection algorithms, (c) a novel client- and machine-independent baseline methodology for evaluating the accuracy of an online phase detector, (d) a metric to compare online detectors to this baseline, and (e) a detailed empirical evaluation, using Java applications, of the accuracy of the numerous phase detectors.
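A window-based detector in the spirit of the framework described above can be sketched as follows. This is an illustrative assumption, not the authors' algorithms: the profile is taken to be a stream of hashable elements (for instance, method or branch IDs), the detector compares a current window against the trailing window immediately before it using set (Jaccard) similarity, and `window` and `threshold` are hypothetical parameters.

```python
from collections import deque


def detect_phases(profile, window=4, threshold=0.5):
    """Return one flag per profile element: True when the detector judges the
    program to be in a stable phase at that point, False otherwise."""
    current = deque(maxlen=window)
    trailing = deque(maxlen=window)
    flags = []
    for elem in profile:
        if len(current) == window:
            # The oldest element leaves the current window and becomes the
            # newest element of the trailing window.
            trailing.append(current[0])
        current.append(elem)  # maxlen drops current[0] automatically
        if len(trailing) < window:
            flags.append(False)  # not enough history to compare yet
            continue
        cur, trail = set(current), set(trailing)
        similarity = len(cur & trail) / len(cur | trail)  # Jaccard overlap
        flags.append(similarity >= threshold)
    return flags


# Two loops with disjoint working sets: IDs 1-2, then IDs 7-9.
profile = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7]
print(detect_phases(profile))
```

On this profile the detector reports a stable phase while the first loop runs, a transition for a few elements after the switch to the second loop, and a new stable phase once both windows fall inside it.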
Performance prediction based on inherent program similarity
In PACT, 2006
Cited by 20 (5 self)
Abstract
A key challenge in benchmarking is to predict the performance of an application of interest on a number of platforms in order to determine which platform yields the best performance. This paper proposes an approach for doing so. We measure a number of microarchitecture-independent characteristics of the application of interest and relate them to the characteristics of the programs in a previously profiled benchmark suite. Based on the similarity between the application of interest and the programs in the benchmark suite, we predict the application’s performance. We propose and evaluate three approaches (normalization, principal components analysis, and a genetic algorithm) to transform the raw data set of microarchitecture-independent characteristics into a benchmark space in which relative distance is a measure of relative performance difference. We evaluate our approach using all of the SPEC CPU2000 benchmarks and real hardware performance numbers from the SPEC website. Our framework estimates per-benchmark machine ranks with an average rank correlation coefficient of 0.89 and a worst-case rank correlation coefficient of 0.80.
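The prediction step can be sketched under simplifying assumptions: plain z-score normalization (the first of the three transformations named above), Euclidean distance in the normalized space, and an inverse-distance-weighted average over the `k` nearest benchmarks. The weighting scheme, the function names, and the toy data are assumptions for illustration, not the paper's exact method.

```python
import math


def zscore_columns(rows):
    """Normalize each characteristic (column) to zero mean and unit variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0  # guard constant columns
        for d in range(dims)
    ]
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def predict_scores(app_chars, bench_chars, bench_scores, k=2):
    """Predict the application's score on each machine.

    app_chars: characteristics of the application of interest.
    bench_chars[b]: characteristics of benchmark b.
    bench_scores[b][m]: measured score of benchmark b on machine m.
    """
    normalized = zscore_columns(bench_chars + [app_chars])
    app, benches = normalized[-1], normalized[:-1]
    dists = [math.dist(app, b) for b in benches]
    nearest = sorted(range(len(benches)), key=lambda b: dists[b])[:k]
    # Closer benchmarks get larger weights; epsilon avoids division by zero.
    weight = {b: 1.0 / (dists[b] + 1e-9) for b in nearest}
    total = sum(weight.values())
    machines = range(len(bench_scores[0]))
    return [sum(weight[b] * bench_scores[b][m] for b in nearest) / total for m in machines]


bench_chars = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0]]
bench_scores = [[1.0, 2.0], [1.2, 2.2], [5.0, 1.0]]  # scores on machines 0 and 1
pred = predict_scores([1.5, 15.0], bench_chars, bench_scores)
ranking = sorted(range(len(pred)), key=lambda m: pred[m], reverse=True)
print(pred, ranking)
```

When scores are speedups (higher is better), sorting machines by predicted score gives the predicted machine ranking, which is what a rank correlation coefficient would then compare against measured results.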
Automatic logging of operating system effects to guide application-level architecture simulation
In Proceedings of SIGMETRICS/Performance, 2006
Cited by 17 (6 self)
Abstract
Modern architecture research relies heavily on application-level detailed pipeline simulation. A time-consuming part of building a simulator is correctly emulating operating system effects, which is required, even if the goal is to simulate just the application code, to achieve functional correctness of the application’s execution. Existing application-level simulators require hand-coding the emulation of each and every possible system effect (e.g., system call, interrupt, DMA transfer) that can impact the application’s execution. Developing such an emulator for a given operating system is tedious, and it can also be costly to maintain it to support newer versions of that operating system. Furthermore, porting the emulator to a completely different operating system might involve building it altogether from scratch. In this paper, we describe a tool that can automatically log operating system effects to guide architecture simulation of application code. The benefits of our approach are: (a) we do not have to build or maintain any infrastructure for emulating operating system effects, (b) we can support simulation of more complex applications on our application-level simulator, including applications that use asynchronous interrupts, DMA transfers, etc., and (c) using the system-effect logs collected by our tool, we can deterministically re-execute the application to guide architecture simulation with reproducible results.
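The log-then-replay idea above can be illustrated with a toy model. Everything here is hypothetical: a log entry records the instruction count at which a system effect occurred, the value it returned, and the memory it wrote, and the simulator applies those effects at the same point instead of emulating the system call, interrupt, or DMA transfer. A real tool would also have to capture register state and other effects, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEffect:
    """One logged OS effect (hypothetical format): when it happened, the value
    it returned to the application, and the bytes it wrote into memory."""
    icount: int                                     # instruction count at the effect
    return_value: int                               # e.g. a system call's return value
    mem_writes: dict = field(default_factory=dict)  # address -> byte written


class ReplaySimulator:
    """Toy application-level simulator that injects logged effects instead of
    emulating system calls, interrupts, or DMA transfers."""

    def __init__(self, log):
        self.log = sorted(log, key=lambda e: e.icount)
        self.pos = 0        # index of the next effect to apply
        self.icount = 0     # instructions retired so far
        self.memory = {}    # address -> byte
        self.ret_reg = 0    # register receiving system call results

    def step(self):
        """Retire one (unmodeled) instruction, then apply every effect logged
        at or before the current instruction count."""
        self.icount += 1
        while self.pos < len(self.log) and self.log[self.pos].icount <= self.icount:
            effect = self.log[self.pos]
            self.memory.update(effect.mem_writes)
            self.ret_reg = effect.return_value
            self.pos += 1


# A read() that returned 2 bytes ("Hi") at instruction 3, then a call returning 0.
log = [SystemEffect(3, 2, {0x1000: 72, 0x1001: 105}), SystemEffect(5, 0)]
sim = ReplaySimulator(log)
for _ in range(3):
    sim.step()
print(sim.memory, sim.ret_reg)
```

Because the simulator only consumes the log, two replays of the same log reach identical states, which is the reproducibility property the abstract relies on.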
Characterizing Microarchitecture Soft Error Vulnerability Phase Behavior
2006
Cited by 12 (4 self)
Abstract
Computer systems increasingly depend on exploiting dynamic program behavior to optimize performance, power, and reliability. Prior studies have shown that program execution exhibits phase behavior in both the performance and power domains. Reliability-oriented program phase behavior, however, remains largely unexplored. As semiconductor transient faults (soft errors) emerge as a critical challenge to reliable system design, characterizing program phase behavior from a reliability perspective is crucial for applying dynamic fault-tolerance mechanisms and optimizing performance/reliability trade-offs. In this paper, we compute run-time program vulnerability to soft errors on four microarchitecture structures (i.e., the instruction window, reorder buffer, function units, and wakeup table) in a high-performance out-of-order superscalar processor. Experimental results on the SPEC2000 benchmarks show a considerable amount of time-varying behavior in reliability measurements. Our study shows that a single performance metric, such as IPC, cache misses, or branch mispredictions, is not a good indicator of program vulnerability. The vulnerabilities of the studied microarchitecture structures are then correlated with program code structure and run-time events to identify vulnerability phase behavior. We observed that both program code structure and run-time events appear promising for classifying program reliability phase behavior. Overall, performance-counter-based schemes achieved an average coefficient of variation (COV) of 3.5%, 4.5%, 4.3%, and 5.7% on the instruction queue, reorder buffer, function units, and wakeup table, respectively, while basic block vectors offer COVs of 4.9%, 5.8%, 5.4%, and 6% on the same four structures. We found that, in general, tracking performance metrics performs better than tracking control flow in identifying the reliability phase behavior of applications.
To our knowledge, this paper is the first to characterize program reliability phase behavior at the microarchitecture level.
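The COV figures above measure how homogeneous a vulnerability metric is within each identified phase: lower values mean the classification groups intervals with similar vulnerability. A minimal sketch follows, assuming COV is the population standard deviation divided by the mean and that per-phase values are averaged with weights proportional to phase size. The paper may aggregate differently, and the inputs here are hypothetical per-interval values (for example, an AVF estimate for one structure).

```python
import math
from collections import defaultdict


def phase_cov(values, phases):
    """Size-weighted average coefficient of variation (std / mean) of a
    per-interval metric within each phase; lower means more homogeneous phases."""
    groups = defaultdict(list)
    for value, phase in zip(values, phases):
        groups[phase].append(value)
    weighted = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        std = math.sqrt(sum((v - mean) ** 2 for v in members) / len(members))
        cov = std / mean if mean else 0.0
        weighted += cov * len(members) / len(values)
    return weighted


vulnerability = [0.2, 0.2, 0.6, 0.6]  # hypothetical per-interval values
print(phase_cov(vulnerability, [0, 0, 1, 1]))  # phases match the metric
print(phase_cov(vulnerability, [0, 1, 0, 1]))  # phases cut across it
```

A phase classifier driven by performance counters or by basic block vectors would supply the `phases` argument; comparing the resulting COVs is one way to judge which signal tracks vulnerability better.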
Dynamic Phase Analysis for Cycle-Close Trace Generation
In International Conference on Hardware/Software Codesign and System Synthesis, 2005
Cited by 10 (0 self)
Abstract
For embedded system development, several companies provide cross-platform development tools to aid in debugging, prototyping, and optimizing programs. These are full-system emulators that can emulate the final binary to be run on the real board, along with its operating system and devices. Many of these emulators do not provide cycle-level information because cycle-accurate simulation is so time-consuming. In this paper, we propose a method that provides Cycle-Close Traces of cycle-level statistics for the complete execution of the program in orders of magnitude less time than full cycle-accurate simulation, with an average error of 3.2%. Our approach uses dynamic phase analysis to generate targeted cycle-close simulation samples. Detailed simulation results for these samples are used to produce fast cycle-close traces during a program’s execution, so the user can also watch, pause, and debug the currently executing code and its corresponding architectural performance characteristics at any point during execution.
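The reuse of detailed samples across a phase can be sketched as below. This is a simplification under stated assumptions: phase IDs per interval are already available from the dynamic phase analysis, `detailed_cpi` is a hypothetical callback standing for cycle-accurate simulation of one interval, and the first interval of each phase is taken as its sample, which may not match the paper's sample selection.

```python
def cycle_close_trace(phase_ids, detailed_cpi):
    """Build a per-interval CPI trace from one detailed sample per phase.

    phase_ids[i]: phase assigned to interval i by the phase analysis.
    detailed_cpi(i): hypothetical callback that simulates interval i cycle-accurately.
    Returns the trace and the list of intervals actually simulated in detail.
    """
    sampled = {}      # phase ID -> CPI measured on that phase's sample
    trace, simulated = [], []
    for i, phase in enumerate(phase_ids):
        if phase not in sampled:
            # First time this phase appears: pay for one detailed simulation.
            sampled[phase] = detailed_cpi(i)
            simulated.append(i)
        trace.append(sampled[phase])  # later intervals reuse the sampled CPI
    return trace, simulated


true_cpi = [1.0, 1.1, 2.0, 0.9, 2.2, 3.0]  # stand-in for full cycle-accurate results
trace, simulated = cycle_close_trace([0, 0, 1, 0, 1, 2], lambda i: true_cpi[i])
error = sum(abs(t - c) / c for t, c in zip(trace, true_cpi)) / len(true_cpi)
print(trace, simulated, f"{error:.1%}")
```

Only three of the six intervals are simulated in detail here; the savings grow with the number of intervals per phase, while the error depends on how homogeneous each phase really is.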
Using Machine Learning to Guide Architecture Simulation
Journal of Machine Learning Research, 2006
Cited by 10 (4 self)
Abstract
An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may estimate the impact of various design options on the overall machine. This task is complicated by the fact that different programs, and even different parts of the same program, may have distinct behaviors that interact with the hardware in different ways. Researchers estimate processor performance with very detailed simulators that model every cycle of an executing program. Unfortunately, ...
Detecting recurrent phase behavior under real-system variability
In Proceedings of the IEEE International Symposium on Workload Characterization, 2005
Cited by 10 (2 self)
Abstract
As computer systems become ever more complex and power-hungry, research on dynamic, on-the-fly system management and adaptation receives increasing attention. Such research relies on recognizing and responding to patterns or phases in application execution, which has therefore become an important and widely studied research area. While application phase analysis has received significant attention, most of this attention thus far has focused on simulation-based studies. In these cycle-level simulations, free of nondeterministic operating system intervention, applications display behavior that is repeatable from phase to phase and from run to run. A natural question, therefore, is how these phases appear in real system runs, where interrupts and time variability can influence the timing and behavior of the program. Our work examines the phase behavior of applications running on real systems. The key goals of our work are to reliably discern and recover phase behavior in the face of application variability stemming from real system effects and time sampling. We propose a set of new, “transition-based” phase detection techniques. Our techniques can detect repeatable workload phase information from time-varying, real system measurements with false-alarm probabilities below 5%. Compared with previous value-based detection methods, our transition-based techniques achieve, on average, 6x higher recurrent phase detection efficiency under real system variability.
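To make the value-based versus transition-based distinction concrete, here is a generic illustration, not the authors' techniques: a median filter suppresses single-sample glitches such as interrupt spikes, and a phase boundary is reported only where the smoothed metric jumps by more than a threshold. The filter width, the `jump` threshold, and the sample trace are hypothetical.

```python
from statistics import median


def smooth(samples, width=3):
    """Sliding median filter; suppresses isolated glitches such as an
    interrupt-induced spike (windows are truncated at the edges)."""
    half = width // 2
    return [median(samples[max(0, i - half): i + half + 1]) for i in range(len(samples))]


def transitions(samples, jump=0.5, width=3):
    """Indices where the smoothed metric changes by more than `jump` between
    consecutive samples, i.e. candidate phase boundaries."""
    s = smooth(samples, width)
    return [i for i in range(1, len(s)) if abs(s[i] - s[i - 1]) > jump]


# Hypothetical per-sample metric (e.g. IPC) with one spike at index 2.
trace = [1.0, 1.1, 3.0, 1.0, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0]
print(transitions(trace))           # smoothed: [6]
print(transitions(trace, width=1))  # unsmoothed: [2, 3, 6]
```

Without filtering, the spike produces two false transitions; with it, only the sustained shift at index 6 is reported.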
Detecting phases in parallel applications on shared memory architectures
In International Parallel and Distributed Processing Symposium, 2006
Cited by 8 (2 self)
Abstract
Most programs are repetitive: similar behavior can be seen at different points in their execution. Algorithms have been proposed that automatically group similar portions of a program’s execution into phases, where samples of execution in the same phase have homogeneous behavior and similar resource requirements. In this paper, we examine how to apply these phase analysis algorithms to parallel applications running on shared-memory processors and how to adapt them for that setting. Our approach relies on a separate representation of each thread’s activity. We first show its ability to identify similar intervals of execution across threads within a single run. We then show that it is effective at identifying similar behavior of a program when the number of threads is varied between runs, which developers can use to examine how different phases scale across different numbers of threads. Finally, we examine using the phase analysis to pick simulation points to guide multithreaded simulation.
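One way to realize a separate representation of each thread’s activity is to build a basic block vector (BBV) per thread per interval and cluster all of them in one shared space, so that similar intervals from different threads receive the same phase ID. The sketch below assumes that design and, for brevity, uses a simple threshold-based (leader) clustering with Manhattan distance instead of k-means; the names, data, and `threshold` value are hypothetical.

```python
def normalize(bbv):
    """Scale a basic block vector so its entries sum to 1 (assumes non-empty counts)."""
    total = sum(bbv.values())
    return {bb: count / total for bb, count in bbv.items()}


def manhattan(a, b):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def phases_across_threads(thread_bbvs, threshold=0.5):
    """thread_bbvs[t][i] maps basic-block IDs to execution counts for thread t,
    interval i. Returns phase IDs in the same shape; intervals from different
    threads that execute similar code share an ID."""
    leaders = []  # one representative (normalized) vector per phase
    result = []
    for intervals in thread_bbvs:
        ids = []
        for bbv in intervals:
            v = normalize(bbv)
            for pid, leader in enumerate(leaders):
                if manhattan(v, leader) < threshold:
                    ids.append(pid)
                    break
            else:  # no existing phase is close enough: start a new one
                leaders.append(v)
                ids.append(len(leaders) - 1)
        result.append(ids)
    return result


# Both threads move from a loop over blocks A/B to a loop over blocks C/D.
threads = [
    [{"A": 90, "B": 10}, {"C": 50, "D": 50}],
    [{"A": 85, "B": 15}, {"A": 88, "B": 12}, {"C": 45, "D": 55}],
]
print(phases_across_threads(threads))  # [[0, 1], [0, 0, 1]]
```

Because phase IDs are shared across threads, the same labels can be compared between runs with different thread counts, which is the kind of scaling analysis the abstract describes.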
Combining Simulation and Virtualization through Dynamic Sampling
Cited by 8 (1 self)
Abstract
The high speed and faithfulness of state-of-the-art virtual machines (VMs) make them the ideal front end for a system simulation framework. However, VMs only emulate functional behavior and provide just the minimal timing needed for the system to run correctly. In a simulation framework supporting the exploration of different configurations, a timing back end is still necessary to accurately determine the performance of the simulated target. As extensive research has shown, sampling is an excellent approach for fast timing simulation. However, existing sampling mechanisms require capturing information for every instruction and memory access. Hence, coupling a standard sampling technique to a VM implies disabling most of the “tricks” a VM uses to accelerate execution, such as the caching and linking of dynamically compiled code. Without code caching, the performance of a VM is severely impacted. In this paper, we present a novel dynamic sampling mechanism that overcomes this problem and enables the use of VMs for timing simulation. By making use of the internal information collected by the VM during functional simulation, we can quickly assess important characteristics of the simulated applications (such as phase changes) and activate or deactivate timing simulation accordingly. This allows us to run unmodified operating systems and applications over emulated hardware at near-native speed, while still providing a way to insert timing measurements that yield a final accuracy similar to that of state-of-the-art sampling methods.

