Characterizing and Predicting Program Behavior and its Variability
- In International Conference on Parallel Architectures and Compilation Techniques, 2003
Cited by 83 (3 self)
To reach the next level of performance and energy efficiency, optimizations are increasingly applied in a dynamic and adaptive manner. Current adaptive systems are typically reactive and optimize hardware or software in response to detecting a shift in program behavior. We argue that program behavior variability requires adaptive systems to be predictive rather than reactive. In order to be effective, systems need to adapt according to future rather than most recent past behavior. In this paper we explore the potential of incorporating prediction into adaptive systems. We study the time-varying behavior of programs using metrics derived from hardware counters on two different micro-architectures. Our evaluation shows that programs do indeed exhibit significant behavior variation even at a granularity of millions of instructions. In addition, while the actual behavior across metrics may be different, periodicity in the behavior is shared across metrics. We exploit these characteristics in the design of on-line statistical and table-based predictors. We introduce a new class of predictors, cross-metric predictors, that use one metric to predict another, thus making possible an efficient coupling of multiple predictors. We evaluate these predictors on the SPECcpu2000 benchmark suite and show that table-based predictors outperform statistical predictors by as much as 69% on benchmarks with high variability.
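The contrast between a reactive and a table-based predictor can be sketched as follows. This is an illustrative sketch only, not the predictors from the paper: the function names, the `history` parameter, and the use of already-quantized integer metric values are all assumptions made for the example.

```python
def last_value_predictions(series):
    """Reactive baseline: predict each interval as a copy of the previous one."""
    return [series[i - 1] for i in range(1, len(series))]


def table_predictions(series, history=2):
    """Hypothetical table-based predictor.

    The table maps the last `history` observed values to the value that
    followed them the last time that pattern was seen. On a table miss it
    falls back to the last observed value.
    """
    table = {}
    predictions = []
    for i in range(1, len(series)):
        key = tuple(series[max(0, i - history):i])
        predictions.append(table.get(key, series[i - 1]))
        table[key] = series[i]  # learn what actually followed this pattern
    return predictions


def mean_abs_error(predictions, series):
    """Mean absolute error of predictions for series[1:]."""
    actual = series[1:]
    return sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)


if __name__ == "__main__":
    # A periodic, highly variable metric (assumed to be quantized already).
    metric = [1, 1, 5] * 10
    print("last-value MAE:", mean_abs_error(last_value_predictions(metric), metric))
    print("table MAE:     ", mean_abs_error(table_predictions(metric), metric))
```

On a periodic series the table learns each pattern after one period, while the last-value baseline keeps mispredicting every change in behavior. Real hardware-counter metrics are continuous, so a practical version would need a quantization step before the table lookup.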
Dynamically managing the communication-parallelism trade-off in future clustered processors
- In Proceedings of International Symposium on Computer Architecture, 2003
Cited by 47 (10 self)
Clustered microarchitectures are an attractive alternative to large monolithic superscalar designs due to their potential for higher clock rates in the face of increasingly wire-delay-constrained process technologies. As increasing transistor counts allow an increase in the number of clusters, thereby allowing more aggressive use of instruction-level parallelism (ILP), the inter-cluster communication increases as data values get spread across a wider area. As a result of the emergence of this trade-off between communication and parallelism, a subset of the total on-chip clusters is optimal for performance. To match the hardware to the application’s needs, we use a robust algorithm to dynamically tune the clustered architecture. The algorithm, which is based on program metrics gathered at periodic intervals, achieves an 11% performance improvement on average over the best statically defined architecture. We also show that the use of additional hardware and reconfiguration at basic block boundaries can achieve average improvements of 15%. Our results demonstrate that reconfiguration provides an effective solution to the communication and parallelism trade-off inherent in the communication-bound processors of the future.
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
- In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, 2005
Cited by 42 (4 self)
Dynamic voltage and frequency scaling (DVFS) is an effective technique for controlling microprocessor energy and performance. Existing DVFS techniques are primarily based on hardware, OS time-interrupts, or static-compiler techniques. However, substantially greater gains can be realized when control opportunities are also explored in a dynamic compilation environment. There are several advantages to deploying DVFS and managing energy/performance tradeoffs through the use of a dynamic compiler. Most importantly, dynamic compiler driven DVFS is fine-grained, code-aware, and adaptive to the current microarchitecture environment. This paper presents a design framework of the run-time DVFS optimizer in a general dynamic compilation system. A prototype of the DVFS optimizer is implemented and integrated into an industrial-strength dynamic compilation system. The obtained optimization system is deployed in a real hardware platform that directly measures
The strong correlation between code signatures and performance
- In IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Cited by 38 (10 self)
A recent study [1] examined the use of sampled hardware counters to create sampled code signatures. This approach is attractive because sampled code signatures can be quickly gathered for any application. The conclusion of their study was that there exists a fuzzy correlation between sampled code signatures and performance predictability. The paper raises the question of how much information is lost in the sampling process, and our paper focuses on examining this issue. We first focus on showing that there exists a strong correlation between code signatures and performance. We then examine the relationship between sampled and full code signatures, and how these affect performance predictability. Our results confirm the fuzzy correlation found in recent work for the SPEC programs with sampled code signatures, but show that a strong correlation exists with full code signatures. In addition, we propose converting the sampled instruction counts, used in the prior work, into sampled code signatures representing loop and procedure execution frequencies. These sampled loop and procedure code signatures allow phase analysis to more accurately and easily find patterns, and they correlate better with performance.
Transition phase classification and prediction
- In 11th International Symposium on High Performance Computer Architecture, 2005
Cited by 34 (7 self)
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed on-line systems automatically group these similar intervals of execution into phases, where the intervals in a phase have homogeneous behavior and similar resource requirements. These systems are driven by algorithms that dynamically classify intervals of execution into phases and predict phase changes. In this paper, we examine several improvements to dynamic phase classification and prediction. The first improvement is to appropriately deal with phase transitions. This modification identifies phase transitions for what they are, instead of classifying them into a new phase, which increases phase prediction accuracy. We also describe an adaptive system that dynamically adjusts classification thresholds and splits phases with poor homogeneity. This modification increases the homogeneity of the hardware metrics across the intervals in each phase. We improve phase prediction accuracy by applying confidence to phase prediction, and we develop architectures that can accurately predict the outcome of the next phase change, and the length of the next phase.
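The idea of labeling transitions separately, rather than giving each one a new phase, might be sketched as below. This is not the classifier from the paper: the Manhattan distance, the fixed `threshold`, the `min_repeats` rule, and all names are assumptions chosen to keep the example small.

```python
TRANSITION = -1  # label for intervals that do not belong to any stable phase


def manhattan(a, b):
    """Manhattan distance between two equal-length signature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(signatures, threshold, min_repeats=2):
    """Assign a phase ID, or TRANSITION, to each interval signature.

    Assumed rule: an unseen signature becomes a stable phase only after it
    has appeared in `min_repeats` consecutive intervals. Until then, its
    intervals are labeled TRANSITION instead of creating a new phase.
    """
    phases = []        # representative signature of each stable phase
    labels = []
    candidate, seen = None, 0
    for sig in signatures:
        match = next(
            (i for i, rep in enumerate(phases) if manhattan(sig, rep) <= threshold),
            None,
        )
        if match is not None:
            labels.append(match)
            candidate, seen = None, 0
            continue
        if candidate is not None and manhattan(sig, candidate) <= threshold:
            seen += 1
        else:
            candidate, seen = sig, 1
        if seen >= min_repeats:
            phases.append(candidate)          # promote candidate to a phase
            labels.append(len(phases) - 1)
            candidate, seen = None, 0
        else:
            labels.append(TRANSITION)
    return labels


if __name__ == "__main__":
    intervals = [(10, 0), (10, 0), (10, 1), (5, 5), (0, 10), (0, 10), (0, 9), (10, 0)]
    print(classify(intervals, threshold=2))
```

In this sketch the one-off signature `(5, 5)` never enters the phase table, so it cannot pollute later predictions. A side effect of the assumed rule is that the first interval of each genuinely new phase is also labeled as a transition.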
EXPERT: Expedited Simulation Exploiting Program Behavior Repetition
- 2004
Cited by 31 (0 self)
Studying program behavior is a central component in architectural designs. In this paper, we study and exploit one aspect of program behavior, the behavior repetition, to expedite simulation. Detailed
Motivation for variable length intervals and hierarchical phase behavior
- In IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Cited by 21 (6 self)
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group similar portions of a program’s execution into phases, where the intervals in each phase have homogeneous behavior and similar resource requirements. These prior techniques focus on fixed length intervals (such as a hundred million instructions) to find phase behavior. Fixed length intervals can make a program’s periodic phase behavior difficult to find, because the fixed interval length can be out of sync with the period of the program’s actual phase behavior. In addition, a fixed interval length can only express one level of phase behavior. In this paper, we graphically show that there exists a hierarchy of phase behavior in programs and motivate the need for variable length intervals. We describe the changes applied to SimPoint to support variable length intervals. We finally conclude by providing an initial study into using variable length intervals to guide SimPoint.
The Thrifty Barrier: Energy-aware synchronization in shared-memory multiprocessors
- In International Symposium on High-Performance Computer Architecture, 2004
Cited by 18 (1 self)
Much research has been devoted to making microprocessors energy-efficient. However, little attention has been paid to multiprocessor environments where, due to the co-operative nature of the computation, the most energy-efficient execution in each processor may not translate into the most energy-efficient overall execution. We present the thrifty barrier, a hardware-software approach to saving energy in parallel applications that exhibit barrier synchronization imbalance. Threads that arrive early to a thrifty barrier pick among existing low-power processor sleep states based on predicted barrier stall time and other factors. We leverage the coherence protocol and propose small hardware extensions to achieve timely wake-up of these dormant threads, maximizing energy savings while minimizing the impact on performance.
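Choosing a sleep state from a predicted stall time could look roughly like the sketch below. It is a simplified illustration, not the thrifty barrier's actual policy: the `SleepState` fields, the example states and power figures, and the assumption that entering and leaving a state costs full active power are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SleepState:
    name: str
    power_watts: float      # power drawn while resident in the state
    transition_us: float    # assumed total time to enter and leave the state


def pick_sleep_state(
    states: Sequence[SleepState],
    predicted_stall_us: float,
    active_power_watts: float,
) -> Optional[SleepState]:
    """Return the state with the lowest estimated energy, or None to keep spinning.

    Energy is estimated in microjoules (watts times microseconds). A state is
    skipped when its transition time would exceed the predicted stall, since
    the thread would then wake up late and delay the barrier.
    """
    best = None
    best_energy = active_power_watts * predicted_stall_us  # cost of spinning
    for state in states:
        if state.transition_us > predicted_stall_us:
            continue
        resident_us = predicted_stall_us - state.transition_us
        energy = (active_power_watts * state.transition_us
                  + state.power_watts * resident_us)
        if energy < best_energy:
            best, best_energy = state, energy
    return best


if __name__ == "__main__":
    # Hypothetical states; the numbers are made up for illustration.
    states = [SleepState("halt", 5.0, 10.0), SleepState("deep", 1.0, 100.0)]
    for stall in (5.0, 50.0, 1000.0):
        choice = pick_sleep_state(states, stall, active_power_watts=20.0)
        print(stall, choice.name if choice else "spin")
```

Short predicted stalls favor spinning or a shallow state, and only long stalls justify a deep state with a long transition. The sketch ignores the "other factors" the abstract mentions and the cost of a wrong prediction.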
A Case for Run-time Adaptation in Packet Processing Systems
- ACM SIGCOMM Computer Communication Review, 2003
Cited by 16 (7 self)
Most packet processing applications receive and process multiple types of packets. Today, the processors available within packet processing systems are allocated to packet types at design time. In this paper, we explore the benefits and challenges of adapting allocations of processors to packet types in packet processing systems. We demonstrate that, for all the applications and traces considered, run-time adaptation can reduce energy consumption by 70-80% and processor provisioning level by 40-50%. The adaptation benefits are maximized if processor allocations can be adapted at fine timescales and if the total available processing power can be allocated to packet types in small granularities. We show that, of these two factors, allocating processing power to packet types in small granularity is more important: if the allocation granularity is large, then even a very fine adaptation time-scale yields meager benefits.
Power reduction techniques for microprocessor systems
- ACM Computing Surveys, 2005
Cited by 15 (1 self)
Power consumption is a major factor that limits the performance of computers. We survey the “state of the art” in techniques that reduce the total power consumed by a microprocessor system over time. These techniques are applied at various levels ranging from circuits to architectures, architectures to system software, and system

