Results 1 - 10 of 59
Modeling Program Predictability
- in 25th Annual International Symposium on Computer Architecture, 1998
"... Basic properties of program predictability -- for both values and control -- are defined and studied. We take the view that program predictability originates at certain points during a program's execution, flows through subsequent instructions, and then ends at other points in the program. Thes ..."
Abstract
-
Cited by 47 (3 self)
- Add to MetaCart
(Show Context)
Basic properties of program predictability -- for both values and control -- are defined and studied. We take the view that program predictability originates at certain points during a program's execution, flows through subsequent instructions, and then ends at other points in the program. These key components of predictability -- generation, propagation, and termination -- are defined in terms of a model. The model is based on a graph derived from dynamic data dependences and a predictor.
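As a rough illustration of the generation/propagation/termination vocabulary, the C++ sketch below classifies nodes of a small dynamic data-dependence graph by comparing each instruction's own predictability against that of its producers. The classification rule and all names here are assumptions made for illustration, not the paper's formal model.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch: each node is one dynamic instruction in a data-dependence
// graph, tagged with whether some value predictor would have predicted its result.
struct DynInst {
    std::string opcode;
    std::vector<int> inputs;   // indices of producer nodes (dynamic dependences)
    bool predicted;            // did the predictor get this instruction's value right?
};

// Classify each node relative to its producers -- one plausible reading of
// "generation / propagation / termination" of predictability.
std::string classify(const std::vector<DynInst>& trace, int i) {
    bool anyInputPredicted = false;
    for (int p : trace[i].inputs)
        if (trace[p].predicted) anyInputPredicted = true;
    if (trace[i].predicted && !anyInputPredicted) return "generation";
    if (trace[i].predicted &&  anyInputPredicted) return "propagation";
    if (!trace[i].predicted && anyInputPredicted) return "termination";
    return "unpredictable";
}

int main() {
    std::vector<DynInst> trace = {
        {"li",  {},  true},    // constant load: predictability generated here
        {"add", {0}, true},    // predictable input, predictable output: propagation
        {"ld",  {1}, false},   // predictable input, unpredictable output: termination
    };
    for (int i = 0; i < (int)trace.size(); ++i)
        std::cout << i << ": " << trace[i].opcode << " -> " << classify(trace, i) << "\n";
}
```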
Control-Flow Speculation through Value Prediction for Superscalar Processors
- 1998
"... In this paper, we introduce a new branch predictor that predicts the outcomes of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor comprising our branch predictor and a correlating ..."
Abstract
-
Cited by 35 (3 self)
- Add to MetaCart
In this paper, we introduce a new branch predictor that predicts the outcomes of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor comprising our branch predictor and a correlating branch predictor is presented. We also propose a new selector that chooses the most reliable prediction for each branch. This selector is based on the path followed to reach the branch. Results for immediate updates show that for a processor that already has a value prediction unit, our hybrid predictor, with a size of 4KB, achieves the same miss ratio as a conventional hybrid predictor of 64KB. The reduction in misprediction penalty is about 40% for all predictor sizes. Furthermore, if the cost of the value predictor is considered as an additional cost of the hybrid predictor, our proposal still reduces the miss ratio with respect to a conventional hybrid predictor for all different predic...
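A minimal sketch of the core idea -- predict a branch's input values and pre-execute its comparison -- assuming a simple last-value table and a less-than branch. Table size, indexing, and the update policy are illustrative assumptions, not the paper's design.

```cpp
#include <cstdint>
#include <iostream>

// Last-value table indexed by the (hashed) PC of the instruction that produces
// each branch input; sizes and hashing are illustrative.
constexpr int TABLE_SIZE = 1024;
uint64_t lastValue[TABLE_SIZE];   // zero-initialized

uint64_t predictValue(uint64_t producerPC) {
    return lastValue[producerPC % TABLE_SIZE];
}

void updateValue(uint64_t producerPC, uint64_t actual) {
    lastValue[producerPC % TABLE_SIZE] = actual;
}

// Predict "taken" for a branch of the form (lhs < rhs) by computing the
// comparison early on the predicted operand values.
bool predictBranch(uint64_t lhsProducerPC, uint64_t rhsProducerPC) {
    return predictValue(lhsProducerPC) < predictValue(rhsProducerPC);
}

int main() {
    // Loop branch i < 100: the producer of i most recently wrote 41,
    // the producer of the bound most recently wrote 100.
    updateValue(/*producerPC=*/0x400100, 41);
    updateValue(/*producerPC=*/0x400200, 100);
    std::cout << "predicted taken: " << predictBranch(0x400100, 0x400200) << "\n";
}
```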
Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency
- Seventh International Symposium on High Performance Computer Architecture, 2001
"... Value prediction is a relatively new technique to increase the Instruction Level Parallelism (ILP) in future microprocessors. An important problem when designing a value predictor is efficiency: an accurate predictor requires huge prediction tables. This is especially the case for the finite context ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
(Show Context)
Value prediction is a relatively new technique to increase the Instruction Level Parallelism (ILP) in future microprocessors. An important problem when designing a value predictor is efficiency: an accurate predictor requires huge prediction tables. This is especially the case for the finite context method (FCM) predictor, the most accurate one. In this paper, we show that the prediction accuracy of the FCM can be greatly improved by making the FCM predict strides instead of values. This new predictor is called the differential finite context method (DFCM) predictor. The DFCM predictor outperforms a similar FCM predictor by as much as 33%, depending on the prediction table size. If we take the additional storage into account, the difference is still 15% for realistic predictor sizes. We use several metrics to show that the key to this success is reduced aliasing in the level-2 table. We also show that the DFCM is superior to hybrid predictors based on FCM and stride predictors, since its prediction accuracy is higher than that of a hybrid one using a perfect metapredictor.
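A minimal DFCM sketch along the lines described above: a per-PC first-level table holds the last value and a hash of the recent stride history, and a second-level table maps that history to the next expected stride. Table sizes and the hash function are assumptions.

```cpp
#include <cstdint>
#include <iostream>

// Level 1: per-PC, the last value and a hash of the recent stride history.
// Level 2: indexed by that history hash, the stride expected to come next.
constexpr int L1_SIZE = 4096;
constexpr int L2_SIZE = 65536;

struct L1Entry { uint64_t lastValue = 0; uint32_t hist = 0; };
L1Entry level1[L1_SIZE];
int64_t  level2[L2_SIZE];   // predicted next stride per history pattern

uint32_t foldHistory(uint32_t hist, int64_t stride) {
    // fold the newest stride into the history hash (illustrative mixing function)
    return ((hist << 5) ^ (uint32_t)stride) % L2_SIZE;
}

uint64_t predict(uint64_t pc) {
    const L1Entry& e = level1[pc % L1_SIZE];
    return e.lastValue + level2[e.hist];     // last value + predicted stride
}

void update(uint64_t pc, uint64_t actual) {
    L1Entry& e = level1[pc % L1_SIZE];
    int64_t stride = (int64_t)(actual - e.lastValue);
    level2[e.hist] = stride;                 // learn the stride that followed this history
    e.hist = foldHistory(e.hist, stride);
    e.lastValue = actual;
}

int main() {
    // A load that walks an array (stride 8) is learned after a short warm-up.
    uint64_t pc = 0x400500, addr = 1000;
    for (int i = 0; i < 6; ++i) { update(pc, addr); addr += 8; }
    std::cout << "prediction: " << predict(pc) << " (actual next: " << addr << ")\n";
}
```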
Exploring Last n Value Prediction
- in International Conference on Parallel Architectures and Compilation Techniques, 1999
"... Most load value predictors retain a large number of previously loaded values for making future predictions. In this paper we evaluate the trade-off between tall and slim versus short and wide predictors of the same total size, i.e., between retaining a few values for a large number of load instructi ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
(Show Context)
Most load value predictors retain a large number of previously loaded values for making future predictions. In this paper we evaluate the trade-off between tall and slim versus short and wide predictors of the same total size, i.e., between retaining a few values for a large number of load instructions and many values for a proportionately smaller number of loads. Our results show, for example, that even modest predictors holding sixteen kilobytes of values benefit from retaining four values per load instruction when running SPECint95.
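A small last-four-value predictor sketch illustrating the tall-versus-wide trade-off the abstract describes: each table row keeps four recent values for one (hashed) load PC, and a tiny counter per value selects which one to predict. The selection and replacement policies here are assumptions, not necessarily those evaluated in the paper.

```cpp
#include <cstdint>
#include <iostream>

constexpr int ROWS = 2048;    // "tall and slim" vs "short and wide": the ROWS x N trade-off
constexpr int N    = 4;

struct Entry { uint64_t val[N] = {0}; int cnt[N] = {0}; };
Entry table[ROWS];

uint64_t predict(uint64_t pc) {
    const Entry& e = table[pc % ROWS];
    int best = 0;
    for (int i = 1; i < N; ++i)
        if (e.cnt[i] > e.cnt[best]) best = i;   // predict the most reinforced value
    return e.val[best];
}

void update(uint64_t pc, uint64_t actual) {
    Entry& e = table[pc % ROWS];
    for (int i = 0; i < N; ++i) {
        if (e.val[i] == actual) {               // reinforce a matching value
            if (e.cnt[i] < 3) ++e.cnt[i];
            return;
        }
    }
    int victim = 0;                             // replace the least useful value
    for (int i = 1; i < N; ++i)
        if (e.cnt[i] < e.cnt[victim]) victim = i;
    e.val[victim] = actual;
    e.cnt[victim] = 1;
}

int main() {
    uint64_t pc = 0x400abc;
    uint64_t vals[] = {7, 7, 13, 7, 7};
    for (uint64_t v : vals) update(pc, v);
    std::cout << "predicted: " << predict(pc) << "\n";   // likely 7
}
```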
Whole Execution Traces
- 2004
"... Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data ..."
Abstract
-
Cited by 32 (5 self)
- Add to MetaCart
Different types of program profiles (control flow, value, address, and dependence) have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures. Due to the large amounts of profile data produced by realistic program runs, most work has focused on separately collecting and compressing different types of profiles. In this paper we present a unified representation of profiles called Whole Execution Trace (WET) which includes the complete information contained in each of the above types of traces. Thus WETs provide a basis for a next-generation software tool that will enable mining of program profiles to identify program characteristics that require understanding of relationships among various types of profiles. The key features of our WET representation are: WET is constructed by labeling a static program representation with profile information such that relevant and related profile information can be directly accessed by analysis algorithms as they traverse the representation; a highly effective two-tier strategy is used to significantly compress the WET; and compression techniques are designed such that they do not adversely affect the ability to rapidly traverse WET for extracting subsets of information corresponding to individual profile types as well as a combination of profile types (e.g., in the form of dynamic slices of WETs). Our experimentation shows that, on average, execution traces resulting from the execution of 647 million statements can be stored in 331 megabytes of storage after compression. The compression factors range from 16 to 83. Moreover, the rates at which different types of profiles can be individually or simultaneously extracted are high.
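A toy illustration of the labeling idea: a static statement carries the dynamic records (timestamps, values, addresses, dependence edges) collected for it, so several profile types are reachable from one node. The field names and layout below are assumptions; the paper's actual WET encoding and compression are far more elaborate.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// One dynamic execution of a statement, carrying several profile types at once.
struct DynamicRecord {
    uint64_t timestamp;                 // position in the whole execution trace
    uint64_t value;                     // value profile
    uint64_t address;                   // address profile (0 if not a memory op)
    std::vector<uint64_t> dependsOn;    // timestamps of producing executions (dependence profile)
};

// A static statement labeled with all of its dynamic records.
struct StaticStmt {
    std::string text;
    std::vector<DynamicRecord> executions;   // control-flow profile = executions.size()
};

int main() {
    std::map<int, StaticStmt> wet;      // statement id -> labeled node
    wet[12] = {"x = a + b", {{100, 7, 0, {90, 95}}, {200, 9, 0, {190, 195}}}};
    wet[13] = {"*p = x",    {{101, 7, 0x7ffdd010, {100}}}};

    for (const auto& [id, stmt] : wet)
        std::cout << "stmt " << id << " '" << stmt.text << "' executed "
                  << stmt.executions.size() << " time(s)\n";
}
```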
Hybrid Load Value Predictors
- IEEE Transactions on Computers, 2000
"... Microprocessors are becoming faster at such a rapid pace that memory systems cannot keep up. As a result, the relative latency of load instructions grows constantly and already impedes processor performance. Load value predictors alleviate this problem by allowing the CPU to speculatively continue p ..."
Abstract
-
Cited by 29 (6 self)
- Add to MetaCart
(Show Context)
Microprocessors are becoming faster at such a rapid pace that memory systems cannot keep up. As a result, the relative latency of load instructions grows constantly and already impedes processor performance. Load value predictors alleviate this problem by allowing the CPU to speculatively continue processing without having to wait for load instructions to complete, which can significantly improve the execution speed. While several hybrid load value predictors have been proposed and found to work well, no systematic study of such predictors has been performed to date. In this paper, we investigate the performance of all hybrids that can be built out of a register value, a last value, a stride 2-delta, a last four value, and a finite context method predictor. Our analysis shows that hybrids can deliver 25% more speedup than the best single-component predictors. Analyzing the individual components of hybrids revealed that predictors with a poor standalone performance sometimes make excel...
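A two-component hybrid sketch (last value plus stride 2-delta) with per-component confidence counters choosing which component predicts. Sizes and thresholds are assumptions, and the paper's hybrids combine up to five components.

```cpp
#include <cstdint>
#include <iostream>

constexpr int ROWS = 2048;

struct Entry {
    uint64_t last = 0;
    int64_t  stride = 0, lastStride = 0;     // 2-delta: stride only adopted when it repeats
    int confLast = 0, confStride = 0;        // 0..15 saturating confidence counters
};
Entry table[ROWS];

// Returns false when neither component is confident enough to speculate.
bool predict(uint64_t pc, uint64_t& out) {
    const Entry& e = table[pc % ROWS];
    if (e.confStride >= e.confLast && e.confStride >= 8) { out = e.last + e.stride; return true; }
    if (e.confLast >= 8)                                 { out = e.last;            return true; }
    return false;
}

void update(uint64_t pc, uint64_t actual) {
    Entry& e = table[pc % ROWS];
    auto bump = [](int& c, bool hit) { c = hit ? (c < 15 ? c + 1 : 15) : (c > 1 ? c - 2 : 0); };
    bump(e.confLast,   actual == e.last);
    bump(e.confStride, actual == e.last + (uint64_t)e.stride);
    int64_t s = (int64_t)(actual - e.last);
    if (s == e.lastStride) e.stride = s;     // 2-delta update rule
    e.lastStride = s;
    e.last = actual;
}

int main() {
    uint64_t pc = 0x400f00, v;
    for (uint64_t x = 100; x < 200; x += 4) update(pc, x);   // stride-4 value sequence
    if (predict(pc, v)) std::cout << "predicted " << v << "\n";
    else std::cout << "no confident prediction\n";
}
```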
The VPC Trace-Compression Algorithms
- IEEE Transactions on Computers, 2005
"... Execution traces, which are used to study and analyze program behavior, are often so large that they need to be stored in compressed form. This paper describes the design and implementation of four value prediction based compression (VPC) algorithms for traces that record the PC as well as other inf ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
(Show Context)
Execution traces, which are used to study and analyze program behavior, are often so large that they need to be stored in compressed form. This paper describes the design and implementation of four value prediction based compression (VPC) algorithms for traces that record the PC as well as other information about executed instructions. VPC1 directly compresses traces using value predictors, VPC2 adds a second compression stage, and VPC3 utilizes value predictors to convert traces into streams that can be compressed better and more quickly than the original traces. VPC4 introduces further algorithmic enhancements and is automatically synthesized. Of the 55 SPECcpu2000 traces we evaluate, VPC4 compresses 36 of them better, decompresses 26 of them faster, and compresses 53 of them faster than BZIP2, MACHE, PDATS II, SBC, and SEQUITUR. It delivers the highest geometric-mean compression rate, decompression speed, and compression speed because of the predictors' simplicity and their ability to exploit local value locality. Most other compression algorithms can only exploit global value locality.
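A sketch of the stream-conversion idea attributed above to VPC3: each trace value is replaced by a small predictor-hit code when some predictor guessed it, and only misses keep the raw value; the resulting streams are then handed to a general-purpose compressor. The predictors and encoding below are assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Two simple value predictors run side by side over one trace field.
struct Predictors {
    uint64_t last = 0;
    int64_t stride = 0;
    std::vector<uint64_t> guesses() const { return { last, last + (uint64_t)stride }; }
    void update(uint64_t actual) { stride = (int64_t)(actual - last); last = actual; }
};

// Convert a value stream into a code stream (which predictor hit, or 0xFF = miss)
// plus a much shorter stream of raw values for the misses.
void transformStream(const std::vector<uint64_t>& values,
                     std::vector<uint8_t>& codes, std::vector<uint64_t>& misses) {
    Predictors p;
    for (uint64_t v : values) {
        const auto g = p.guesses();
        uint8_t code = 0xFF;
        for (size_t i = 0; i < g.size(); ++i)
            if (g[i] == v) { code = (uint8_t)i; break; }
        codes.push_back(code);
        if (code == 0xFF) misses.push_back(v);     // raw value kept only on a miss
        p.update(v);
    }
}

int main() {
    std::vector<uint64_t> trace = {100, 104, 108, 112, 500, 504, 508};
    std::vector<uint8_t> codes; std::vector<uint64_t> misses;
    transformStream(trace, codes, misses);
    std::cout << codes.size() << " codes, " << misses.size() << " raw values kept\n";
}
```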
The Role of Return Value Prediction in Exploiting Speculative Method-Level Parallelism
- Journal of Instruction-Level Parallelism, 2003
"... This work studies the performance impact of return value prediction in a system that supports speculative method-level parallelism (SMLP). A SMLP system creates a speculative thread at each method call, allowing the method and the code from which it is called to be executed in parallel. To improv ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
This work studies the performance impact of return value prediction in a system that supports speculative method-level parallelism (SMLP). An SMLP system creates a speculative thread at each method call, allowing the method and the code from which it is called to be executed in parallel. To improve performance, the return values of methods are predicted in hardware so that no method has to wait for its sub-method to complete before continuing to execute. For Java programs, we find that two-thirds of methods have a non-void return type, and perfect return value prediction improves performance by an average of 44% compared with a system with no return value prediction. However, the performance of realistic predictors is limited by poor prediction accuracy on integer return values and undesirable update characteristics due to the SMLP environment. A Parameter Stride (PS) return value predictor is proposed to address some of the deficiencies of the standard predictors by predicting based on method arguments. Combining the PS predictor with previous predictors results in a 7% speedup on average versus a system with hybrid return value prediction, and a 21% speedup versus a system with no return value prediction.
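A minimal parameter-stride sketch of the idea described above: the return value is predicted as a method argument plus a per-call-site learned offset. Indexing by call site, the single-argument form, and the confidence rule are assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct PSEntry { int64_t stride = 0; bool valid = false; };
std::unordered_map<uint64_t, PSEntry> psTable;      // call-site PC -> learned offset

// Predict only once the same (return - argument) offset has been seen twice.
bool predictReturn(uint64_t callSite, int64_t firstArg, int64_t& out) {
    auto it = psTable.find(callSite);
    if (it == psTable.end() || !it->second.valid) return false;
    out = firstArg + it->second.stride;
    return true;
}

void updateReturn(uint64_t callSite, int64_t firstArg, int64_t actualReturn) {
    PSEntry& e = psTable[callSite];
    int64_t observed = actualReturn - firstArg;
    e.valid = (observed == e.stride);   // confident once the same offset repeats
    e.stride = observed;
}

// e.g., a callee like "int next(int cursor) { return cursor + 1; }"
int main() {
    uint64_t site = 0x7001; int64_t pred;
    updateReturn(site, 10, 11);
    updateReturn(site, 25, 26);
    if (predictReturn(site, 40, pred)) std::cout << "predicted return: " << pred << "\n";
}
```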
Prediction Outcome History-based Confidence Estimation for Load Value Prediction
- Journal of Instruction-Level Parallelism, 1999
"... Load instructions occasionally incur very long latencies that can significantly affect system performance. Load value ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
Load instructions occasionally incur very long latencies that can significantly affect system performance. Load value
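The abstract is truncated here, but the title names the technique; below is a minimal sketch of one common form of outcome-history-based confidence estimation: a per-load history of recent prediction outcomes indexes a table of saturating counters, and the value prediction is used only when the counter for the current pattern is high. History length, table sizes, and the threshold are assumptions, not the paper's parameters.

```cpp
#include <cstdint>
#include <iostream>

constexpr int HIST_BITS = 8;
constexpr int LOADS     = 2048;

uint8_t history[LOADS];                 // per-load outcome history (1 = predictor was right)
uint8_t counters[1 << HIST_BITS];       // per-pattern saturating confidence counters (0..15)

bool confident(uint64_t pc) {
    return counters[history[pc % LOADS]] >= 12;   // only speculate on strong patterns
}

void recordOutcome(uint64_t pc, bool predictorWasRight) {
    uint8_t& h = history[pc % LOADS];
    uint8_t& c = counters[h];
    if (predictorWasRight) { if (c < 15) ++c; }
    else                   { c = 0; }             // misprediction: reset confidence hard
    h = (uint8_t)((h << 1) | (predictorWasRight ? 1 : 0));
}

int main() {
    uint64_t pc = 0x401000;
    for (int i = 0; i < 20; ++i) recordOutcome(pc, true);   // predictor keeps being right
    std::cout << "speculate? " << (confident(pc) ? "yes" : "no") << "\n";
}
```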
Automatic Generation of High-Performance Trace Compressors
"... Program execution traces are frequently used in industry and academia. Yet, most trace-compression algorithms have to be re-implemented every time the trace format is changed, which takes time, is error prone, and often results in inefficient solutions. This paper describes and evaluates TCgen, a to ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
(Show Context)
Program execution traces are frequently used in industry and academia. Yet, most trace-compression algorithms have to be re-implemented every time the trace format is changed, which takes time, is error prone, and often results in inefficient solutions. This paper describes and evaluates TCgen, a tool that automatically generates portable, customized, high-performance trace compressors. All the user has to do is provide a description of the trace format and select one or more predictors to compress the fields in the trace records. TCgen translates this specification into C source code and optimizes it for the specified trace format and predictor algorithms. On average, the generated code is faster and compresses better than the six other compression algorithms we have tested. For example, a comparison with SBC, one of the best trace-compression algorithms in the current literature, shows that TCgen's synthesized code compresses SPECcpu2000 address traces 23% more, decompresses them 24% faster, and compresses them 1029% faster.
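A sketch of the spec-driven idea: describe the fields of a trace record, pick a predictor per field, and let the generator emit a compressor that runs that predictor over its field. The "spec" structures and predictor names below are hypothetical illustrations, not TCgen's actual input language or generated code.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// The trace record format the user would describe to the generator.
struct TraceRecord { uint64_t pc; uint64_t addr; };

enum class Pred { LastValue, Stride };
struct FieldSpec { const char* name; Pred predictor; };

// What a hand-written version of the generated per-field predictor might look like.
struct FieldState {
    uint64_t last = 0; int64_t stride = 0;
    bool predictAndUpdate(Pred p, uint64_t actual) {
        uint64_t guess = (p == Pred::Stride) ? last + (uint64_t)stride : last;
        stride = (int64_t)(actual - last);
        last = actual;
        return guess == actual;   // a hit needs only a short code instead of the raw value
    }
};

int main() {
    FieldSpec spec[] = { {"pc", Pred::Stride}, {"addr", Pred::Stride} };
    std::vector<TraceRecord> trace = { {0x400000, 1000}, {0x400004, 1008}, {0x400008, 1016} };

    FieldState state[2];
    int hits = 0, total = 0;
    for (const TraceRecord& r : trace) {
        uint64_t fields[2] = { r.pc, r.addr };
        for (int f = 0; f < 2; ++f, ++total)
            hits += state[f].predictAndUpdate(spec[f].predictor, fields[f]);
    }
    std::cout << hits << "/" << total << " field values predicted (need no raw storage)\n";
}
```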