Speculative Precomputation: Long-range Prefetching of Delinquent Loads
, 2001
Cited by 180 (23 self)
This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts, and prefetching these data. This technique is evaluated by simulating the performance of a research processor based on the Itanium™ ISA supporting Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably with more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
The YAGS branch prediction scheme
- In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture
, 1998
Cited by 113 (0 self)
The importance of an accurate branch prediction mechanism has been well documented. Since the introduction of gshare [1] and the observation that aliasing in the PHT is a major factor in reducing prediction accuracy [2,3,4,5], several schemes have been proposed to reduce aliasing in the PHT [6, 7, 8, 9]. All these schemes are aimed at maximizing the prediction accuracy with the fewest resources. In this paper we introduce Yet Another Global Scheme (YAGS), a new scheme to reduce the aliasing in the PHT that combines the strong points of several previous schemes. YAGS introduces tags into the PHT that allow it to be reduced without sacrificing key branch outcome information. The size reduction more than offsets the cost of the tags. Our experimental results show that YAGS gives better
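The PHT aliasing mentioned in this abstract can be illustrated with a minimal gshare-style sketch. This is not the YAGS design itself. The table size, the history length, and the names `GsharePredictor` and `aliases` are assumptions chosen only for illustration.

```python
class GsharePredictor:
    """Toy gshare: a PHT of 2-bit counters indexed by PC XOR global history."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 1 = weakly not-taken
        self.history = 0                    # global branch history register

    def index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def aliases(pc_a, hist_a, pc_b, hist_b, index_bits=4):
    """True when two (PC, history) pairs map to the same PHT entry."""
    mask = (1 << index_bits) - 1
    return ((pc_a ^ hist_a) & mask) == ((pc_b ^ hist_b) & mask)
```

In this sketch, two unrelated branches alias whenever their XOR-ed indices coincide, so one branch can overwrite the counter the other relies on. For example, `aliases(0x8, 0xF, 0x7, 0x0)` holds because both pairs map to entry 7.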
The agree predictor: A mechanism for reducing negative branch history interference.
- In Proceedings of the 24th International Symposium on Computer Architecture
, 1997
Trading Conflict and Capacity Aliasing in Conditional Branch Predictors
- In Proceedings of the 24th International Symposium on Computer Architecture
, 1997
Cited by 95 (8 self)
As modern microprocessors employ deeper pipelines and issue multiple instructions per cycle, they are becoming increasingly dependent on accurate branch prediction. Because hardware resources for branch-predictor tables are invariably limited, it is not possible to hold all relevant branch history for all active branches at the same time, especially for large workloads consisting of multiple processes and operating-system code. The problem that results, commonly referred to as aliasing in the branch-predictor tables, is in many ways similar to the misses that occur in finite-sized hardware caches. In this paper we propose a new classification for branch aliasing based on the three-Cs model for caches, and show that conflict aliasing is a significant source of mispredictions. Unfortunately, the obvious method for removing conflicts -- adding tags and associativity to the predictor tables -- is not a cost-effective solution. To address this problem, we propose the skewed branch predict...
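The idea of tolerating conflict aliasing without tags can be sketched as several counter banks that are indexed by different hash functions and combined by majority vote. The three hash functions below are arbitrary placeholders, not the skewing functions from the paper. The class name `SkewedPredictor` and the policy of updating every bank on every branch are likewise assumptions.

```python
class SkewedPredictor:
    """Toy three-bank predictor: differently hashed banks, majority vote."""

    def __init__(self, index_bits=6):
        self.bits = index_bits
        self.mask = (1 << index_bits) - 1
        # Three banks of 2-bit counters, all starting at weakly not-taken.
        self.banks = [[1] * (1 << index_bits) for _ in range(3)]

    def _indices(self, pc, history):
        # Placeholder hashes (assumptions): they only need to differ per bank,
        # so that a conflict in one bank is unlikely to repeat in the others.
        h0 = (pc ^ history) & self.mask
        h1 = (pc ^ (history << 1) ^ (pc >> self.bits)) & self.mask
        h2 = ((pc >> 1) ^ history ^ (history >> 2)) & self.mask
        return (h0, h1, h2)

    def predict(self, pc, history):
        votes = sum(
            self.banks[b][i] >= 2
            for b, i in enumerate(self._indices(pc, history))
        )
        return votes >= 2  # majority of the three banks

    def update(self, pc, history, taken):
        for b, i in enumerate(self._indices(pc, history)):
            c = self.banks[b][i]
            self.banks[b][i] = min(3, c + 1) if taken else max(0, c - 1)
```

With these placeholder hashes, the pairs `(0x40, 0b1010)` and `(0x00, 0b1010)` collide in the first bank only, so the two remaining banks still outvote the polluted entry.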
Analysis of Branch Prediction via Data Compression
- In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems
, 1996
Cited by 91 (2 self)
Branch prediction is an important mechanism in modern microprocessor design. The focus of research in this area has been on designing new branch prediction schemes. In contrast, very few studies address the theoretical basis behind these prediction schemes. Knowing this theoretical basis helps us to evaluate how good a prediction scheme is and how much we can expect to improve its accuracy.
Efficient Procedure Mapping using Cache Line Coloring
- In Proceedings of the SIGPLAN'97 Conference on Programming Language Design and Implementation
, 1997
Cited by 81 (13 self)
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference
, 1996
Cited by 76 (5 self)
Today's deeply pipelined, superscalar processors rely on accurate branch prediction in order to approach their performance potential. Branch mispredictions result in a flushing of the speculative information in the pipeline, thus limiting the amount of useful work that can be done. The 2-level branch predictors have been shown to achieve high prediction accuracy. However, it has also been shown that there is a high degree of pattern history table interference in 2-level branch predictors and that the interference generally has a negative effect on the prediction accuracy. This paper introduces a method for reducing the pattern history table interference by dynamically identifying some easily predictable branches and inhibiting the pattern history table update for these branches. We show how this technique reduces pattern history table interference for two versions of the 2-level branch predictor and that this significantly improves branch prediction accuracy for the SPEC 95 benchmarks....
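A minimal sketch of the filtering idea described above: a branch that has gone the same way many times in a row is treated as easily predictable, is predicted from its last direction, and stops updating the pattern history table. The run-length threshold, the per-branch dictionary, and the name `FilteredPredictor` are assumptions. The abstract does not say how such branches are identified.

```python
class FilteredPredictor:
    """Toy gshare-style predictor that skips PHT updates for steady branches."""

    def __init__(self, index_bits=4, threshold=4):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 2-bit counters
        self.history = 0
        self.threshold = threshold          # assumed run length for filtering
        self.run = {}                       # pc -> (last direction, run length)
        self.pht_updates = 0                # bookkeeping for the example only

    def _filtered(self, pc):
        direction, length = self.run.get(pc, (False, 0))
        return length >= self.threshold, direction

    def predict(self, pc):
        filtered, direction = self._filtered(pc)
        if filtered:
            return direction                # bypass the PHT entirely
        return self.pht[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        filtered, _ = self._filtered(pc)
        if not filtered:                    # inhibit the update when filtered
            i = (pc ^ self.history) & self.mask
            c = self.pht[i]
            self.pht[i] = min(3, c + 1) if taken else max(0, c - 1)
            self.pht_updates += 1
        direction, length = self.run.get(pc, (taken, 0))
        # Extend the run if the direction repeats, otherwise restart it.
        self.run[pc] = (taken, length + 1 if direction == taken else 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In this sketch an always-taken branch touches the table only until its run reaches the threshold, which leaves those entries free for branches that need them.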
Interprocedural Conditional Branch Elimination
, 1997
Cited by 74 (17 self)
The existence of statically detectable correlation among conditional branches enables their elimination, an optimization that has a number of benefits. This paper presents techniques to determine whether an interprocedural execution path leading to a conditional branch exists along which the branch outcome is known at compile time, and then to eliminate the branch along this path through code restructuring. The technique consists of a demand driven interprocedural analysis that determines whether a specific branch outcome is correlated with prior statements or branch outcomes. The optimization is performed using a code restructuring algorithm that replicates code to separate out the paths with correlation. When the correlated path is affected by a procedure call, the restructuring is based on procedure entry splitting and exit splitting. The entry splitting transformation creates multiple entries to a procedure, and the exit splitting transformation allows a procedure to return control...
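A hand-written illustration of the kind of correlation this abstract targets. The functions `find`, `lookup_original`, and `lookup_restructured` are hypothetical, and the restructured version is written by hand to mimic the effect of exit splitting. It is not output from the paper's algorithm.

```python
def find(items, key):
    """Return the index of key, or -1 if it is absent."""
    for i, item in enumerate(items):
        if item == key:
            return i
    return -1


def lookup_original(items, key):
    idx = find(items, key)
    # This test is correlated with the exit taken inside find():
    # the "return i" exit never yields -1 and the final exit always does.
    if idx == -1:
        return "missing"
    return f"found at {idx}"


def lookup_restructured(items, key):
    # Each exit of the search continues directly into the code for its
    # known branch outcome, so the idx == -1 test disappears.
    for i, item in enumerate(items):
        if item == key:
            return f"found at {i}"
    return "missing"
```

Both versions return the same results. The restructured one executes one fewer conditional branch per call, at the cost of duplicating the search code.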
Multiple-Block Ahead Branch Predictors
, 1996
Cited by 70 (5 self)
A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is not used for predicting the address of the next instruction block, but rather for predicting the block following the next instruction block. This approach overcomes the instruction fetch bottleneck exhibited by wide-dispatch "brainiac" processors by enabling them to efficiently predict addresses of two instruction blocks in a single cycle. Furthermore, pipelining the branch prediction process can also be done by means of our predictor for "speed demon" processors to achieve higher clock rate or to improve the prediction accuracy by means of bigger prediction structures. Moreover, and unlike the previously-proposed multiple predictor schemes, multiple-block ahead branch predictors can use any of the branch predictio...
Alternative Implementations of Hybrid Branch Predictors
- In Proceedings of the 28th Annual International Symposium on Microarchitecture
, 1995
Cited by 65 (1 self)
Very accurate branch prediction is an important requirement for achieving high performance on deeply pipelined, superscalar processors. To improve on the prediction accuracy of current single-scheme branch predictors, hybrid (multiple-scheme) branch predictors have been proposed [6, 7]. These predictors combine multiple single-scheme predictors into a single predictor. They use a selection mechanism to decide, for each branch, which single-scheme predictor to use. The performance of a hybrid predictor depends on its single-scheme predictor components and its selection mechanism. Using known single-scheme predictors and selection mechanisms, this paper identifies the most effective hybrid predictor implementation. In addition, it introduces a new selection mechanism, the 2-level selector, which further improves the performance of the hybrid branch predictor.
1 Introduction
Branches can significantly reduce the performance of high-performance processors. Speculative execution is one solu...
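The general structure of a hybrid predictor, with two components and a per-branch selection mechanism, can be sketched as follows. The selector here is a plain table of 2-bit counters, used as a simple stand-in. It is not the 2-level selector that the paper introduces. The component classes and all names are assumptions for illustration.

```python
class StaticNotTaken:
    """Trivial component: always predicts not-taken."""

    def predict(self, pc):
        return False

    def update(self, pc, taken):
        pass


class Bimodal:
    """Component with one 2-bit counter per (hashed) branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        c = self.table[i]
        self.table[i] = min(3, c + 1) if taken else max(0, c - 1)


class Hybrid:
    """Chooses between two components with a per-branch 2-bit selector."""

    def __init__(self, first, second, index_bits=4):
        self.first, self.second = first, second
        self.mask = (1 << index_bits) - 1
        self.selector = [1] * (1 << index_bits)  # below 2: first, else second

    def predict(self, pc):
        use_second = self.selector[pc & self.mask] >= 2
        return (self.second if use_second else self.first).predict(pc)

    def update(self, pc, taken):
        p1, p2 = self.first.predict(pc), self.second.predict(pc)
        i = pc & self.mask
        if p1 != p2:
            # Move the selector toward whichever component was correct.
            if p2 == taken:
                self.selector[i] = min(3, self.selector[i] + 1)
            else:
                self.selector[i] = max(0, self.selector[i] - 1)
        self.first.update(pc, taken)
        self.second.update(pc, taken)
```

For an always-taken branch, `Hybrid(StaticNotTaken(), Bimodal())` starts out trusting the static component and shifts to the bimodal one once the two components disagree and the bimodal one turns out to be right.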