## Dynamic Branch Prediction with Perceptrons


Citations: 147 (18 self)

### BibTeX

```bibtex
@MISC{Jiménez_dynamicbranch,
  author = {Daniel A. Jiménez and Calvin Lin},
  title  = {Dynamic Branch Prediction with Perceptrons},
  year   = {}
}
```


### Abstract

This paper presents a new method for branch prediction. The key idea is to use one of the simplest possible neural networks, the perceptron, as an alternative to the commonly used two-bit counters. Our predictor achieves increased accuracy by making use of long branch histories, which are possible because the hardware resources for our method scale linearly with the history length. By contrast, other purely dynamic schemes require exponential resources. We describe our design and evaluate it with respect to two well-known predictors. We show that for a 4K byte hardware budget our method improves misprediction rates for the SPEC 2000 benchmarks by 10.1% over the gshare predictor. Our experiments also provide a better understanding of the situations in which traditional predictors do and do not perform well. Finally, we describe techniques that allow our complex predictor to operate in one cycle.
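As a rough software illustration of the abstract's idea, here is a minimal sketch of a perceptron predictor. It is not the paper's hardware design: the table size and history length below are arbitrary choices, though the training threshold follows the θ ≈ 1.93h + 14 rule the authors propose.

```python
# Illustrative perceptron branch predictor (arbitrary sizes, not the
# paper's hardware design). Each branch address hashes to a perceptron
# whose weights are dotted with the global history to form a prediction.

HISTORY_LEN = 12                       # illustrative history length h
NUM_PERCEPTRONS = 64                   # illustrative table size
THETA = int(1.93 * HISTORY_LEN + 14)   # training threshold from the paper

class PerceptronPredictor:
    def __init__(self):
        # weights[i][0] is the bias weight; the rest pair with history bits
        self.weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
        self.history = [-1] * HISTORY_LEN  # +1 = taken, -1 = not taken

    def _output(self, pc):
        w = self.weights[pc % NUM_PERCEPTRONS]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0  # True = predict taken

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        w = self.weights[pc % NUM_PERCEPTRONS]
        # Train only on a misprediction or when |y| is below the threshold
        if (y >= 0) != taken or abs(y) <= THETA:
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi  # reinforce agreement, punish disagreement
        self.history = self.history[1:] + [t]
```

Because the resources grow with one weight per history bit, the linear scaling claimed in the abstract is visible directly in the `weights` table shape.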

### Citations

- 4331 | Computer Architecture: A Quantitative Approach (3rd ed.) - Hennessy, Patterson - 2003
  Context: "...struction-level parallelism. For example, data that is likely to be read in the near future is speculatively prefetched, and predicted values are speculatively used before actual values are available [10, 24]. Accurate prediction mechanisms have been the driving force behind these techniques, so increasing the accuracy of predictors increases the performance benefit of speculation. Machine learning techni..."

- 604 | Adaptive switching circuits - Widrow, Hoff - 1960
  Context: "... forms of machine learning, such as decision trees, are less attractive because of excessive implementation costs. For this work, we also considered other simple neural architectures, such as ADALINE [25] and Hebb learning [8], but we found that these were less effective than perceptrons (lower hardware efficiency for ADALINE, less accuracy for Hebb). One benefit of perceptrons is that by examining th..."

- 580 | Combining branch predictors - McFarling - 1993

- 266 | Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms - Rosenblatt - 1962
  Context: "...ng process is easy to understand as the result of a simple mathematical formula. We discuss this property in more detail in Section 5.7. 3.2 How Perceptrons Work The perceptron was introduced in 1962 [19] as a way to study brain function. We consider the simplest of many types of perceptrons [2], a single-layer perceptron consisting of one artificial neuron connecting several input units by weighted e..."

- 199 | Highly accurate data value prediction using hybrid predictors - Wang, Franklin - 1997
  Context: "...struction-level parallelism. For example, data that is likely to be read in the near future is speculatively prefetched, and predicted values are speculatively used before actual values are available [10, 24]. Accurate prediction mechanisms have been the driving force behind these techniques, so increasing the accuracy of predictors increases the performance benefit of speculation. Machine learning techni..."

- 164 | Assigning confidence to conditional branch predictions - Jacobsen, Rotenberg, et al. - 1996
  Context: "...y execute both branch paths when confidence is low, and to execute only the predicted path when confidence is high. Some branch prediction schemes explicitly compute a confidence in their predictions [11], but in our predictor this information comes for free. We have observed experimentally that the probability that a branch will be taken can be accurately estimated as a linear function of the output ..."

- 160 | Branch prediction for free - Ball, Larus - 1993
  Context: "...plying program features, such as control-flow and opcode information, as input to a trained neural network. This approach achieves an 80% correct prediction rate, compared to 75% for static heuristics [1, 3]. Static branch prediction performs worse than existing dynamic techniques, but is useful for performing static compiler optimizations. Branch prediction and genetic algorithms. Neural networks are pa..."

- 148 | Two-Level Adaptive Branch Prediction - Yeh, Patt - 1991
  Context: "...nnot describe our new predictor. 2.2 Dynamic Branch Prediction Dynamic branch prediction has a rich history in the literature. Recent research focuses on refining the two-level scheme of Yeh and Patt [26]. In this scheme, a pattern history table (PHT) of two-bit saturating counters is indexed by a combination of branch address and global or per-branch history. The high bit of the counter is taken as t..."
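The two-level scheme this context describes — a pattern history table of two-bit saturating counters indexed by a combination of branch address and global history — can be modeled in a few lines. This is an illustrative software sketch of a gshare-style variant with arbitrary sizes, not any particular hardware design:

```python
# Illustrative gshare-style two-level predictor: a PHT of 2-bit
# saturating counters indexed by (branch address XOR global history).

HIST_BITS = 10
TABLE_SIZE = 1 << HIST_BITS

class Gshare:
    def __init__(self):
        self.pht = [1] * TABLE_SIZE  # counters start weakly not-taken
        self.ghr = 0                 # global history register

    def _index(self, pc):
        # XOR the branch address with global history (the gshare hash)
        return (pc ^ self.ghr) & (TABLE_SIZE - 1)

    def predict(self, pc):
        # The high bit of the 2-bit counter is the prediction
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturate upward
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturate downward
        # Shift the branch outcome into the global history register
        self.ghr = ((self.ghr << 1) | int(taken)) & (TABLE_SIZE - 1)
```

Note that the table must cover every reachable history pattern, which is why PHT-based schemes need resources exponential in the history length, in contrast to the perceptron's linear scaling.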

- 117 | The Alpha 21264 Microprocessor Architecture - Kessler, McLellan, et al. - 1998
  Context: "...lobal predictors from the branch prediction literature. We also evaluate a hybrid gshare/perceptron predictor that uses a 2K byte choice table and the same choice mechanism as that of the Alpha 21264 [14]. The goal of our hybrid predictor is to show that because the perceptron has complementary strengths to gshare, a hybrid of the two performs well. All of the simulated predictors use only global patt..."

- 103 | The YAGS Branch Prediction Scheme - Eden, Mudge - 1998
  Context: "...nstructions issued per cycle increases, the penalty for a misprediction increases. Recent efforts to improve branch prediction focus primarily on eliminating aliasing in two-level adaptive predictors [17, 16, 22, 4], which occurs when two unrelated branches destructively interfere by using the same prediction resources. We take a different approach—one that is largely orthogonal to previous work—by improving the..."

- 100 | Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches - Evers, Chang, et al. - 1996
  Context: "...either per-branch nor path information is used. Thus, we have not yet compared our hybrid against existing global/per-branch hybrid schemes. Per-branch and path information can yield greater accuracy [6, 14], but our restriction to global information is typical of recent work in branch prediction [16, 4]. Gathering traces. Our simulations use the instrumented assembly output of the gcc 2.95.1 compiler wi..."

- 99 | The bi-mode branch predictor - Lee, Chen, et al. - 1997
  Context: "...nstructions issued per cycle increases, the penalty for a misprediction increases. Recent efforts to improve branch prediction focus primarily on eliminating aliasing in two-level adaptive predictors [17, 16, 22, 4], which occurs when two unrelated branches destructively interfere by using the same prediction resources. We take a different approach—one that is largely orthogonal to previous work—by improving the..."

- 93 | The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference - Sprangle, Chappell, et al. - 1997
  Context: "...nstructions issued per cycle increases, the penalty for a misprediction increases. Recent efforts to improve branch prediction focus primarily on eliminating aliasing in two-level adaptive predictors [17, 16, 22, 4], which occurs when two unrelated branches destructively interfere by using the same prediction resources. We take a different approach—one that is largely orthogonal to previous work—by improving the..."

- 89 | Trading Conflict and Capacity Aliasing in Conditional Branch Predictors - Michaud, Seznec, et al. - 1997
  Context: "...lobal history shift register [7]. Even if a PHT scheme could somehow implement longer history lengths, it would not help because longer history lengths require longer training times for these methods [18]. Variable length path branch prediction [23] is one scheme for considering longer paths. It avoids the PHT capacity problem by computing a hash function of the addresses along the path to the branch..."

- 83 | The impact of delay on the design of branch predictors - Jiménez, Keckler, et al.
  Context: "...riable length path branch predictor [23]. That work proposes pipelining the predictor to reduce delay. Jiménez et al. study a number of techniques for reducing the impact of delay on branch predictors [12]. For example, a cascading perceptron predictor would use a simple predictor to anticipate the address of the next branch to be fetched, and it would use a perceptron to begin predicting the anticipat..."

- 80 | The perceptron: A model for brain functioning - Block - 1962
  Context: "...his property in more detail in Section 5.7. 3.2 How Perceptrons Work The perceptron was introduced in 1962 [19] as a way to study brain function. We consider the simplest of many types of perceptrons [2], a single-layer perceptron consisting of one artificial neuron connecting several input units by weighted edges to one output unit. A perceptron learns a target Boolean function of inputs. In our cas..."

- 66 | Evidence-based static branch prediction using machine learning - Calder, Grunwald, et al. - 1997
  Context: "...including pattern recognition, classification [8], and image understanding [15, 13]. Static branch prediction with neural networks. Neural networks have been used to perform static branch prediction [3], where the likely direction of a branch is predicted at compile-time by supplying program features, such as control-flow and opcode information, as input to a trained neural network. This approach ach..."

- 57 | An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work - Evers, Patel, et al. - 1998
  Context: "...[4]. Most two-level predictors cannot consider long history lengths, which becomes a problem when the distance between correlated branches is longer than the length of a global history shift register [7]. Even if a PHT scheme could somehow implement longer history lengths, it would not help because longer history lengths require longer training times for these methods [18]. Variable length path branc..."

- 56 | Correlation and aliasing in dynamic branch predictors - Sechrest, Lee, et al. - 1996
  Context: "...er is taken as the prediction. Once the branch outcome is known, the counter is incremented if the branch is taken, and decremented otherwise. An important problem in two-level predictors is aliasing [20], and many of the recently proposed branch predictors seek to reduce the aliasing problem [17, 16, 22, 4] but do not change the basic prediction mechanism. Given a generous hardware budget, many of th..."

- 46 | Variable Length Path Branch Prediction - Stark, Evers, et al. - 1998
  Context: "...HT scheme could somehow implement longer history lengths, it would not help because longer history lengths require longer training times for these methods [18]. Variable length path branch prediction [23] is one scheme for considering longer paths. It avoids the PHT capacity problem by computing a hash function of the addresses along the path to the branch. It uses a complex multi-pass profiling and c..."

- 44 | A language for describing predictors and its application to automatic synthesis - Emer, Gloy - 1997
  Context: "...rediction and genetic algorithms. Neural networks are part of the field of machine learning, which also includes genetic algorithms. Emer and Gloy use genetic algorithms to “evolve” branch predictors [5], but it is important to note the difference between their work and ours. Their work uses evolution to design more accurate predictors, but the end result is something similar to a highly tuned tradit..."

- 36 | Artificial Neural Network for Image Understanding - Kulkarni - 1994
  Context: "...learn to compute a function using example inputs and outputs. Neural networks have been used for a variety of applications, including pattern recognition, classification [8], and image understanding [15, 13]. Static branch prediction with neural networks. Neural networks have been used to perform static branch prediction [3], where the likely direction of a branch is predicted at compile-time by supplyin..."

- 31 | Understanding neural networks via rule extraction - Setiono, Liu - 1995
  Context: "...m of many neural networks is that it is difficult or impossible to determine exactly how the neural network is making its decision. Techniques have been proposed to extract rules from neural networks [21], but these rules are not always accurate. Perceptrons do not suffer from this opaqueness; the perceptron’s decision-making process is easy to understand as the result of a simple mathematical formula..."

- 22 | Dynamically weighted ensemble neural networks for classification - Jimenez, Walsh - 1998
  Context: "...learn to compute a function using example inputs and outputs. Neural networks have been used for a variety of applications, including pattern recognition, classification [8], and image understanding [15, 13]. Static branch prediction with neural networks. Neural networks have been used to perform static branch prediction [3], where the likely direction of a branch is predicted at compile-time by supplyin..."

- 8 | Fundamentals of Neural Networks: Architectures, Algorithms and Applications - Fausett - 1994

- 4 | A 2.7ns 0.25µm CMOS 54×54b multiplier - Hagihara, Inui, et al. - 1998
  Context: "...wise,” a quick arithmetic operation since the w_i are at most 9-bit numbers: for each bit do in parallel: if t = x_i then w_i := w_i + 1 else w_i := w_i − 1 end if. Delay. A multiplier in a 0.25 µm process can operate in 2.7 nanoseconds [9], which is approximately two clock cycles with a 700 MHz clock. At the longer history lengths, an implementation of our predictor resembles a 54×54b multiply, but the data corresponding to the partial ..."
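The perceptron update rule discussed in this context — increment a weight when the branch outcome agrees with the corresponding history bit, decrement it otherwise, all weights in parallel — can be sketched in software. The threshold value and variable names below are illustrative, following the paper's notation (t is the outcome, x_i the history bits, w_i the weights):

```python
THETA = 37  # illustrative training threshold; the paper suggests about 1.93*h + 14

def train(weights, history, taken, y_out):
    """One perceptron training step. weights[0] is the bias weight,
    history holds +1/-1 branch outcomes, taken is the actual outcome,
    and y_out is the perceptron output that produced the prediction."""
    t = 1 if taken else -1
    # Train only on a misprediction, or when the output was not confident
    if (y_out >= 0) != taken or abs(y_out) <= THETA:
        weights[0] += t
        for i, x in enumerate(history):
            # "for each bit do in parallel:
            #  if t = x_i then w_i := w_i + 1 else w_i := w_i - 1"
            weights[i + 1] += 1 if t == x else -1
    return weights
```

Each weight update is a small saturating increment or decrement rather than a multiply, which is why, as the context notes, the hardware cost is closer to the partial-product stage of a multiplier than to a full multiply.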