Results 1 - 10
of
34
Dynamic Branch Prediction with Perceptrons
"... This paper presents a new method for branch prediction. The key idea is to use one of the simplest possible neural networks, the perceptron, as an alternative to the commonly used two-bit counters. Our predictor achieves increased accuracy by making use of long branch histories, which are possible b ..."
Abstract
-
Cited by 123 (17 self)
- Add to MetaCart
This paper presents a new method for branch prediction. The key idea is to use one of the simplest possible neural networks, the perceptron, as an alternative to the commonly used two-bit counters. Our predictor achieves increased accuracy by making use of long branch histories, which are possible because the hardware resources for our method scale linearly with the history length. By contrast, other purely dynamic schemes require exponential resources. We describe our design and evaluate it with respect to two well known predictors. We show that for a 4K byte hardware budget our method improves misprediction rates for the SPEC 2000 benchmarks by 10.1 % over the gshare predictor. Our experiments also provide a better understanding of the situations in which traditional predictors do and do not perform well. Finally, we describe techniques that allow our complex predictor to operate in one cycle.
Neural Methods for Dynamic Branch Prediction
- ACM Transactions on Computer Systems
, 2002
"... This paper presents a new method for branch prediction that is highly accurate. The key idea is to use one of the simplest possible neural methods, the perceptron, as an alternative to the commonly used two-bit counters. The source of our predictor's accuracy is its ability to use long history lengt ..."
Abstract
-
Cited by 71 (9 self)
- Add to MetaCart
This paper presents a new method for branch prediction that is highly accurate. The key idea is to use one of the simplest possible neural methods, the perceptron, as an alternative to the commonly used two-bit counters. The source of our predictor's accuracy is its ability to use long history lengths, because the hardware resources for our method scale linearly, rather than exponentially, with the history length.
Inducing Heuristics To Decide Whether To Schedule
- IN PROCEEDINGS OF THE ACM SIGPLAN ’04 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 2004
"... Instruction scheduling is a compiler optimization that can improve program speed, sometimes by 10% or more---but it can also be expensive. Furthermore, time spent optimizing is more important in a Java just-in-time (JIT) compiler than in a traditional one because a JIT compiles code at run time, add ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
Instruction scheduling is a compiler optimization that can improve program speed, sometimes by 10% or more---but it can also be expensive. Furthermore, time spent optimizing is more important in a Java just-in-time (JIT) compiler than in a traditional one because a JIT compiles code at run time, adding to the running time of the program. We found that, on any given block of code, instruction scheduling often does not produce significant benefit and sometimes degrades speed. Thus, we hoped that we could focus scheduling effort on those blocks that benefit from it. Using
Predicting unroll factors using supervised classification
- In Proceedings of International Symposium on Code Generation and Optimization (CGO
, 2005
"... Compilers base many critical decisions on abstracted architectural models. While recent research has shown that modeling is effective for some compiler problems, building accurate models requires a great deal of human time and effort. This paper describes how machine learning techniques can be lever ..."
Abstract
-
Cited by 30 (1 self)
- Add to MetaCart
Compilers base many critical decisions on abstracted architectural models. While recent research has shown that modeling is effective for some compiler problems, building accurate models requires a great deal of human time and effort. This paper describes how machine learning techniques can be leveraged to help compiler writers model complex systems. Because learning techniques can effectively make sense of high dimensional spaces, they can be a valuable tool for clarifying and discerning complex decision boundaries. In this work we focus on loop unrolling, a well-known optimization for exposing instruction level parallelism. Using the Open Research Compiler as a testbed, we demonstrate how one can use supervised learning techniques to determine the appropriateness of loop unrolling. We use more than 2,500 loops — drawn from 72 benchmarks — to train two different learning algorithms to predict unroll factors (i.e., the amount by which to unroll a loop) for any novel loop. The technique correctly predicts the unroll factor for 65 % of the loops in our dataset, which leads to a 5% overall improvement for the SPEC 2000 benchmark suite (9 % for the SPEC 2000 floating point benchmarks). 1.
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
, 2008
"... Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for avera ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. As a result, they cannot learn and optimize the long-term performance impact of their scheduling decisions, and cannot adapt their scheduling policies to dynamic workload behavior. We propose a new, self-optimizing memory controller design that operates using the principles of reinforcement learning (RL) to overcome these limitations. Our RL-based memory controller observes the system state and estimates the long-term performance impact of each action it can take. In this way, the controller learns to optimize its scheduling policy on the fly to maximize long-term performance. Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19 % on average (up to 33%), and it improves DRAM bandwidth utilization by 22% compared to a state-of-the-art controller.
Method-specific dynamic compilation using logistic regression
- of ACM SIGPLAN Conferences on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'06
, 2006
"... Abstract Determining the best set of optimizations to apply to a programhas been a long standing problem for compiler writers. To reduce ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
Abstract Determining the best set of optimizations to apply to a programhas been a long standing problem for compiler writers. To reduce
On Bandwidth Smoothing
- In 4th International Web Caching Workshop
, 1999
"... The bandwidth usage due to HTTP traffic often varies considerably over the course of a day, requiring high network performance during peak periods while leaving network resources unused during off-peak periods. We show that using these extra network resources to prefetch web content during off-peak ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
The bandwidth usage due to HTTP traffic often varies considerably over the course of a day, requiring high network performance during peak periods while leaving network resources unused during off-peak periods. We show that using these extra network resources to prefetch web content during off-peak periods can significantly reduce peak bandwidth usage without compromising cache consistency. With large HTTP traffic variations it is therefore feasible to apply "bandwidth smoothing" to reduce the cost and the required capacity of a network infrastructure. In addition to reducing the peak network demand, bandwidth smoothing improves cache hit rates. We apply machine learning techniques to automatically develop prefetch strategies that have high accuracy. Our results are based on web proxy traces generated at a large corporate Internet exchange point and data collected from recent scans of popular web sites. 1 Introduction The bandwidth usage due to web traffic can vary considerable over th...
Predicting Multiprocessor Memory Access Patterns with Learning Models
, 1997
"... Machine learning techniques are applicable to computer system optimization. We show that shared memory multiprocessors can successfully utilize machine learning algorithms for memory access pattern prediction. In particular three different on-line machine learning prediction techniques were tested t ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Machine learning techniques are applicable to computer system optimization. We show that shared memory multiprocessors can successfully utilize machine learning algorithms for memory access pattern prediction. In particular three different on-line machine learning prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications, the 2-D relaxation algorithm, matrix multiply and Fast Fourier Transform on a shared memory multiprocessor. The predictions were then used by a routing control algorithm to reduce control latency in the interconnection network by configuring the interconnection network to provide needed memory access paths before they were requested. Three trainable prediction techniques were used and tested: 1). a Markov predictor, 2). a linear predictor and 3). a time delay neural network (TDNN) predictor. Different predictors performed best on different applications, but the TDNN produced uniformly go...
Genetic Programming Applied to Compiler Heuristic Optimization
, 2003
"... Genetic programming (GP) has a natural niche in the optimization of small but high payoff software heuristics. We use GP to optimize the priority functions associated with two well known compiler heuristics: predicated hyperblock formation, and register allocation. Our system achieves impressive spe ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Genetic programming (GP) has a natural niche in the optimization of small but high payoff software heuristics. We use GP to optimize the priority functions associated with two well known compiler heuristics: predicated hyperblock formation, and register allocation. Our system achieves impressive speedups over a standard baseline for both problems. For hyperblock selection, application-specific heuristics obtain an average speedup of 23% (up to 73%) for the applications in our suite. By evolving the compiler's heuristic over several benchmarks, the best general-purpose heuristic our system found improves the predication algorithm by an average of 25% on our training set, and 9% on a completely unrelated test set. We also improve a well-studied register allocation heuristic. On average, our system obtains a 6% speedup when it specializes the register allocation algorithm for individual applications. The general-purpose heuristic for register allocation achieves a 3% improvement.
Code Placement for Improving Dynamic Branch Prediction
- In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation (PLDI’05
, 2005
"... Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, traditional code placement techniques have less benefit in the presence of a trace cache that alters the placement of instruction ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, traditional code placement techniques have less benefit in the presence of a trace cache that alters the placement of instructions in the instruction cache. Moreover, as pipelines have become deeper to accommodate increasing clock rates, branch misprediction penalties have become a significant impediment to performance. We evaluate pattern history table partitioning, a feedback directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables. On SPEC CPU benchmarks running on an Intel Pentium 4, branch mispredictions are reduced by up to 22% and 3.5% on average. This reduction yields a speedup of up to 16.0% and 4.5% on average. By contrast, branch alignment, a previous code placement technique, yields only up to a 4.7% speedup and less than 1% on average.

