Minimal Subset Evaluation: Rapid Warm-up for Simulated Hardware State
- In Proceedings of the 2001 International Conference on Computer Design, 2001
Cited by 22 (1 self)
This paper introduces minimal subset evaluation (MSE) as a way to reduce time spent on large-structure warm-up during the fast-forwarding portion of processor simulations. Warm-up is commonly used prior to full-detail simulation to avoid cold-start bias in large structures like caches and branch predictors. Unfortunately, warm-up can be very time consuming, often representing 50% or more of total simulation time. Previous techniques have either used the entire fast-forward interval to obtain accurate warm-up, which may be prohibitive for large parameter-space searches, or chosen a short but ad hoc warm-up length that reduces simulation time but may sacrifice accuracy. MSE probabilistically determines a minimally sufficient fraction of the set of fast-forward transactions that must be executed for warm-up to accurately produce the state that would have resulted had the entire fast-forward interval been used for warm-up. The paper describes the mathematical underpinnings of MSE and demonstrates its effectiveness for both single-large-sample and multiple-sample simulation styles. In our experiments, MSE yields errors of less than 1% in IPC measurements with cycle-accurate simulation, while reducing simulation times by an average factor of two or more.
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
- In Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2002
Cited by 17 (0 self)
This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the observation that, of the memory references that precede a sample, those that occur nearest to the sample are more likely to be germane during the sample itself. This means that accurately warming up simulated cache and branch predictor state requires modeling only a subset of the memory references and control-flow instructions immediately preceding a simulation sample. Our technique takes measurements of the memory reference reuse latencies (MRRLs) and uses these data to choose a point prior to each sample to engage cache hierarchy and branch predictor modeling.
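The idea of choosing a warm-up point from reuse latencies can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `warmup_start`, the `percentile` and `block_bits` parameters, and the choice to measure latency in references rather than instructions are all assumptions made for illustration.

```python
import math


def warmup_start(addresses, sample_start, percentile=0.95, block_bits=6):
    """Return a hypothetical trace index at which to begin warm-up.

    addresses    -- list of memory addresses (ints), one per reference
    sample_start -- index of the first reference inside the sample
    percentile   -- fraction of in-sample reuses the warm-up should cover
    block_bits   -- log2 of the assumed cache block size
    """
    last_seen = {}   # block number -> index of its most recent reference
    latencies = []   # reuse latencies observed for in-sample references
    for i, addr in enumerate(addresses):
        block = addr >> block_bits
        if i >= sample_start and block in last_seen:
            # Distance (in references) back to the previous touch.
            latencies.append(i - last_seen[block])
        last_seen[block] = i

    if not latencies:
        # No block is reused in the sample, so nothing needs warming.
        return sample_start

    latencies.sort()
    # Smallest latency that covers the requested fraction of reuses.
    k = max(math.ceil(percentile * len(latencies)) - 1, 0)
    cutoff = latencies[k]
    return max(0, sample_start - cutoff)
```

For example, with a trace that touches four blocks and then touches them again inside the sample, every reuse has latency 4, so the sketch places the warm-up point four references before the sample.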
Applying Decay Strategies to Branch Predictors for Leakage Energy Savings
- 2002
Cited by 16 (7 self)
With technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that even larger branch predictors can and should be used in order to improve microprocessor performance. A further consideration is that the branch predictor is a thermal hot spot, which further increases its leakage. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.
Techniques for Accurate, Accelerated Processor Simulation: An Analysis of Reduced Inputs and Sampling
- 2002
Cited by 4 (1 self)
Detailed execution-driven simulation is an important tool for computer architecture research. It is desirable to drive these simulations with standard benchmark programs that are commonly used to evaluate existing computer systems, such as the SPEC2000 suite. Unfortunately, simulating these benchmark programs to completion using full-detail, cycle-accurate simulation on the designated reference input sets results in intractably long simulation durations. This study evaluates and compares two techniques for combating long simulation times: reduced inputs and sampling. Our objective is to assess the ability of each to reduce simulation running times, while simultaneously minimizing the difference in the results generated by using these techniques relative to the results generated by simulating the benchmark programs to completion using the reference inputs. With the reduced input technique, new input sets are carefully generated by hand to produce run-time characteristics of the benchmark programs that are comparable to the overall characteristics produced when the programs are run with the standard inputs.
The Effects of Context Switching on Branch Predictor Performance
- In Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, November 2001, Tucson, AZ
This paper shows that context switching is not a significant factor to be considered when performing general branch prediction studies. Branch prediction allows for speculative execution by increasing available instruction-level parallelism (ILP) and hiding the time required to resolve branch conditions. Accurate simulation of branch prediction is important because branch prediction strongly influences the behavior of processor structures. For this study, a timesharing framework was developed by modifying SimpleScalar's branch predictor simulator. A thorough characterization of the effects of branch predictor configuration, branch predictor area, and time slice length is provided. As further verification, branch predictor performance with and without flushing the predictor structures is compared. Experiments show that operating system context switches have little effect on branch prediction rate when using time slices representative of today's operating systems. Our findings show that this is because time slices are much longer than the training time required by the branch predictor structures. For all predictor configurations tested, the predictors train in under 128K instructions with or without flushing the branch predictor structures.
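The flush-versus-no-flush comparison can be illustrated with a toy model. This is a sketch under stated assumptions, not the modified SimpleScalar simulator used in the study: the function `mispredict_rate`, the bimodal table of 2-bit counters, and the time slice measured in branches (rather than instructions) are all hypothetical choices.

```python
def mispredict_rate(branches, table_bits=10, time_slice=None):
    """Misprediction rate of a toy bimodal predictor.

    branches   -- sequence of (pc, taken) pairs
    table_bits -- log2 of the number of 2-bit counters
    time_slice -- if set, flush the table every `time_slice` branches
                  to mimic a context switch that destroys predictor state
    """
    size = 1 << table_bits
    table = [1] * size          # 2-bit counters, start weakly not-taken
    misses = 0
    total = 0
    for n, (pc, taken) in enumerate(branches):
        if time_slice and n > 0 and n % time_slice == 0:
            table = [1] * size  # simulated context-switch flush
        idx = (pc >> 2) & (size - 1)
        predicted_taken = table[idx] >= 2
        if predicted_taken != taken:
            misses += 1
        # Saturating update of the 2-bit counter.
        if taken:
            table[idx] = min(3, table[idx] + 1)
        else:
            table[idx] = max(0, table[idx] - 1)
        total += 1
    return misses / total if total else 0.0
```

In this toy setting, a single always-taken branch costs one misprediction per flush, so the penalty shrinks as the time slice grows relative to the predictor's training time, which is the qualitative effect the abstract describes.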

