Value Locality and Load Value Prediction, 1996
Cited by 331 (18 self)
Since the introduction of virtual memory demand-paging and cache memories, computer systems have been exploiting spatial and temporal locality to reduce the average latency of a memory reference. In this paper, we introduce the notion of value locality, a third facet of locality that is frequently present in real-world programs, and describe how to effectively capture and exploit it in order to perform load value prediction. Temporal and spatial locality are attributes of storage locations, and describe the future likelihood of references to those locations or their close neighbors. In a similar vein, value locality describes the likelihood of the recurrence of a previously-seen value within a storage location. Modern processors already exploit value locality in a very restricted sense through the use of control speculation (i.e., branch prediction), which seeks to predict the future value of a single condition bit based on previously-seen values. Our work extends this to predict entire 32- and 64-bit register values based on previously-seen values. We find that, just as condition bits are fairly predictable on a per-static-branch basis, full register values being loaded from memory are frequently predictable as well. Furthermore, we show that simple microarchitectural enhancements to two modern microprocessor implementations (based on the PowerPC 620 and Alpha 21164) that enable load value prediction can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains.
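To make the definition of value locality in this abstract concrete, here is a minimal Python sketch that estimates it over a load trace: the fraction of loads whose value matches one of the last `depth` distinct values seen for the same key. The trace format (pairs of a load identifier and the loaded value), the function name `value_locality`, and the `depth` parameter are assumptions made for illustration; they are not the measurement setup of the paper.

```python
from collections import defaultdict, deque


def value_locality(trace, depth=1):
    """Return the fraction of loads whose value recurs within the last
    `depth` distinct values recorded for the same key.

    `trace` is an iterable of (key, value) pairs. The key is a hypothetical
    load identifier (for example a static load address).
    """
    # One bounded history of recent distinct values per key.
    history = defaultdict(lambda: deque(maxlen=depth))
    hits = 0
    total = 0
    for key, value in trace:
        seen = history[key]
        if value in seen:
            hits += 1
            seen.remove(value)  # re-appended below as the most recent value
        seen.append(value)      # the oldest value is evicted when full
        total += 1
    return hits / total if total else 0.0
```

With `depth=1` this counts only exact repeats of the most recent value; a larger `depth` also credits values that alternate among a small set.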
Multiple-Block Ahead Branch Predictors, 1996
Cited by 61 (5 self)
A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is not used for predicting the address of the next instruction block, but rather for predicting the block following the next instruction block. This approach overcomes the instruction fetch bottleneck exhibited by wide-dispatch "brainiac" processors by enabling them to efficiently predict addresses of two instruction blocks in a single cycle. Furthermore, pipelining the branch prediction process can also be done by means of our predictor for "speed demon" processors to achieve higher clock rate or to improve the prediction accuracy by means of bigger prediction structures. Moreover, and unlike the previously-proposed multiple predictor schemes, multiple-block ahead branch predictors can use any of the branch predictio...
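The core idea in this abstract, using the current block to predict the block after the next one, can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the class name `TwoBlockAheadPredictor`, the unbounded dictionary standing in for a finite hardware table, and the absence of any branch history are all simplifications chosen here, not details taken from the paper.

```python
class TwoBlockAheadPredictor:
    """Toy table mapping a block address to the block fetched two steps later."""

    def __init__(self):
        self.table = {}    # hypothetical: block address -> block seen two fetches later
        self.prev = None   # block fetched one step ago
        self.prev2 = None  # block fetched two steps ago

    def predict(self, current_block):
        """Predict the block following the next block, or None if untrained."""
        return self.table.get(current_block)

    def update(self, fetched_block):
        """Record the fetched block as the two-ahead successor of `prev2`."""
        if self.prev2 is not None:
            self.table[self.prev2] = fetched_block
        self.prev2, self.prev = self.prev, fetched_block
```

After training on a repeating fetch sequence of blocks A, B, C, the entry for A points to C, so a lookup with the current block yields the block two fetches ahead without waiting for the next block's address.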
Value Locality and Speculative Execution, 1997
Cited by 51 (1 self)
This thesis introduces a program attribute called value locality and proposes speculative execution under the weak dependence model. The weak dependence model lays a theoretical foundation for exploiting value locality and other program attributes by speculatively relaxing and deferring the detection and enforcement of control- and data-flow dependences between instructions to expose more instruction-level parallelism without violating program correctness. Value locality is a program attribute that describes the likelihood of the recurrence of a previously-seen value within a storage location inside a computer system. Most modern processors already exploit value locality through the use of control speculation (i.e. branch prediction), which seeks to predict the future values of condition code bits and branch-target addresses based on previously-seen values. Experimental results indicate that value locality exists for condition codes and branch target addresses, and for general-purpose ...
Performance Evaluation of the PowerPC 620 Microarchitecture - In Proceedings of the 22nd International Symposium on Computer Architecture, 1995
Cited by 24 (5 self)
The PowerPC 620 superscalar microprocessor is the most recent and performance-leading member of the PowerPC family, which is being jointly developed by IBM and Motorola. The 64-bit 620 represents the most aggressive microarchitecture for superscalar processors to date. It employs a two-level branch prediction scheme, dynamic renaming for all the register files, distributed multientry reservation stations, true out-of-order execution by six pipelined execution units, and a completion buffer for ensuring precise interrupts. This paper presents an instruction-level performance evaluation of the PowerPC 620 microarchitecture. A performance simulator for the 620 is developed using the VMW (Visualization-based Microarchitecture Workbench) retargetable framework. The VMW-based simulator accurately models the 620 microarchitecture down to the machine cycle level. Extensive trace-driven simulation is performed using the SPEC92 benchmarks. The experimental results indicate that the 620 is a well balanced design and achieves a maximum IPC rating of 1.94 on one of the benchmarks. Detailed quantitative analyses of the effectiveness of all the key microarchitecture features are presented. A brief philosophical comparison with the Alpha AXP 21164 is also included. Keywords: Superscalar processors, Out-of-order execution, Performance evaluation, Instruction-level parallelism.
Should Disks be Speed Demons or Brainiacs?
Disk drives play a critical role in the performance of I/O intensive applications. Over the years, disk drive performance has grown as a result of advances in magnetic recording density and faster rotational speeds. In essence, the performance driver in disks has been the data rate. In this paper, we show that data rate is going to be increasingly difficult to optimize, due to power/thermal constraints. We argue that disk drive designers should instead focus their efforts on providing more computational capabilities that data intensive applications could leverage in order to boost performance. We also discuss the scope for provisioning powerful processors inside disk drives to provide these computational capabilities.

