Results 1 - 4 of 4
Complexity-Effective Superscalar Processors
- In Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997
- Cited by 385 (5 self)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster – consequently overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
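The queue-based idea in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the steering rule, the `steer` and `issue_candidates` names, the instruction encoding as `(dest, sources)` tuples, and the fallback when no queue is empty are all assumptions made for illustration. An instruction is appended behind its producer when that producer is at the tail of a queue; otherwise it starts a new chain in an empty queue. Only queue heads are then candidates for issue.

```python
from collections import deque


def steer(instructions, num_queues):
    """Steer instructions (in program order) into FIFO queues.

    instructions: list of (dest, sources) tuples; register names are strings.
    Returns a list of deques holding instruction indices.
    Hypothetical rule: follow a producer that sits at a queue tail,
    otherwise start a new chain in an empty queue.
    """
    queues = [deque() for _ in range(num_queues)]
    producer = {}  # register name -> index of the latest instruction writing it

    for idx, (dest, sources) in enumerate(instructions):
        target = None
        # Look for a source whose producer is currently at the tail of a queue.
        for src in sources:
            p = producer.get(src)
            if p is None:
                continue
            for q in queues:
                if q and q[-1] == p:
                    target = q
                    break
            if target is not None:
                break
        # No usable producer: start a new dependence chain in an empty queue.
        if target is None:
            target = next((q for q in queues if not q), None)
        # Assumption: real hardware would stall here; the sketch instead
        # falls back to the shortest queue so that it always terminates.
        if target is None:
            target = min(queues, key=len)
        target.append(idx)
        producer[dest] = idx
    return queues


def issue_candidates(queues):
    """Only the head of each non-empty queue is examined for issue."""
    return [q[0] for q in queues if q]


program = [
    ("r1", []),        # 0: starts chain A
    ("r2", ["r1"]),    # 1: depends on 0
    ("r3", []),        # 2: starts chain B
    ("r4", ["r2"]),    # 3: depends on 1
    ("r5", ["r3"]),    # 4: depends on 2
]
queues = steer(program, num_queues=2)
print([list(q) for q in queues])   # [[0, 1, 3], [2, 4]]
print(issue_candidates(queues))    # [0, 2]
```

The sketch steers the whole program at once and does not model issue, completion, or queue draining; it only shows why wakeup and selection need to look at one entry per queue instead of every entry in a window.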
Quantifying the Complexity of Superscalar Processors
- 1996
- Cited by 72 (0 self)
The delay of pipeline structures in superscalar processors is studied to determine their potential for limiting clock cycle times in future designs. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm.
The Energy Complexity of Register Files
- In ISLPED, 1997
- Cited by 54 (2 self)
Register files represent a substantial portion of the energy budget in modern processors, and are growing rapidly with the trend towards larger Instruction Level Parallelism (ILP). The energy cost of a register file access depends greatly on the register file circuitry used. This paper compares various register file circuitry techniques for their energy efficiencies, as a function of the architectural parameters such as the number of registers and the number of ports. The Port Priority Selection technique combined with differential reads and low-swing writes was found to be the most energy efficient and provided significant energy savings compared to traditional approaches in the case of large register files. The dependence of register file access energy upon technology scaling is also studied. However, as this paper shows, it appears that none of these will be enough to prevent centralized register files from becoming the dominant power component of next-generation superscalar compute...
Diagonal Registers: Novel Vector Register File Design for High Performance and Multimedia Computing
- 2000
- Cited by 1 (0 self)
Finding new techniques to exploit parallelism in high performance and multimedia applications is one of the goals of many computer architects while developing new architectures. This thesis proposes a novel vector register file (VRF) design that extends the capability of a conventional VRF design. The new VRF allows accessing the data using new patterns in addition to the already used row pattern. Moreover, the novel VRF can be designed readily in VLSI technology with little overhead, while preserving the original structure of the VRF. The usage of the new VRF is demonstrated by means of newly developed algorithms. By employing the properties of the novel VRF, a linear-time matrix transpose operation is achievable. The proposed algorithms execute the matrix transpose operation in half the number of processor cycles of Motorola's AltiVec matrix transpose algorithms. The novel VRF design opens new doors to build better vector architectures optimized for the modern two-dimensional digital world.

