Complexity-Effective Superscalar Processors
In Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997
Abstract
Cited by 385 (5 self)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and SPICE-simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic, as well as operand bypass logic, are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues and issues instructions from multiple queues in parallel. Simulation shows little slowdown compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster; consequently, overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
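To make the queue-based idea concrete, here is a minimal Python sketch of steering dependence chains into FIFOs and issuing only from FIFO heads. It is an illustration under stated assumptions, not the paper's design: the steering rule (place an instruction behind its producer if that producer is the tail of a non-full FIFO, otherwise use an empty FIFO, otherwise stall), the single-cycle latency, and all names (`Instr`, `DependenceFifos`, `simulate`, register names like `r0`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str    # register written by the instruction (hypothetical naming)
    srcs: tuple  # registers read by the instruction


class DependenceFifos:
    """Issue buffer built from FIFOs; only FIFO heads are candidates for issue."""

    def __init__(self, num_fifos, depth):
        self.fifos = [deque() for _ in range(num_fifos)]
        self.depth = depth

    def steer(self, instr):
        # Assumed rule 1: if a source is produced by the instruction at the tail
        # of a non-full FIFO, place this instruction directly behind its producer.
        for q in self.fifos:
            if q and len(q) < self.depth and q[-1].dest in instr.srcs:
                q.append(instr)
                return True
        # Assumed rule 2: otherwise start a new chain in an empty FIFO.
        for q in self.fifos:
            if not q:
                q.append(instr)
                return True
        return False  # no suitable FIFO: dispatch stalls

    def issue(self, ready):
        # Wakeup/select examines just one instruction per FIFO: its head.
        issued = []
        for q in self.fifos:
            if q and all(s in ready for s in q[0].srcs):
                issued.append(q.popleft())
        return issued

    def empty(self):
        return all(not q for q in self.fifos)


def simulate(program, num_fifos=2, depth=4, initially_ready=("r0",), max_cycles=100):
    """Return, per cycle, the destination registers of the issued instructions."""
    core = DependenceFifos(num_fifos, depth)
    ready = set(initially_ready)
    pending = deque(program)
    schedule = []
    for _ in range(max_cycles):
        # In-order dispatch: stop at the first instruction that cannot be steered.
        while pending and core.steer(pending[0]):
            pending.popleft()
        issued = core.issue(ready)
        if not issued and not pending and core.empty():
            break
        # Single-cycle latency assumed: results are usable in the next cycle.
        ready.update(i.dest for i in issued)
        schedule.append([i.dest for i in issued])
    return schedule


program = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r1",)),       # depends on r1: steered behind it
    Instr("r3", ("r0",)),       # independent: starts a new chain
    Instr("r4", ("r2", "r3")),  # its producer r2 is a FIFO tail
]
print(simulate(program))               # [['r1', 'r3'], ['r2'], ['r4']]
print(simulate(program, num_fifos=1))  # [['r1'], ['r2'], ['r3'], ['r4']]
```

With two FIFOs the independent chain issues in parallel with the first. With one FIFO, dispatch stalls until the FIFO drains, which shows the cost of having too few queues.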
Quantifying the Complexity of Superscalar Processors
1996
Abstract
Cited by 72 (0 self)
The delays of pipeline structures in superscalar processors are studied to determine their potential for limiting clock cycle times in future designs. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and SPICE-simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm.
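For readers unfamiliar with wakeup and selection, the following Python sketch models a generic, fully flexible issue window: every issued result tag is broadcast to every entry (wakeup), and up to `issue_width` fully ready entries are granted, oldest first (select). This is a behavioral illustration under assumptions, not the report's circuit model; the oldest-first policy and all names (`WindowEntry`, `wakeup`, `select`) are hypothetical. The comparison counter shows why wakeup work grows with both window size and the number of tags broadcast per cycle.

```python
from dataclasses import dataclass


@dataclass
class WindowEntry:
    seq: int        # program order: smaller means older
    dest: str       # result tag this entry broadcasts when it issues
    src_tags: list  # tags of the source operands
    ready: list     # one ready bit per source operand


def wakeup(window, result_tags):
    """Broadcast result tags to every entry; return the number of tag comparisons."""
    comparisons = 0
    for entry in window:
        for k, src in enumerate(entry.src_tags):
            for tag in result_tags:  # one comparison per (operand, broadcast tag) pair
                comparisons += 1
                if src == tag:
                    entry.ready[k] = True
    return comparisons


def select(window, issue_width):
    """Grant up to issue_width fully ready entries, oldest first (assumed policy)."""
    requests = [e for e in window if all(e.ready)]
    granted = sorted(requests, key=lambda e: e.seq)[:issue_width]
    for e in granted:
        window.remove(e)
    return [e.dest for e in granted]


window = [
    WindowEntry(0, "r3", ["r1", "r2"], [False, True]),
    WindowEntry(1, "r4", ["r1", "r1"], [False, False]),
    WindowEntry(2, "r5", ["r3", "r2"], [False, True]),
]
print(wakeup(window, ["r1", "r9"]))   # 12 = 3 entries x 2 operands x 2 tags
print(select(window, issue_width=1))  # ['r3']: the oldest ready entry wins
print(select(window, issue_width=2))  # ['r4']: r5 still awaits the r3 broadcast
```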
Reducing The Latency Of Floating-Point Arithmetic Operations
1993
Abstract
Cited by 2 (0 self)
Floating-point (FP) numbers are used in general-purpose scientific computation and, increasingly, in digital signal processing and graphics. Manipulating these numbers, however, is more complex and incurs much higher latency than manipulating their integer counterparts. This research attempts to reduce the latencies of the commonly used FP operations: add, multiply, divide, and square root. The latency of an FADD can be improved by an estimated 20% at no extra hardware cost. Through a detailed analysis of the existing two-path implementation, this work shows that the rounding step in both paths can be combined with the mantissa addition step, saving both hardware and time. The algorithm has been demonstrated through extensive computer simulation and a silicon implementation. The test chip, implemented in a standard 1 µm CMOS technology, has a simulated nominal delay of 17 ns. Like existing adders, the implemented adder uses a leading-one prediction (LOP) circuit. This work examines LOP in a ge...
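The two-path FP adder mentioned above is commonly organized as a "near" and a "far" path. The Python sketch below shows that conventional split and the normalization shift an LOP circuit is meant to anticipate. The path criteria are the textbook ones and are assumed here (the abstract does not state them); the function names `fadd_path` and `normalization_shift` are hypothetical, and the sketch computes the shift exactly after subtraction rather than predicting it in parallel as an LOP does.

```python
def fadd_path(exp_a, exp_b, sign_a, sign_b):
    """Pick the adder path for a + b from exponents and sign bits (conventional split)."""
    effective_subtraction = sign_a != sign_b
    exp_diff = abs(exp_a - exp_b)
    if effective_subtraction and exp_diff <= 1:
        # Near path: alignment shift of at most 1 bit, but massive cancellation is
        # possible, so a long normalization shift may follow (the case an LOP targets).
        return "near"
    # Far path: arbitrary alignment shift, normalization shift of at most 1 bit.
    return "far"


def normalization_shift(sig_a, sig_b, width):
    """Leading zeros left after subtracting two aligned width-bit significands.

    An LOP circuit estimates this count in parallel with the subtraction;
    this sketch simply computes it exactly afterwards.
    """
    diff = abs(sig_a - sig_b)
    return width - diff.bit_length()


print(fadd_path(10, 11, 0, 1))  # near: effective subtraction, exponents 1 apart
print(fadd_path(10, 14, 0, 1))  # far: exponents 4 apart
print(fadd_path(10, 10, 0, 0))  # far: effective addition
print(normalization_shift(0b10110000, 0b10101000, 8))  # 4
```

Splitting the cases this way means each path needs only one long shifter (alignment in the far path, normalization in the near path), which is the property a combined add-and-round step can exploit.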

