Extending the Effectiveness of 3D-Stacked DRAM Caches with an Adaptive Multi-Queue Policy
Cited by 8 (1 self)
3D-integration is a promising technology to help combat the “Memory Wall” in future multi-core processors. Past work has considered using 3D-stacked DRAM as a large last-level cache (LLC). While significant performance benefits can be gained with such an approach, there remain additional opportunities beyond the simple integration of commodity DRAM chips. In this work, we leverage the hardware organization typical of DRAM architectures to propose new cache management policies that would otherwise not be practical for standard SRAM-based caches. We propose a cache where each set is organized as multiple logical FIFO or queue structures that simultaneously provide performance isolation between threads as well as reduce the number of entries occupied by dead lines. Our results show that beyond the simplistic approach of stacking DRAM as cache, such tightly-integrated 3D architectures enable new opportunities for optimizing and improving system performance.
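The idea of organizing a cache set as several logical FIFO queues can be illustrated with a minimal sketch. This is not the paper's adaptive policy, which the abstract does not specify in detail. It assumes one fixed-size FIFO per thread, and the names `MultiQueueSet`, `entries_per_queue`, and `access` are hypothetical. It only shows how per-thread queues give isolation: a thread that misses evicts its own oldest line, never another thread's.

```python
from collections import deque


class MultiQueueSet:
    """One cache set split into per-thread FIFO queues (illustrative assumption)."""

    def __init__(self, num_threads, entries_per_queue):
        # One logical FIFO per thread; all queues share the same set.
        self.queues = [deque() for _ in range(num_threads)]
        self.capacity = entries_per_queue

    def access(self, thread_id, tag):
        """Return True on a hit; on a miss, insert the tag into the thread's FIFO."""
        queue = self.queues[thread_id]
        if tag in queue:
            return True
        if len(queue) == self.capacity:
            queue.popleft()  # evict the oldest line belonging to the same thread
        queue.append(tag)
        return False


cache_set = MultiQueueSet(num_threads=2, entries_per_queue=2)
cache_set.access(0, "a")            # thread 0 brings in line "a" (miss)
for tag in range(10):               # thread 1 streams through many lines
    cache_set.access(1, tag)
still_cached = cache_set.access(0, "a")  # thread 0's line was not evicted
```

With a single shared queue of the same total size, the streaming thread would have pushed line "a" out of the set. The per-thread split is what keeps it resident here.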
Fine grain 3D integration for microarchitecture design through cube packing exploration
- In Proceedings of the International Conference on Computer Design (to appear), 2007
Cited by 3 (2 self)
Most previous 3D IC research focused on “stacking” traditional 2D silicon layers, so the interconnect reduction is limited to interblock delays. In this paper, we propose techniques that enable efficient exploration of the 3D design space where each logical block can span more than one silicon layer. Although further power and performance improvement is achievable through fine grain 3D integration, the necessary modeling and tool infrastructure has been mostly missing. We develop a cube packing engine which can simultaneously optimize physical and architectural design for effective utilization of 3D in terms of performance, area and temperature. Our experimental results using a design driver show 36% performance improvement (in BIPS) over 2D and 14% over 3D with single layer blocks. Additionally, multi-layer blocks can provide up to 30% reduction in power dissipation compared to the single-layer alternatives. Peak temperature of the design is kept within limits as a result of thermal-aware floorplanning and thermal via insertion techniques.
Understanding the Impact of 3D Stacked Layouts on ILP
Cited by 3 (0 self)
3D die-stacked chips can alleviate the penalties imposed by long wires within microprocessor circuits. Many recent studies have attempted to partition each microprocessor structure across three dimensions to reduce their access times. In this paper, we implement each microprocessor structure on a single 2D die and leverage 3D to reduce the lengths of wires that communicate data between microprocessor structures within a single core. We begin with a criticality analysis of inter-structure wire delays and show that for most traditional simple superscalar cores, 2D floorplans are already very efficient at minimizing critical wire delays. For an aggressive wire-constrained clustered superscalar architecture, an exploration of the design space reveals that 3D can yield higher benefit. However, this benefit may be negated by the higher power density and temperature entailed by 3D integration. Overall, we report a negative result and argue against leveraging 3D for higher ILP.
A Modular 3D Processor for Flexible Product Design and Technology Migration
- CF'08, 2008
Cited by 1 (0 self)
The current methodology used in mass-market processor design is to create a single base microarchitecture (e.g., Intel’s “Core” or AMD’s “K8”) that is used throughout all of the PC market segments from laptops to servers. To differentiate the products, manufacturers rely on speed binning, different cache sizes, and varying the number of cores. In this paper, we propose using 3D integration to provide a new, but complementary, approach to providing product differentiation. Past research on using 3D to improve performance has focused on the construction of “fully 3D” circuits where functional blocks are partitioned across two or more layers. This approach forces one of two undesirable situations: (1) all products must be implemented in, and therefore pay the cost of, 3D or (2) a 3D-implemented processor is designed for the high-end/high-performance markets and a separate 2D microarchitecture must be designed for the lower-cost markets, thereby incurring significant additional design effort and engineering cost. We present a modular processor architecture where 3D can be used to enhance performance within a single unified design and which also provides a more gradual migration path toward fully 3D-integrated designs. To make this work, we describe a generic technique of using “phantom” components where the baseline processor may believe that 3D-stacked resources exist, but are currently unavailable. Simply using 3D to stack more L2 cache provides a 15.1% average performance benefit, but our proposal increases performance by 25.4%.
3D Architecture Modeling and Exploration
Vertical integration (3D ICs) has demonstrated the potential to reduce inter-block wire latency through flexible block placement and routing. However, there is an untapped potential for 3D ICs to reduce intra-block wire latency through architectural designs that can leverage multiple silicon layers in innovative ways. Furthermore, it is particularly challenging to simultaneously explore the physical design space and microarchitectural space for vertical integration. The physical design space typically has no information on the microarchitectural impact of latency optimization, and the microarchitectural space has no information on the physical design impact of different architectural alternatives. We make the following contributions in this paper: (1) the introduction of port partitioning, a new approach to constructing multi-layer blocks; (2) the extension of a microarchitectural exploration tool to include the ability to model multilayer blocks and to consider these blocks as alternative implementations of single-layer architectural blocks on the fly, within a single floorplanning run; and (3) the evaluation of vertical integration on a design driver using this framework. For this design driver, we see an average 36% improvement in performance (measured in BIPS) over a single-layer architecture, and a 29% improvement in performance over a multi-layer architecture with single-layer blocks. The on-chip temperature is kept below 40 °C.
Micro-architecture Pipelining Optimization with Throughput-Aware Floorplanning
For modern processor designs in nanometer technologies, both block and interconnect pipelining are needed to achieve multi-gigahertz clock frequency, but previous approaches consider block pipelining and interconnect pipelining separately. For example, all recent works on wire pipelining assume pre-pipelined components and consider only inserting pipeline stages on point-to-point wire or bus connections. To the best of our knowledge, this paper is the first that considers block pipelining and interconnect pipelining simultaneously. We optimize multiple critical paths or loops in the micro-architecture and insert the pipeline stages optimally in the blocks and wires of these loops to meet the clock frequency requirement. We propose two approaches to this problem. The first approach is based on mixed integer linear programming (MILP), which is theoretically guaranteed to produce the optimal solution, and the second one is an efficient graph-based algorithm that produces near-optimal solutions. Experimental results show that simultaneous block and interconnect pipelining leads to more than 20% improvement over wire pipelining alone on the overall processor performance. Moreover, the graph-based approach gives solutions very close to the MILP results (2% more than the MILP results on average) but in a much shorter runtime.
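The clock-frequency requirement behind pipeline insertion can be made concrete with a small worked calculation. This is not the paper's MILP or graph-based algorithm. It assumes the simplest possible model: every block or wire on a loop is split into enough equal stages that each stage fits in one clock period, and the loop latency is the sum of the stage counts. The function names and the delay values are hypothetical.

```python
import math


def stages_needed(delay_ps, clock_period_ps):
    """Minimum number of pipeline stages so that each stage fits in one clock period."""
    return max(1, math.ceil(delay_ps / clock_period_ps))


def loop_latency_cycles(element_delays_ps, clock_period_ps):
    """Total cycles around a loop when every block and wire is pipelined independently."""
    return sum(stages_needed(d, clock_period_ps) for d in element_delays_ps)


# Hypothetical loop: a block (450 ps), a short wire (100 ps), a long wire (600 ps),
# targeting a 250 ps clock period (4 GHz).
cycles = loop_latency_cycles([450, 100, 600], clock_period_ps=250)
```

Under this model the loop takes 2 + 1 + 3 = 6 cycles. An optimizer that considers blocks and wires together is choosing where to spend such stages across several loops at once, which this per-element calculation does not attempt.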
Investigating the Effects of Fine-Grain Three-Dimensional Integration on Microarchitecture Design
In this article we propose techniques that enable efficient exploration of the 3D design space, where each logical block can span more than one silicon layer. Fine-grain 3D integration provides reduced intrablock wire delay as well as improved power consumption. However, the corresponding power and performance advantage is usually underutilized, since various implementations of multilayer blocks require novel physical design and microarchitecture infrastructure to explore 3D microarchitecture design space. We develop a cubic packing engine which can simultaneously optimize physical and architectural design for efficient vertical integration. This technique selects the individual unit designs from a set of single-layer or multilayer implementations to get the best microarchitectural design in terms of performance, temperature, or both. Our experimental results using a design driver of a high-performance superscalar processor show a 36% performance improvement over traditional 2D for 2–4 layers and 14% over 3D with single-layer unit implementations. Since thermal characteristics of 3D integrated circuits are among the main challenges, thermal-aware floorplanning and thermal via insertion techniques are employed to keep the peak temperatures below threshold.
to The Graduate School. Date
, 2008
This thesis has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory. Chair:
Quantifying and Coping with Parametric Variations in 3D-Stacked Microarchitectures
Variability in device characteristics, i.e., parametric variations, is an important problem for shrinking process technologies. These variations manifest themselves as variations in performance and power consumption, reduced reliability in the manufactured chips, and low yield levels. Their implications for performance and yield are particularly profound on 3D architectures: a defect on even a single layer can render the entire stack useless. In this paper, we show that instead of causing increased yield losses, we can actually exploit 3D technology to reduce yield losses by intelligently devising the architectures. We take advantage of the layer-to-layer variations to reduce yield losses by splitting critical components among multiple layers. Our results indicate that our proposed method achieves a 30.6% lower yield loss rate compared to the same pipeline implemented on a 2D architecture.

