Yield-Aware Cache Architectures
In Proceedings of the 39th International Symposium on Microarchitecture, 2006
Cited by 30 (5 self)
One of the major issues faced by the semiconductor industry today is that of declining chip yields. As process technologies have scaled to smaller feature sizes, chip yields have dropped to around 50% or less, and this figure is expected to decrease even further in future technologies. To attack this growing problem, we develop four yield-aware microarchitecture schemes for data caches. The first is called Yield-Aware Power-Down (YAPD). YAPD turns off cache ways that cause delay violations and/or have excessive leakage. We also modify this approach to achieve better yields; this new method, called Horizontal YAPD (HYAPD), turns off horizontal regions of the cache instead of ways. A third approach targets delay violations in data caches. Particularly, we develop a VAriable-latency Cache Architecture (VACA). VACA allows different load accesses to complete with varying latencies. This is enabled by augmenting the functional units with special buffers that allow the dependents of a load operation to stall for a cycle if the load is delayed. As a result, if some accesses take longer than the predefined number of cycles, execution can still proceed correctly, albeit with some performance degradation. A fourth scheme, the Hybrid mechanism, combines YAPD and VACA. As a result of these schemes, chips that would otherwise be discarded due to parametric yield loss can be saved. Experimental results demonstrate that yield losses can be reduced by 68.1% and 72.4% with the YAPD and HYAPD schemes and by 33.3% and 81.1% with the VACA and Hybrid mechanisms, respectively, improving the overall yield to as much as 97.0%.
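A back-of-the-envelope Monte Carlo sketch of why way power-down helps parametric yield: with YAPD-style disabling, a chip survives as long as at least one cache way meets timing, instead of requiring all ways to pass. The 4-way geometry and per-way pass probability below are illustrative assumptions, not numbers from the paper.

```python
import random

random.seed(0)
N_WAYS = 4          # 4-way set-associative data cache (assumed)
P_WAY_OK = 0.85     # probability a single way meets timing (assumed)
TRIALS = 100_000

baseline_good = yapd_good = 0
for _ in range(TRIALS):
    ways_ok = [random.random() < P_WAY_OK for _ in range(N_WAYS)]
    if all(ways_ok):            # baseline: any bad way discards the chip
        baseline_good += 1
    if any(ways_ok):            # way power-down: bad ways are disabled
        yapd_good += 1

print(f"baseline yield: {baseline_good / TRIALS:.3f}")
print(f"way-disabling yield: {yapd_good / TRIALS:.3f}")
```

With these assumed numbers the baseline yield is roughly 0.85^4 ≈ 0.52, while way-disabling saves nearly every chip; the cost, as the abstract notes, is reduced capacity and hence some performance loss.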
Variability Driven Gate Sizing for Binning Yield Optimization
In Proceedings of the ACM/IEEE Design Automation Conference, 2006
Cited by 12 (0 self)
Process variations result in a considerable spread in the frequency of the fabricated chips. In high-performance applications, chips that fail to meet the nominal frequency after fabrication are either discarded or sold at a loss that is typically proportional to the degree of timing violation; the latter practice is called binning. In this paper we present a gate-sizing-based algorithm that optimally minimizes the binning yield loss. We make the following contributions: 1) we prove the binning yield function to be convex; 2) we make no assumptions about the sources of variability or their distribution model; 3) we integrate our strategy with statistical timing analysis (STA) tools, without making any assumptions about how STA is done; 4) if the objective is to optimize the traditional yield (rather than the binning yield), our approach can still optimize it to a very large extent. Comparison of our approach with sensitivity-based approaches under fabrication variability shows an average improvement of 72% in the binning yield loss with an average area overhead of 6%, while achieving a 2.69 times speedup under a stringent timing constraint. Moreover, we show that a worst-case deterministic approach fails to generate a solution for certain delay constraints. We also show that optimizing the binning yield loss minimizes the traditional yield loss, with a 61% improvement over a sensitivity-based approach.
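The distinction between the two objectives can be made concrete with a small Monte Carlo sketch: the traditional yield loss counts chips missing the target period, while the binning yield loss weights each slow chip by its timing violation. The Gaussian delay model and the numbers are assumptions for illustration only.

```python
import random

random.seed(1)
T_NOMINAL = 1.0                 # target clock period (assumed units)
MU, SIGMA = 0.95, 0.05          # post-sizing delay spread (assumed)
TRIALS = 200_000

samples = [random.gauss(MU, SIGMA) for _ in range(TRIALS)]
# binning yield loss: expected (positive part of the) timing violation,
# i.e. loss proportional to how badly each chip misses the target
binning_loss = sum(max(0.0, d - T_NOMINAL) for d in samples) / TRIALS
# traditional yield loss: fraction of chips missing the target outright
trad_loss = sum(d > T_NOMINAL for d in samples) / TRIALS

print(f"binning yield loss: {binning_loss:.4f}")
print(f"traditional yield loss: {trad_loss:.4f}")
```

Because the binning loss is an expectation of a convex (hinge-shaped) function of delay, it inherits convexity under suitable delay models, which is the property the paper exploits for optimal gate sizing.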
Quantifying the Impact of Process Variability on Microprocessor Behavior
In 2nd Workshop on Architectural Reliability, 2006
Cited by 7 (0 self)
Architects and chip makers are worried about the impact of increasing CMOS process variability. This variability can affect a processor’s performance and, depending on how aggressively the design is pushed, its reliability. We perform the first quantitative analysis of the impact of process variability on an RTL-level specification of a microprocessor core. For each pipeline stage, we compute the expected latency as well as the standard deviation of this latency. We show that even with modest amounts of process variability, the impact on performance can be significant, and this impact can increase when using dynamic voltage scaling.
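A minimal sketch of why modest per-stage variability has an outsized performance impact: the clock period must cover the slowest pipeline stage, and the expectation of a maximum grows with the number of stages. The five equal stages and 5% sigma below are assumptions, not figures from the paper.

```python
import random
import statistics

random.seed(2)
N_STAGES = 5
NOMINAL = 1.0
SIGMA = 0.05 * NOMINAL          # "modest" per-stage variability (assumed)
TRIALS = 50_000

periods = []
for _ in range(TRIALS):
    stage_latency = [random.gauss(NOMINAL, SIGMA) for _ in range(N_STAGES)]
    periods.append(max(stage_latency))  # clock must cover the worst stage

mean_p = statistics.fmean(periods)
std_p = statistics.stdev(periods)
print(f"expected period: {mean_p:.3f} (vs nominal {NOMINAL})")
print(f"std dev of period: {std_p:.3f}")
```

Even though each stage only varies by 5%, the expected achievable period here lands around 6% above nominal, because the max over stages is systematically biased upward.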
Efficient Decoupling Capacitance Budgeting Considering Operation and Process Variations
Cited by 4 (4 self)
This paper solves the variation-aware on-chip decoupling capacitance (decap) budgeting problem. Unlike previous work that assumes the worst-case current load, we develop a novel stochastic current model which efficiently and accurately captures operation variation, such as temporal correlation between clock cycles and logic-induced correlation between ports. The model also considers current variation due to process variation with spatial correlation. We then propose an iterative alternative programming algorithm to solve the decap budgeting problem under the stochastic current model. Experiments using industrial examples show that, compared with the baseline model which assumes maximum currents at all ports and under the same decap area constraint, the model considering temporal correlation reduces the noise by up to 5×, and the model considering both temporal and logic-induced correlations reduces the noise by up to 17×. Compared with the model using deterministic process parameters, considering process variation (Leff variation in this paper) reduces the mean noise by up to 4× and the 3σ noise by up to 13×. While existing stochastic optimization has been used mainly for process variation, this paper is, to the best of our knowledge, the first in-depth study of stochastic optimization taking into account both operation and process variations for power network design. We convincingly show that considering operation variation is highly beneficial for power integrity optimization, and that it should be researched for optimizing signal and thermal integrity as well.
Statistical timing yield optimization by gate sizing
TCAD, 2006
Cited by 4 (2 self)
In this paper, we propose a statistical gate sizing approach to maximize the timing yield of a given circuit under area constraints. Our approach involves statistical gate delay modeling, statistical static timing analysis, and gate sizing. Experiments performed in an industrial framework on combinational International Symposium on Circuits and Systems (ISCAS’85) and Microelectronics Center of North Carolina (MCNC) benchmarks show absolute timing yield gains of 30% on average over deterministic timing optimization, for at most a 10% area penalty. It is further shown that circuits optimized using our metric have larger timing yields than the same circuits optimized using a worst-case metric, for iso-area solutions. Finally, we present an insight into the statistical properties of gate delays for a commercial 0.13 µm technology library, which intuitively provides one reason why statistical timing-driven optimization does better than deterministic timing-driven optimization. Index Terms: Gate sizing, optimization, statistical gate delay modeling, statistical timing analysis, timing yield, variability, VLSI.
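A toy illustration of the intuition behind that last point (the two candidate "sizing solutions" and their delay spreads below are invented for the sketch, not taken from the paper): deterministic optimization picks the solution with the best nominal delay, while a statistical metric can prefer a slightly slower but tighter distribution with much higher timing yield.

```python
import random

random.seed(4)
T_CLK = 1.10        # clock period target (assumed units)
TRIALS = 100_000

def timing_yield(mu, sigma):
    """Fraction of fabricated chips whose delay meets the clock period."""
    return sum(random.gauss(mu, sigma) <= T_CLK for _ in range(TRIALS)) / TRIALS

yield_a = timing_yield(1.00, 0.10)  # A: best nominal delay, large spread
yield_b = timing_yield(1.02, 0.04)  # B: slower nominal, tight spread

# Deterministic optimization prefers A (lower nominal delay), yet B's
# timing yield is far higher at the same clock target.
print(f"timing yield of A: {yield_a:.3f}")
print(f"timing yield of B: {yield_b:.3f}")
```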
Fast min-cost buffer insertion under process variations
In Proc. of the Design Automation Conf., 2007
Cited by 2 (1 self)
Process variation has become a critical problem in modern VLSI fabrication. In the presence of process variation, the buffer insertion problem under performance constraints becomes more difficult, since the solution space expands greatly. We propose efficient dynamic programming approaches to handle min-cost buffer insertion under process variations. Our approaches handle delay constraints and slew constraints, in trees and in combinational circuits. The experimental results demonstrate that, in general, process variations have a great impact on slew-constrained buffering but much less impact on delay-constrained buffering, especially for small nets. Our approaches incur less than 9% runtime overhead on average compared with a single pass of deterministic buffering for delay-constrained buffering, and achieve 56% yield improvement and 11.8% buffer area reduction, on average, for slew-constrained buffering.
Mitigating the effects of process variations: Architectural approaches for improving batch performance
In Workshop on Architectural Support for Gigascale Integration (ASGI), 2007
Cited by 2 (1 self)
As transistor feature sizes continue to shrink into the sub-90nm range and beyond, the effects of process variations on critical path delay have amplified. A common remedy for the effects of variation is speed-binning, by which chips from a single batch are rated by a discrete range of frequencies. In this paper, we argue that under these conditions, architectural optimizations should consider their effect on the “batch” of microprocessors rather than aiming at increasing the performance of a single processor. We first show that the critical paths are mostly determined by the level-1 data caches on a set of manufactured microprocessors. Then, we propose three new microarchitectural techniques aimed at masking the effects of process variations on level-1 caches. The first two techniques allow individual high-latency cache lines, spanning single or multiple sets, to be disabled at the post-manufacture testing stage. The third approach introduces a small substitute cache associated with each cache way to replicate the data elements stored in the high-latency lines. Our new schemes can be effectively used to boost the overall chip yield and also shift the chip binning distribution towards higher frequencies. To make a quantitative comparison between the different schemes, we first define a metric called batch-performance that takes into account the chip yield and the frequency of chips in each bin. We then analyze our proposed schemes and show that the resizing schemes and the substitute cache can increase batch-performance by as much as 5.8% and 11.6%, respectively.
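A batch-performance metric of the kind described can be sketched as a yield-weighted sum of bin frequencies. The bin frequencies and chip fractions below are made up for illustration; the paper's exact metric definition may differ in detail.

```python
# Illustrative batch-performance computation: weight each speed bin's
# frequency by the fraction of chips landing in it.
def batch_performance(bins):
    """bins: list of (frequency_GHz, fraction_of_chips). Discarded chips
    contribute nothing, so the fractions may sum to less than 1."""
    return sum(freq * frac for freq, frac in bins)

# Assumed binning distributions before and after cache-masking techniques:
baseline = [(2.0, 0.10), (2.2, 0.30), (2.4, 0.35), (2.6, 0.15)]  # 10% discarded
improved = [(2.0, 0.05), (2.2, 0.25), (2.4, 0.40), (2.6, 0.27)]  # 3% discarded

bp0 = batch_performance(baseline)
bp1 = batch_performance(improved)
print(f"baseline batch-performance: {bp0:.3f}")
print(f"improved batch-performance: {bp1:.3f} ({100 * (bp1 / bp0 - 1):.1f}% gain)")
```

The metric captures both effects the abstract mentions at once: saving chips that would be discarded (higher total fraction) and shifting the binning distribution toward faster bins.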
Statistical Timing Analysis With Coupling
Cited by 2 (0 self)
As technology scales to smaller dimensions, increasing process variations and coupling-induced delay variations make timing verification extremely challenging. In this paper, the authors establish a theoretical framework for statistical timing analysis with coupling. They prove the convergence of their proposed iterative approach and discuss implementation issues under the assumption of a Gaussian distribution for the parameters of variation. A statistical timer based on the proposed approach is developed, and experimental results are presented for the International Symposium on Circuits and Systems benchmarks. The authors compare their timer with a single-pass, non-iterative statistical timer that does not consider the mutual dependence of coupling and timing, and with another statistical timer that handles coupling deterministically. Monte Carlo simulations reveal a distinct gain (up to 24%) in accuracy for their approach in comparison to the others mentioned. Index Terms: Coupling, fixpoint computation, statistical timing analysis, variability, very large scale integration (VLSI).
Robust Gate Sizing via Mean Excess Delay Minimization
Cited by 1 (1 self)
We introduce mean excess delay as a statistical measure of circuit delay in the presence of parameter variations. The β-mean excess delay is defined as the expected delay of the circuits that exceed the β-quantile of the delay, so it is always an upper bound on the β-quantile. However, in contrast to the β-quantile, it preserves the convexity properties of the underlying delay distribution. We apply the β-mean excess delay to the circuit sizing problem and use it to minimize the delay quantile over the gate sizes. We use the Analytic Center Cutting Plane Method to perform the minimization and apply this sizing to the ISCAS ’85 benchmarks. Depending on the structure of the circuit, it can make significant improvements on the 95%-quantile.
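From that definition, the β-mean excess delay can be estimated from delay samples as the average of the worst (1 − β) fraction, which by construction upper-bounds the β-quantile. A minimal sketch, assuming a Gaussian delay distribution purely for illustration:

```python
import random
import statistics

random.seed(3)
BETA = 0.95
# Assumed delay samples (e.g. from Monte Carlo over parameter variations):
samples = sorted(random.gauss(10.0, 1.0) for _ in range(100_000))

cut = int(BETA * len(samples))
beta_quantile = samples[cut]                    # empirical 95%-quantile
mean_excess = statistics.fmean(samples[cut:])   # mean of the worst 5%

print(f"beta-quantile delay: {beta_quantile:.3f}")
print(f"beta-mean excess delay: {mean_excess:.3f}")
assert mean_excess >= beta_quantile             # the bound always holds
```

This quantity is what financial risk literature calls conditional value-at-risk; its convexity in the decision variables (unlike the quantile's) is what makes it usable with cutting-plane minimization.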
On the Futility of Statistical Power Optimization
Cited by 1 (1 self)
In response to increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper we quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper bound on the suboptimality incurred by using the deterministic optimum as an approximation for the statistical optimum. On average, the bound is 2.4% for a suite of benchmark circuits in a 45nm technology. We further give an intuitive explanation and show, using solution rank orders, that the practical suboptimality gap is much lower. Therefore, the need for statistical power modeling for the purpose of optimization is questionable.