Results 1-10 of 29
Yield-Aware Cache Architectures
In Proceedings of the 39th International Symposium on Microarchitecture, 2006
Cited by 30 (5 self)
One of the major issues faced by the semiconductor industry today is that of reducing chip yields. As the process technologies have scaled to smaller feature sizes, chip yields have dropped to around 50% or less. This figure is expected to decrease even further in future technologies. To attack this growing problem, we develop four yield-aware microarchitecture schemes for data caches. The first one is called Yield-Aware Power-Down (YAPD). YAPD turns off cache ways that cause delay violation and/or have excessive leakage. We also modify this approach to achieve better yields. This new method is called Horizontal YAPD (H-YAPD), which turns off horizontal regions of the cache instead of ways. A third approach targets delay violation in data caches. Particularly, we develop a VAriable-latency Cache Architecture (VACA). VACA allows different load accesses to be completed with varying latencies. This is enabled by augmenting the functional units with special buffers that allow the dependants of a load operation to stall for a cycle if the load operation is delayed. As a result, if some accesses take longer than the predefined number of cycles, the execution can still be performed correctly, albeit with some performance degradation. A fourth scheme we devise is called the Hybrid mechanism, which combines the YAPD and the VACA. As a result of these schemes, chips that may be tossed away due to parametric yield loss can be saved. Experimental results demonstrate that the yield losses can be reduced by 68.1% and 72.4% with YAPD and H-YAPD schemes and by 33.3% and 81.1% with VACA and Hybrid mechanisms, respectively, improving the overall yield to as much as 97.0%.
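The way-level power-down idea in this abstract can be illustrated with a minimal sketch. It assumes each cache way has a measured access delay and leakage value, and that a chip is salvageable as long as some minimum number of ways survive. The `Way` class, the per-way leakage threshold, the units, and the `min_ways` rule are all hypothetical; the abstract does not specify how the paper makes these decisions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Way:
    delay_ns: float    # measured access delay of this way (hypothetical unit)
    leakage_mw: float  # measured leakage of this way (hypothetical unit)


def yapd_enabled_ways(
    ways: List[Way],
    max_delay_ns: float,
    max_leakage_mw: float,
    min_ways: int = 1,
) -> Optional[List[int]]:
    """Return the indices of ways left powered on, or None if too few remain.

    A way is turned off when it violates the delay limit or exceeds the
    (assumed per-way) leakage limit.
    """
    kept = [
        i
        for i, w in enumerate(ways)
        if w.delay_ns <= max_delay_ns and w.leakage_mw <= max_leakage_mw
    ]
    return kept if len(kept) >= min_ways else None
```

With four ways of which one is too slow and one leaks too much, the function returns the two remaining indices, so the chip could ship with reduced associativity instead of being discarded.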
Variability Driven Gate Sizing for Binning Yield Optimization
In Proceedings of ACM/IEEE Design Automation Conference, 2006
Cited by 12 (0 self)
Process variations result in a considerable spread in the frequency of the fabricated chips. In high-performance applications, those chips that fail to meet the nominal frequency after fabrication are either discarded or sold at a loss which is typically proportional to the degree of timing violation. The latter is called binning. In this paper we present a gate sizing-based algorithm that optimally minimizes the binning yield-loss. We make the following contributions: 1) we prove the binning yield function to be convex, 2) we make no assumptions about the sources of variability or their distribution model, 3) we integrate our strategy with statistical timing analysis (STA) tools, without making any assumptions about how STA is done, and 4) if the objective is to optimize the traditional yield (and not binning yield), our approach can still optimize it to a very large extent. Comparison of our approach with sensitivity-based approaches under fabrication variability shows an average improvement of 72% in the binning yield-loss with an average area overhead of 6%, while achieving a 2.69 times speedup under a stringent timing constraint. Moreover, we show that a worst-case deterministic approach fails to generate a solution for certain delay constraints. We also show that optimizing the binning yield-loss minimizes the traditional yield-loss with a 61% improvement from a sensitivity-based approach.
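To make the notion of binning yield-loss concrete, here is a minimal sketch of one possible loss model: a chip that meets the nominal delay incurs no loss, a chip slower than a discard limit is a total loss, and anything in between is penalized in proportion to its timing violation. The function names, the discard limit `t_discard`, and the linear ramp are assumptions for illustration; the abstract does not give the paper's actual loss function.

```python
from typing import Sequence


def binning_loss(delay: float, t_nominal: float, t_discard: float) -> float:
    """Fractional loss for one chip under an assumed linear binning penalty."""
    if delay <= t_nominal:
        return 0.0  # meets nominal timing, sold at full price
    if delay >= t_discard:
        return 1.0  # too slow to sell at all, discarded
    # Loss proportional to the degree of timing violation.
    return (delay - t_nominal) / (t_discard - t_nominal)


def binning_yield_loss(
    delays: Sequence[float], t_nominal: float, t_discard: float
) -> float:
    """Average fractional loss over a sample of fabricated-chip delays."""
    if not delays:
        raise ValueError("need at least one delay sample")
    total = sum(binning_loss(d, t_nominal, t_discard) for d in delays)
    return total / len(delays)
```

In this toy model the traditional yield-loss is the special case where `t_discard` approaches `t_nominal`, so that every violating chip counts as a full loss.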
FPGA performance optimization via chipwise placement considering process variations
In International Conference on Field-Programmable Logic and Applications, 2006
Cited by 9 (1 self)
Both custom IC and FPGA designs in the nanometer regime suffer from process variations. But different from custom ICs, FPGAs' programmability offers a unique design freedom to leverage process variation and improve circuit performance. We propose the following variation-aware chipwise placement flow in this paper. First, we obtain the variation map for each chip by synthesizing the test circuits for each chip as a preprocessing step before detailed placement. Then we use the trace-based method to estimate the performance gain achievable by chipwise placement. Such estimation provides a lower bound of the performance gain without detailed placement. Finally, if the gain is significant, a variation-aware chipwise placement is used to place the circuits according to the variation map for each chip. Our experimental results show that, compared to the existing FPGA placement, variation-aware chipwise placement improves circuit performance by up to 19.3% for the tested variation maps.
Comparative analysis of conventional and statistical design techniques
In Proceedings of the 44th Annual Conference on Design Automation, 2007
Cited by 9 (0 self)
We explore the power benefits of changing a microprocessor path histogram through circuit sizing based on statistical timing analysis and optimization (STAO) versus a deterministic timing approach that uses statistical design to establish a global guardband followed by conventional optimization (SDGG). Using an analytical modeling approach, we quantify the differences in total power between the two approaches while maintaining an equivalent performance distribution. For a relative 1σ random within-die (WID) stage delay variation of 5% and representative microprocessor critical paths, the analysis indicates that the STAO approach enables a ~2% power reduction over the SDGG approach. To achieve a 4% and 6% power reduction through the STAO approach, the process variation needs to increase by factors of 2x and 4x, respectively.
From finance to flip flops: a study of fast quasi-Monte Carlo methods from computational finance applied to statistical circuit analysis
IEEE ISQED, 2007
Cited by 8 (1 self)
Problems in computational finance share many of the characteristics that challenge us in statistical circuit analysis: high dimensionality, profound nonlinearity, stringent accuracy requirements, and expensive sample simulation. We offer a detailed experimental study of how one celebrated technique from this domain, Quasi-Monte Carlo (QMC) analysis, can be used for fast statistical circuit analysis. In contrast with traditional pseudorandom Monte Carlo sampling, QMC substitutes a (shorter) sequence of deterministically chosen sample points. Across a set of digital and analog circuits, in 90nm and 45nm technologies, varying in size from 30 to 400 devices, we obtain speedups in parametric yield estimation from 2X to 50X.
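The contrast between pseudorandom sampling and a deterministic low-discrepancy sequence can be sketched on a toy yield problem. The sketch below uses a Halton sequence, which is one common QMC construction; the abstract does not say which sequence the paper uses. The "circuit" is a hypothetical stand-in: two independent standard-normal parameter deviations, with a pass condition chosen so that the exact yield is known in closed form.

```python
import math
import random
from statistics import NormalDist


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * frac
        frac /= base
    return result


def halton_point(index: int, bases=(2, 3)):
    """index-th point (index >= 1) of a Halton sequence in len(bases) dimensions."""
    return tuple(radical_inverse(index, b) for b in bases)


def passes_spec(x1: float, x2: float) -> bool:
    # Hypothetical spec: the summed parameter deviation must stay below 1.0.
    return x1 + x2 <= 1.0


def yield_mc(n: int, seed: int = 0) -> float:
    """Yield estimate from n pseudorandom Gaussian samples."""
    rng = random.Random(seed)
    hits = sum(passes_spec(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n))
    return hits / n


def yield_qmc(n: int) -> float:
    """Yield estimate from the first n Halton points mapped to Gaussians."""
    inv = NormalDist().inv_cdf  # maps a uniform in (0, 1) to a standard normal
    hits = 0
    for i in range(1, n + 1):  # start at 1 so no coordinate is exactly 0
        u1, u2 = halton_point(i)
        hits += passes_spec(inv(u1), inv(u2))
    return hits / n


# x1 + x2 is normal with variance 2, so the exact yield is Phi(1 / sqrt(2)).
EXACT_YIELD = NormalDist().cdf(1.0 / math.sqrt(2.0))
```

Comparing `abs(yield_qmc(n) - EXACT_YIELD)` against `abs(yield_mc(n) - EXACT_YIELD)` for increasing `n` gives a feel for the convergence difference, although a two-dimensional toy says little about the high-dimensional circuits studied in the paper.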
Efficient Decoupling Capacitance Budgeting Considering Operation and Process Variations
Cited by 4 (4 self)
This paper solves the variation-aware on-chip decoupling capacitance (decap) budgeting problem. Unlike previous work assuming the worst-case current load, we develop a novel stochastic current model, which efficiently and accurately captures operation variation such as temporal correlation between clock cycles and logic-induced correlation between ports. The model also considers current variation due to process variation with spatial correlation. We then propose an iterative alternative programming algorithm to solve the decap budgeting problem under the stochastic current model. Experiments using industrial examples show that compared with the baseline model which assumes maximum currents at all ports and under the same decap area constraint, the model considering temporal correlation reduces the noise by up to 5×, and the model considering both temporal and logic-induced correlations reduces the noise by up to 17×. Compared with the model using deterministic process parameters, considering process variation (Leff variation in this paper) reduces the mean noise by up to 4× and the 3σ noise by up to 13×. While the existing stochastic optimization has been used mainly for process variation, this paper, to the best of our knowledge, is the first in-depth study on stochastic optimization taking into account both operation and process variations for power network design. We convincingly show that considering operation variation is highly beneficial for power integrity optimization and this should be researched for optimizing signal and thermal integrity as well.
Statistical Dual-Vdd assignment for FPGA interconnect power reduction
In Proc. Design Automation and Test in Europe, 2007
Cited by 2 (2 self)
Field programmable dual-Vdd interconnects are effective in reducing FPGA power. However, the deterministic Vdd assignment leverages timing slack exhaustively and significantly increases the number of near-critical paths, which results in a degraded timing yield with process variation. In this paper, we present two statistical Vdd assignment algorithms. The first greedy algorithm is based on sensitivity while the second one is based on timing slack budgeting. Both minimize chip-level interconnect power without degrading timing yield. Evaluated with MCNC circuits, the statistical algorithms reduce interconnect power by 40% compared to the single-Vdd FPGA with power gating. In contrast, the deterministic algorithm reduces interconnect power by 51% but degrades timing yield from 97.7% to 87.5%.
On the Futility of Statistical Power Optimization
Cited by 1 (1 self)
In response to the increasing variations in integrated-circuit manufacturing, the current trend is to create designs that take these variations into account statistically. In this paper we try to quantify the difference between the statistical and deterministic optima of leakage power while making no assumptions about the delay model. We develop a framework for deriving a theoretical upper bound on the suboptimality that is incurred by using the deterministic optimum as an approximation for the statistical optimum. On average, the bound is 2.4% for a suite of benchmark circuits in a 45nm technology. We further give an intuitive explanation and show, by using solution rank orders, that the practical suboptimality gap is much lower. Therefore, the need for statistical power modeling for the purpose of optimization is questionable.
Novel Algorithms for Fast Statistical Analysis of Scaled Circuits
2007
Cited by 1 (0 self)
As VLSI technology moves to the nanometer scale for transistor feature sizes, the impact of manufacturing imperfections results in large variations in the circuit performance. Traditional CAD tools are not well-equipped to handle this scenario, since they do not model this statistical nature of the circuit parameters and performances, or if they do, the existing techniques tend to be oversimplified or intractably slow. We draw upon ideas for attacking parallel problems in other technical fields, such as computational finance, machine learning and hydrology, and synthesize them with innovative attacks for our problem domain of integrated circuits, to develop novel solutions to problems of efficient statistical analysis of circuits in the nanometer regime. In particular, this thesis makes three contributions: 1) SiLVR, a nonlinear response surface modeling (RSM) and performance-driven dimensionality reduction strategy, that uses the concepts of projection pursuit and latent variable regression to obtain an absolute improvement in modeling error of up to 34% over the best quadratic RSM method. SiLVR also captures the designer’s insight into the circuit behavior, by automatically extracting quantitative measures of relative