A Framework for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers
, 1994
Cited by 12
Recent research efforts have shown the benefits of integrating functional and data parallelism over using either pure data parallelism or pure functional parallelism. The work in this paper presents a theoretical framework for deciding on a good execution strategy for a given program based on the available functional and data parallelism in the program. The framework is based on assumptions about the form of computation and communication cost functions for multicomputer systems. We present mathematical functions for these costs and show that these functions are realistic. The framework also requires specification of the available functional and data parallelism for a given problem. For this purpose, we have developed a graphical programming tool. Currently, we have tested our approach using three benchmark programs on the Thinking Machines CM5 and Intel Paragon. Results presented show that the approach is very effective and can provide a two to threefold increase in speedups over ap...
Speeding up Pipelined Circuits through a Combination of Gate Sizing and Clock Skew Optimization
 Proc. Int'l Conf. on Computer-Aided Design
, 1995
Cited by 12
An algorithm for unifying the techniques of gate sizing and clock-skew optimization for acyclic pipelines is presented in this paper. In the design of circuits under very tight timing specifications, the area overhead of gate sizing can be considerable. The procedure utilizes the idea of cycle-borrowing using clock skew optimization to relax the stringency of the timing specification on the critical stages of the pipeline. Experimental results verify that cycle-borrowing using sizing+skew results in a better overall area-delay tradeoff than with sizing alone.
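As a rough illustration of cycle-borrowing (not the paper's algorithm), the sketch below considers a linear pipeline in which each stage only has to satisfy a setup constraint of the form s[i] + d[i] <= s[i+1] + T, where d[i] is the stage delay, s[i] the clock skew at register i, and T the clock period. Zero skew at the first and last registers, the absence of hold-time constraints and setup margins, and the function names are all assumptions made here for illustration. Under these assumptions the achievable period drops from the largest stage delay to the average stage delay.

```python
def zero_skew_period(stage_delays):
    """Minimum clock period when every register sees the same clock edge."""
    return max(stage_delays)


def min_period_with_skew(stage_delays):
    """Minimum period of a linear pipeline when internal skews are free.

    Assumes setup constraints only, with the first and last register
    pinned at zero skew. Summing the constraints over all stages gives
    T >= mean(delays); the skews below meet that bound with equality.
    """
    period = sum(stage_delays) / len(stage_delays)
    skews = [0.0]  # skew at the input register
    for delay in stage_delays:
        # Tightest setup-feasible skew at the next register.
        skews.append(skews[-1] + delay - period)
    return period, skews


def setup_feasible(stage_delays, skews, period, tol=1e-9):
    """Check s[i] + d[i] <= s[i+1] + T for every stage."""
    return all(
        skews[i] + d <= skews[i + 1] + period + tol
        for i, d in enumerate(stage_delays)
    )


delays = [8.0, 4.0]  # a slow stage followed by a fast one
period, skews = min_period_with_skew(delays)
print(zero_skew_period(delays), period, skews)
```

In this toy case the slow stage borrows two time units from the fast stage by delaying the middle register's clock, so sizing only has to close a gap of 6 rather than 8. That is the sense in which skew can relax the timing specification on critical stages.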
Optimization techniques for high-performance digital circuits
 in Proc. IEEE Int. Conf. Computer-Aided Design (ICCAD)
, 1997
Cited by 10
The relentless push for high performance in custom digital circuits has led to renewed emphasis on circuit optimization or tuning. The parameters of the optimization are typically transistor and interconnect sizes. The design metrics are not just delay, transition times, power and area, but also signal integrity and manufacturability. This tutorial paper discusses some of the recently proposed methods of circuit optimization, with an emphasis on practical application and methodology impact. Circuit optimization techniques fall into three broad categories. The first is dynamic tuning, based on time-domain simulation of the underlying circuit, typically combined with adjoint sensitivity computation. These methods are accurate but require the specification of input signals, and are best applied to small data-flow circuits and "cross-sections" of larger circuits. Efficient sensitivity computation renders feasible the tuning of circuits with a few thousand transistors. Second, static tuners employ static timing analysis to evaluate the performance of the circuit. All paths through the logic are simultaneously tuned, and no input vectors are required. Large control macros are best tuned by these methods. However, in the context of deep submicron custom design, the inaccuracy of the delay models employed by these methods often limits their utility. Aggressive dynamic or static tuning can push a circuit into a precipitous corner of the manufacturing process space, which is a problem addressed by the third class of circuit optimization tools, statistical tuners. Statistical techniques are used to enhance manufacturability or maximize yield. In addition to surveying the above techniques, topics such as the use of state-of-the-art nonlinear optimization methods and special considerations for interconnect sizing, clock tree optimization and noise-aware tuning will be briefly considered.
Power vs. Delay in Gate Sizing: Conflicting Objectives?
 IN PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN
, 1995
Cited by 10
The problem of sizing gates for power-delay tradeoffs is of great interest to designers. In this work, the theoretical basis for gate sizing under delay and power considerations is presented, along with results from a practical implementation. The dynamic power as well as the short-circuit power are modeled, using notions of delay and transition density, and the optimization problem is formulated using notions of convex programming. Previous approaches have not modeled the short-circuit power, and our experimental results show that incorporating it leads to counterintuitive results where the minimum-power circuit is not necessarily the minimum-sized circuit.
Power-Delay Optimizations in Gate Sizing
, 2000
Cited by 9
The problem of power-delay tradeoffs in transistor sizing is examined using a nonlinear optimization formulation. Both the dynamic and the short-circuit power are considered, and a new modeling technique is used to calculate the short-circuit power. The notion of transition density is used, with an enhancement that considers the effect of gate delays on the transition density. When the short-circuit power is neglected, the minimum power circuit is identical to the minimum area circuit. However, under our more realistic models, our experimental results on several circuits show that the minimum power circuit is not necessarily the same as the minimum area circuit.
Optimal allocation of local feedback in multistage amplifiers via geometric programming
 IEEE Transactions on Circuits and Systems I
, 2001
Cited by 9
We consider the problem of optimally allocating local feedback to the stages of a multistage amplifier. The local feedback gains affect many performance indices for the overall amplifier, such as bandwidth, gain, rise-time, delay, output signal swing, linearity, and noise performance, in a complicated and nonlinear fashion, making optimization of the feedback gains a challenging problem. In this paper we show that this problem, though complicated and nonlinear, can be formulated as a special type of optimization problem called geometric programming. Geometric programs can be solved globally and efficiently using recently developed interior-point methods. Our method therefore gives a complete solution to the problem of optimally allocating local feedback gains, taking into account a wide variety of constraints.
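For readers unfamiliar with the term, the standard textbook form of a geometric program is sketched below. It is background only and is not taken from the paper; the symbols f_i, g_j, c_k, a_k, b_k and y are notation chosen here. The point it illustrates is why such problems can be solved globally: a logarithmic change of variables turns each posynomial into a convex log-sum-exp function.

```latex
% Standard-form geometric program in positive variables x = (x_1, ..., x_n):
\begin{aligned}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
                        & g_j(x) = 1,  \quad j = 1, \dots, p,
\end{aligned}
% where each g_j is a monomial and each f_i is a posynomial (a sum of monomials):
\qquad
f_i(x) = \sum_{k} c_k \, x_1^{a_{k1}} x_2^{a_{k2}} \cdots x_n^{a_{kn}},
\qquad c_k > 0, \; a_{kl} \in \mathbb{R}.

% With y_l = \log x_l and b_k = \log c_k, the logarithm of a posynomial is
\log f_i(e^{y}) = \log \sum_{k} \exp\!\left( a_k^{\mathsf T} y + b_k \right),
% which is convex in y; the log of a monomial is affine in y.
```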
Static Power-driven Voltage Scaling and Delay-driven Buffer Sizing in Mixed Swing QuadRail
 Proc. Intl. Symposium on Low Power Electronics and Design
, 1996
Cited by 7
This paper describes and explores the design space of a four power-supply rail methodology (called Mixed Swing QuadRail) for performing low voltage logic in a high threshold voltage CMOS fabrication process. Power and delay tradeoffs are studied to suggest approaches for efficient selection of voltage levels and buffer transistor sizes. Posynomial models for QuadRail power and delay are derived to show that at reduced I/O swings (sub-1V), both under- and oversizing of transistors can lead to steeply increased delays. Transistor sizing techniques are proposed for optimizing delay and energy per logic operation as a function of load capacitance and voltage levels. Experimental results from detailed HSPICE simulations and an And-Or-Invert (AOI222) QuadRail test chip fabricated in the Hewlett-Packard 0.5µm process are presented to support the models and demonstrate significant power reduction compared to static CMOS.
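The effect of under- and oversizing can be seen even in a generic toy posynomial, which is what the sketch below uses. It is not the QuadRail model from the paper: the delay expression x + load / x (a unit-size driver feeding a buffer of size x, which in turn drives a fixed load, all in normalized units) and the function names are assumptions made here for illustration.

```python
import math


def chain_delay(size, load):
    """Toy posynomial delay of a two-stage buffer chain (normalized units).

    The first term grows with the buffer size (the unit driver sees a
    larger gate), the second shrinks with it (the buffer drives the load
    faster). Both terms are monomials in `size`, so the sum is a posynomial.
    """
    return size + load / size


def best_size(load):
    """Size that minimizes chain_delay: d/dx (x + L/x) = 0 gives x = sqrt(L)."""
    return math.sqrt(load)


load = 16.0
for size in (1.0, best_size(load), 16.0):
    print(f"size={size:5.1f}  delay={chain_delay(size, load):5.1f}")
```

With a load of 16 the delay is 8 at the optimal size of 4 and rises to 17 at both a size of 1 and a size of 16, so the penalty appears on either side of the optimum.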
An Efficient Technique for Device and Interconnect Optimization in Deep Submicron Designs
 in Proc. Int. Symp. on Physical Design
, 1997
Cited by 7
In this paper, we formulate a new class of optimization problem, named the general CH-posynomial program, which is more general than the simple and bounded-variation CH-posynomial programs in [1]. We reveal the general dominance property so that an efficient and unified algorithm based on the local refinement (LR) operation can be used to optimize the simple, bounded-variation and general CH-posynomial programs. We apply the LR-based optimization algorithm to solve the device sizing problem using an accurate table-based model, and the wire sizing and spacing problem with consideration of coupling between multiple nets. Both problems are solved in the context of simultaneous device and wire sizing optimization for deep submicron designs. Experiments show that our LR-based optimization algorithm is very effective and extremely efficient. Up to 16.5% delay reduction is observed when compared with previous work based on the simple device model [1], and up to 31% delay reduction and 100x speedup is observed when compared with the global interconnect sizing and spacing work [2]. We believe that our general CH-posynomial formulation and LR-based algorithm can also be applied to other optimization problems in the CAD field.
Theory and Algorithm of Local-Refinement Based Optimization with Application to Device and Interconnect Sizing
, 1999
Cited by 7
In this paper we formulate three classes of optimization problems: the simple, monotonically-constrained, and bounded CH-programs. We reveal the dominance property under the local refinement (LR) operation for the simple CH-program, as well as the general dominance property under the pseudo-LR operation for the monotonically-constrained CH-program and the extended-LR operation for the bounded CH-program. These properties enable a very efficient polynomial-time algorithm, using different types of LR operations to compute tight lower and upper bounds of the exact solution to any CH-program. We show that the algorithm is capable of solving many layout optimization problems in deep submicron IC and/or high-performance MCM/PCB designs. In particular, we apply...
Theory and Algorithm of Local-Refinement-Based Optimization with Application to Device and Interconnect Sizing
 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 1999
Cited by 7
In this paper we formulate three classes of optimization problems: the simple, monotonically constrained, and bounded Cong-He (CH)-programs. We reveal the dominance property under the local refinement (LR) operation for the simple CH-program, as well as the general dominance property under the pseudo-LR operation for the monotonically constrained CH-program and the extended-LR operation for the bounded CH-program. These properties enable a very efficient polynomial-time algorithm, using different types of LR operations to compute tight lower and upper bounds of the exact solution to any CH-program. We show that the algorithm is capable of solving many layout optimization problems in deep submicron integrated circuit (IC) and/or high-performance multichip module (MCM) and printed circuit board (PCB) designs. In particular, we apply the algorithm to the simultaneous transistor and interconnect sizing problem, and to the global interconnect sizing and spacing problem considering the coupling cap...