## Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits (2010)

### BibTeX

@MISC{Papa10broadeningthe,
  author = {David Anthony Papa},
  title = {Broadening the Scope of Multi-Objective Optimizations in Physical Synthesis of Integrated Circuits},
  year = {2010}
}

### Abstract

would not have been possible without the immeasurable self-sacrifice of my perfect wife, Amy. She has worked day and night by my side for years to make our home and family prosperous. I love you very much. Our two beautiful sons George and Victor have brought me indescribable joy and gave me hope for the future when it seemed all was lost. I love you two in ways I never thought possible. I am eternally grateful to her for the faith she has placed in me. I will do everything I can to reward her investment. I am also deeply indebted to her parents Ren Fang Zhang and Yue Xia Gong who have come from their home in China to live with us and help raise our babies. Without them, I don’t know how it would be possible for me to balance graduate school, a full-time job, and a new family. I will be sorry when they return home. My advisor, Professor Igor Markov, has also poured an incredible amount of work into training me to be capable of writing this dissertation. He has defended me when it was not convenient, supported me when it seemed hopeless, and never gave up on me until the task was complete. I am grateful for all of his efforts as well as all of the opportunities and second chances he has given me. I truly hope it has been as worth it for him as it has been

### Citations

440 |
A linear-time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...tioning problem. The Multilevel Fiduccia-Mattheyses (MLFM) framework is a well-studied approach to hypergraph partitioning and is presently the dominant technique for large-scale netlist partitioning [36]. It begins with a coarsening phase during which vertices of the hypergraph are merged to form a clustered hypergraph which has fewer vertices, e.g., half as many. The hypergraph is clustered repeated... |

335 |
Retiming Synchronous Circuitry
- Leiserson, Saxe
- 1991
Citation Context ...fact, show that SPIRE can handle window sizes of thousands of gates by efficiently encoding the problem as an MILP with linearly many constraints in the size of the circuit. Retiming methods based on [65] enforce timing constraints by requiring a register on every path whose delay exceeds a threshold. However, such methods require computationally expensive path enumeration within the linear programming... |

194 | GORDIAN: VLSI placement by quadratic programming and slicing optimization - Kleinhans, Sigl, et al. - 1991 |

127 |
K-means-type algorithms: A generalized convergence theorem and characterization of local optimality
- Selim, Ismail
- 1984
Citation Context ...st be placed close to the LCBs to reduce the clock skew. Therefore, we employ a geometric clustering algorithm called k-means which finds groups of closely-placed latches to be driven by the same LCB [110]. Pseudocode for our algorithm is given in Figure 9.6. To reduce the disruption caused by moving latches close to LCBs, we define a new parameter ... [pseudocode: CLUSTER-LATCHES. Input: VLSI Circuit C, Maximum ...] |

87 |
Buffer placement in distributed RC-tree networks for minimal Elmore delay, ISCAS
- van Ginneken
- 1990
Citation Context ...ign at once. Other approaches focus on a handful of gates, and apply more time-consuming algorithms to relocate several gates at once, increase drive strength, or insert buffers to improve performance [10, 112, 114]. However, these approaches are limited in scope and only near-linear-time algorithms such as wirelength-driven placement can be applied at a truly global scope. For example, the scope of timing-drive... |

82 | Timing-Driven Placement for FPGAs
- Marquardt, Betz, et al.
- 2000
Citation Context ...ure 5.1: The contributions in this chapter improve the results of the critical path optimization and slack-histogram compression stages of physical synthesis. 5.1 Introduction Timing-driven placement [17, 73, 111] is a critical step in any physical synthesis flow, and has received steadily increased attention in recent years [8]. Due to its computational expense and complexity, several algorithms optimize timi... |

79 | Analytical Placement: A Linear or a Quadratic Objective Function - Sigl, Doll, et al. - 1991 |

74 | Algorithms for Large-Scale Flat Placement, DAC
- Vygen
- 1997
Citation Context ...CPU designs. Force-directed Placement. The current physical-synthesis methodology used at IBM relies on a quadrisection-based quadratic placement algorithm for high-performance microprocessor designs [119]. This algorithm works by first solving the quadratic program that is typical in analytic placement algorithms, then divides the cells into 4 groups by drawing cutlines to satisfy a density constraint... |

73 | Timing analysis of computer hardware - Hitchcock, Smith, et al. - 1982 |

73 | Timing Driven Placement for Large Standard Cell Circuits, DAC
- Swartz, Sechen
- 1995
Citation Context ...ure 5.1: The contributions in this chapter improve the results of the critical path optimization and slack-histogram compression stages of physical synthesis. 5.1 Introduction Timing-driven placement [17, 73, 111] is a critical step in any physical synthesis flow, and has received steadily increased attention in recent years [8]. Due to its computational expense and complexity, several algorithms optimize timi... |

64 | FastPlace: Efficient Analytical Placement Using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model - Viswanathan, Chu |

48 |
Concurrent flip-flop and repeater insertion for high performance integrated circuits
- Cocchini
Citation Context ...delay [48]. Indeed, for high-performance ASIC scaling trends, the number of pipeline latches increases by 2.9× at each technology generation, accounting for as much as 10% of the area of 90nm designs [28] and as many as 18% of the gates in 32nm designs [101]. Hence, the proper placement of pipeline latches is a growing problem for timing closure. The choice of computational techniques for latch pla... |

40 | Proud: A sea-of-gates placement algorithm - Tsay, Kuh, et al. - 1988 |

39 | A Methodology and Algorithms for Post-Placement Delay Optimization
- Kannan, Suaris, et al.
- 1994
Citation Context ...of STA and its interfaces with optimization. Mathematically, circuit optimizations often interact with STA by obtaining arrival times and required arrival times at timing points throughout the design [54, 89]. However, running STA on the entire design to evaluate each potential change is impractical. Therefore, STA can be used (i) in batch mode to evaluate the compound impact of many changes, (ii) in incr... |

39 |
A novel net weighting algorithm for timing-driven placement
- Kong
Citation Context ...s experimental results. Conclusions are drawn in Section 3.7. 3.2 Background Several approaches improve IC performance by modifying wirelength-driven global placement through timing-based net weights [40, 50, 53, 59, 72, 80]. Such algorithms are generally referred to as timing-driven placement, but the literature has not yet considered the impact of buffering on latch placement during global placement. Due to the lack of... |

38 |
Repeater Scaling and its Impact on CAD
- Saxena, Menezes, et al.
Citation Context ...s). This trend has been so successful that now the greater part of critical path delay is no longer in the transistors that compose logic gates — delay through signal nets and repeaters now dominates [101]. As a result, logic synthesis can no longer estimate design performance effectively without physical information. A relatively recent solution, physical synthesis optimization algorithms employ a com... |

35 |
Method and System for High Speed Detailed Placement of Cells within an Integrated Circuit Design
- Hill
- 2002
Citation Context ...ture a faithful delay model. Among these deficiencies is the potential to create cell overlap; although several post-placement legalization techniques have been adopted in academia and industry [15,41], there is no guarantee that these procedures will preserve improvements made to timing. Other solutions, including the restriction of cell movement to geometrically disjoint bounding boxes [40,69], s... |

26 | Timing-driven Placement Based on Partitioning with Dynamic Cut-net Control
- Ou, Pedram
- 2000
Citation Context ...s experimental results. Conclusions are drawn in Section 3.7. 3.2 Background Several approaches improve IC performance by modifying wirelength-driven global placement through timing-based net weights [40, 50, 53, 59, 72, 80]. Such algorithms are generally referred to as timing-driven placement, but the literature has not yet considered the impact of buffering on latch placement during global placement. Due to the lack of... |

25 |
RC delay metrics for performance optimization
- Alpert, Devgan, et al.
- 2001
Citation Context ...osed-form equation for delay. In addition, several technology-independent, closed-form equations for computing RC network delay were shown to have a low error while being relatively easy to implement [9, 11]. Buffered path delay. Buffering has become indispensable in timing closure and cannot be ignored during interconnect delay estimation [6, 29, 101]. Therefore to calculate new locations of movable gat... |

25 | FastPlace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control - Viswanathan, Pan, et al. - 2007 |

24 |
Static Leakage Reduction through Simultaneous Vt/Tox and State Assignment
- Lee, Blaauw
- 2005
Citation Context ...r instance, cell f is shown to have two possible sizes, indicating different candidate power levels for the gate. Similar assignments can be obtained if considering dual threshold voltage (Vt) levels [63]. As will be demonstrated later, this generalization permits the simultaneous optimization of placement and other transformations, in a similar spirit to [20] but imposing discrete (rather than contin... |

22 | Min-Max Placement for Large-Scale Timing Optimization
- Kahng, Mantik, et al.
Citation Context ...s experimental results. Conclusions are drawn in Section 3.7. 3.2 Background Several approaches improve IC performance by modifying wirelength-driven global placement through timing-based net weights [40, 50, 53, 59, 72, 80]. Such algorithms are generally referred to as timing-driven placement, but the literature has not yet considered the impact of buffering on latch placement during global placement. Due to the lack of... |

21 |
FLUTE: Fast lookup table based rectilinear Steiner minimal tree algorithm for VLSI design
- Chu, Wong
Citation Context ...ms have been developed to compute and optimize many different wirelength calculations, including half-perimeter wirelength (HPWL), quadratic net length, rectilinear Steiner-minimal tree (RSMT) length [27,92,106]. At technology nodes larger than 250nm interconnect delay was a negligible fraction of total path delay, and merely minimizing wirelength was suitable for optimizing design performance, but this has ... |

21 |
Algorithms for multilevel logic optimization
- Wang
- 1989
Citation Context ...inutes following incremental circuit changes on large ASICs [16]. Further extensions to incremental analysis include level-limited and dominance-limited schemes to reduce the amount of work performed [102, 121]. Lazy evaluation [1, 71], in which propagation is delayed until triggered by a relevant query, represents a particularly important improvement in throughput of static timing analysis engines. The boo... |

20 | Kraftwerk2—A Fast Force-Directed Quadratic Placement Approach Using an Accurate Net Model
- Spindler, Schlichtmann, et al.
- 2008
Citation Context ...ms have been developed to compute and optimize many different wirelength calculations, including half-perimeter wirelength (HPWL), quadratic net length, rectilinear Steiner-minimal tree (RSMT) length [27,92,106]. At technology nodes larger than 250nm interconnect delay was a negligible fraction of total path delay, and merely minimizing wirelength was suitable for optimizing design performance, but this has ... |

19 | Fitted Elmore Delay: A Simple and Accurate Interconnect Delay Model
- Abou-Seido, Nowak, et al.
- 2004
Citation Context ...is model could be used to efficiently estimate the delay impact of moving a gate during detailed placement. More recent works have improved upon the accuracy of the Elmore delay model. The authors of [2] improve the accuracy of Elmore delay by fitting curves to HSpice data with technology-specific parameters while maintaining a closed-form equation for delay. In addition, several technology-independe... |

19 | PROP: a recursive paradigm for area-efficient and performance oriented partitioning of large FPGA netlists
- Kužnar, Brglez
- 1995
Citation Context ...s chapter, we design several highly efficient cloning techniques, also known as cell replication techniques, to improve delay along critical paths. Cloning is not a new synthesis optimization; Brglez [60] and Hwang et al. [47] use cloning as a mechanism to reduce net-cut during partitioning, and cloned gate placement has been studied in the FPGA domain [22, 57]. Since cloning helps in reducing the tot... |

18 | Sensitivity guided net weighting for placement driven synthesis - Ren, Pan, et al. - 2004 |

17 | Seeing the forest and the trees: Steiner wirelength optimization in placement
- Roy, Markov
- 2007
Citation Context ...ms have been developed to compute and optimize many different wirelength calculations, including half-perimeter wirelength (HPWL), quadratic net length, rectilinear Steiner-minimal tree (RSMT) length [27,92,106]. At technology nodes larger than 250nm interconnect delay was a negligible fraction of total path delay, and merely minimizing wirelength was suitable for optimizing design performance, but this has ... |

16 | Incremental placement for timing optimization
- Choi, Bazargan
- 2003
Citation Context ...ationship between pin-to-pin wire delay and Manhattan distance is quadratic rather than linear, the inaccuracy of this linear model has been addressed in various ways. For instance, Choi and Bazargan [24] consider an objective function that minimizes total cell displacement to prevent cases where large cell movement invalidates the linear model. The model of Wang et al. [122] assumes that LP-based opt... |

14 | Complexity analysis and speedup techniques for optimal buffer insertion with minimum cost
- Shi, Li, et al.
- 2005
Citation Context ...1 and S2. This example suggests that one must consider buffering and cloning together to effectively reduce delay. Timing-driven buffering alone can be computationally expensive when used excessively [103]. It is also difficult to use it to derive any guidance for simultaneous cloning and buffering. To be most accurate, one should explore all possible partitionings of sinks for each net, find gate plac... |

12 | Making fast buffer insertion even faster via approximation techniques
- Li, Sze, et al.
- 2005
Citation Context ...verters for the buffer insertion. We implemented four different optimizations including cloning as follows, to show the benefit of our techniques. They are • Buffering: Timing-driven buffer insertion [67]. This optimization is treated as the baseline to which all other optimizations are compared. • RUMBLE: Moving the original gate and rebuffering as described in Chapter III. • Clone1: Our cloning algo... |

11 | Simultaneous gate sizing and placement
- Chen, Hsieh, et al.
Citation Context ... design parameters such as gate sizes and placement simultaneously requires an approach that accounts for decisions with finitely-many alternatives, since solutions produced by continuous gate-sizing [20] may degrade unacceptably when mapped to a standard cell library. Such continuous-to-discrete mappings present challenges for any of the aforementioned mathematical programming approaches. In this cha... |

11 |
Statistical timing for parametric yield prediction of digital integrated circuits, Design Automation Conference
- Jess, et al.
- 2003
Citation Context ..., and critical path optimization (through buffering and gate sizing) are completed [7]. We use an industrial timing analysis tool to obtain initial conditions for AATs and RATs throughout the circuit [51]. Our experiments were conducted on an 8-core system with 2.8 GHz AMD Opteron 854 CPUs and 80 GB of memory. Our MILPs were solved with ILOG CPLEX 12.1 configured to use up to 8 cores in parallel. ... |

10 | Post-Layout Timing-Driven Cell Placement Using an Accurate Net Length Model with Movable Steiner Points
- Ajami, Pedram
- 2001
Citation Context ...Cpinj ) (V.5) The delay between gates on higher degree nets may be obtained by querying a full-blown industrial timing engine, reconstructing Steiner trees from scratch [18] or via topological repair [4], or instead by cheaper methods of estimation [9]. The disjunctive timing graph. In the previous paragraphs, we identified the three major components in our formulation of incremental timing-driven pl... |

10 | A performance-driven standard-cell placer based on a modified force-directed algorithm - Chou, Lin - 2001 |

10 | Physical placement driven by sequential timing analysis
- Hurst, Chong, et al.
- 2004
Citation Context ...ring and logic cloning. A further opportunity is to perform global placement so as to increase the potential for such improvements. This potential is expressed by the metric known as sequential slack [46]. Optimizing sequential slack during placement can provide improved opportunities for clock skew scheduling and retiming, and thus further broadens the scope of physical synthesis optimization. We exp... |

10 |
Utilizing the retiming skew equivalence in a practical algorithm for retiming large circuits
- Sapatnekar, Deokar
- 1996
Citation Context ...ct to $\forall (u, v) \in E,\ r(u) - r(v) \le w(u, v)$; $\forall (u, v) \in E \mid D(u, v) > P,\ r(u) - r(v) \le W(u, v) - 1$. Figure 7.5: An LP for min-area, period-constrained retiming. Prior work in retiming also includes the ASTRA [99] algorithm, which is a faster approach. It relates the problem of clock skew optimization at each flip-flop to a retiming solution for minimum-period retiming, and uses the Bellman-Ford algorithm to d... |

10 | Timing Driven Gate Duplication
- Srivastava, Kastner, et al.
- 2000
Citation Context ...ad-dependent gate delay model and zero-wire delay is known to be NP-complete [107]. Under the same delay model, a cloning in sink-to-source order can improve the timing of a technology-mapped circuit [108]. Due to the computational complexity of the problem, heuristics are often proposed to speed up the technique. However, all of these techniques neglect two key features of the problem: interconnect de... |

9 |
Techniques for Fast Physical Synthesis
- Alpert, et al.
Citation Context ... design. We operate on these benchmarks after timing-driven synthesis, timing-driven placement, electrical correction, and critical path optimization (through buffering and gate sizing) are completed [7]. We use an industrial timing analysis tool to obtain initial conditions for AATs and RATs throughout the circuit [51]. Our experiments were conducted on an 8-core system with 2.8 GHz AMD Opteron 854 ... |

9 | New Adaptive Multistart Techniques for - Boese, Kahng, et al. - 1994 |

9 | FastPlace 2.0: An Efficient Analytical Placer for Mixed-Mode Designs - Viswanathan, Pan, et al. - 2006 |

9 | Deriving a new efficient algorithm for min-period retiming
- Zhou
- 2005
Citation Context ...s the problem of clock skew optimization at each flip-flop to a retiming solution for minimum-period retiming, and uses the Bellman-Ford algorithm to derive the longest path. Recently, the authors of [123] used program derivation to automatically generate an algorithm for min-period retiming. Retiming was also explored for slack budgeting and power minimization for FPGAs [45]. Challenges in min-per... |

8 | A New LP Based Incremental Timing Driven Placement for High Performance Designs
- Luo
- 2006
Citation Context ...ng stage, our subcircuit selection algorithm must address the presence of buffers. These buffers will ... [Footnote 2: Variations on this theme, such as metrics that incorporate the degree of neighbors’ criticality [69, 122] and the size of the subcircuit bounding box, are also possible.] [Figure: timing example annotated with AAT, RAT, delay, and clock period values] ... |

6 | An LP-based methodology for improved timing-driven placement
- Wang, Lillis, et al.
- 2005
Citation Context ...al synthesis flow. While no previous work has attempted to solve this particular problem, other solutions do exist that may be able to help with the placement of poorly placed latches. The authors of [122] propose a linear programming formulation that minimizes downstream delay to choose locations for gates in field-programmable gate arrays (FPGAs). The authors of [26] model static timing analysis (STA... |

5 |
Zero Skew Clock Routing with Minimum Wirelength
- Chao, et al.
- 1992
Citation Context ...d K(F2). Repeating this procedure for all fanins, we can find the final K(F). This bottom-up merging process is very similar to the Deferred-Merge Embedding (DME) algorithm in clock tree construction [19] though the goal there is to get a zero skew arc. With a similar procedure to the one shown in [19], it is not hard to prove that K(F) is always a Manhattan Arc or a single point, and our merging proc... |

5 | Enhancing timing-driven FPGA placement for pipelined netlists
- Eguro, Hauck
- 2008
Citation Context ...d on temporal logic is proposed in [76], along with an algorithm to efficiently retrieve answers to those queries. Algorithms for incremental timing analysis [105] and incremental criticality updates [35] have been proposed in the context of FPGAs. The authors of [30] explore an extension of static timing analysis to model coupling, and exploit circuit structure to determine an effective node ordering... |

5 |
Slack in static timing analysis
- Vygen
Citation Context ...when the transformation is designed and tested. In particular, differences in slew rate can greatly affect timing for the whole path in ways that are difficult to predict while only considering slack [120]. For this reason, timing analysis tools support a mode to limit slew rate propagation to a constant number of levels. This mode provides a convenient way to limit the scope of timing changes and impr... |

4 |
Closed-form delay and slew metrics made easy, Computer-Aided Design of Integrated Circuits and Systems
- Alpert, Liu, et al.
- 2004
Citation Context ...osed-form equation for delay. In addition, several technology-independent, closed-form equations for computing RC network delay were shown to have a low error while being relatively easy to implement [9, 11]. Buffered path delay. Buffering has become indispensable in timing closure and cannot be ignored during interconnect delay estimation [6, 29, 101]. Therefore to calculate new locations of movable gat... |

4 |
Almost Optimum Placement Legalization by
- Brenner, Pauli, et al.
Citation Context ...ture a faithful delay model. Among these deficiencies is the potential to create cell overlap; although several post-placement legalization techniques have been adopted in academia and industry [15,41], there is no guarantee that these procedures will preserve improvements made to timing. Other solutions, including the restriction of cell movement to geometrically disjoint bounding boxes [40,69], s... |