Results 1 - 10
of
82
Performance optimization of VLSI interconnect layout
- Integration, the VLSI Journal
, 1996
"... This paper presents a comprehensive survey of existing techniques for interconnect optimization during the VLSI physical design process, with emphasis on recent studies on interconnect design and optimization for high-performance VLSI circuit design under the deep submicron fabrication technologies. ..."
Abstract
-
Cited by 90 (32 self)
- Add to MetaCart
This paper presents a comprehensive survey of existing techniques for interconnect optimization during the VLSI physical design process, with emphasis on recent studies on interconnect design and optimization for high-performance VLSI circuit design under the deep submicron fabrication technologies. First, we present a number of interconnect delay models and driver/gate delay models of various degrees of accuracy and efficiency which are most useful to guide the circuit design and interconnect optimization process. Then, we classify the existing work on optimization of VLSI interconnect into the following three categories and discuss the results in each category in detail: (i) topology optimization for highperformance interconnects, including the algorithms for total wire length minimization, critical path length minimization, and delay minimization; (ii) device and interconnect sizing, including techniques for efficient driver, gate, and transistor sizing, optimal wire sizing, and simultaneous topology construction, buffer insertion, buffer and wire sizing; (iii) highperfbrmance clock routing, including abstract clock net topology generation and embedding, planar clock routing, buffer and wire sizing for clock nets, non-tree clock routing, and clock schedule optimization. For each method, we discuss its effectiveness, its advantages and limitations, as well as its computational efficiency. We group the related techniques according to either their optimization techniques or optimization objectives so that the reader can easily compare the quality and efficiency of different solutions.
Zero Skew Clock Routing With Minimum Wirelength
, 1992
"... In the design of high performance VLSI systems, minimization of clock skew is an increasingly important objective. Additionally, wirelength of clock routing trees should be minimized in order to reduce system power requirements and deformation of the clock pulse at the synchronizing elements of the ..."
Abstract
-
Cited by 64 (12 self)
- Add to MetaCart
In the design of high performance VLSI systems, minimization of clock skew is an increasingly important objective. Additionally, wirelength of clock routing trees should be minimized in order to reduce system power requirements and deformation of the clock pulse at the synchronizing elements of the system. In this paper, we first present the Deferred-Merge Embedding (DME) algorithm, which embeds any given connection topology to create a clock tree with zero skew while minimizing total wirelength. The algorithm always yields exact zero skew trees with respect to the appropriate delay model. Experimental results show an 8% to 15% wirelength reduction over previous constructions in [17] [18]. The DME algorithm may be applied to either the Elmore or linear delay model, and yields optimal total wirelength for linear delay. DME is a very fast algorithm, running in time linear in the number of synchronizing elements. We also present a unified BB+DME algorithm, which constructs a clock tree t...
Optimizing Two-Phase, Level-Clocked Circuitry (Extended Abstract)
"... We investigate two strategies for reducing the clock period of a two-phase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster ..."
Abstract
-
Cited by 53 (16 self)
- Add to MetaCart
We investigate two strategies for reducing the clock period of a two-phase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph whose vertex set V is a collection of combinational logic blocks, and whose edge set E is a set of interconnections. Each interconnection passes through 0 or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for ffl verifyi...
Clocking Design and Analysis for a 600-MHz Alpha Microprocessor
, 1998
"... Design, analysis, and verification of the clock hierarchy on a 600-MHz Alpha microprocessor is presented. The clock hierarchy includes a gridded global clock, gridded major clocks, and many local clocks and local conditional clocks, which together improve performance and power at the cost of verific ..."
Abstract
-
Cited by 46 (1 self)
- Add to MetaCart
Design, analysis, and verification of the clock hierarchy on a 600-MHz Alpha microprocessor is presented. The clock hierarchy includes a gridded global clock, gridded major clocks, and many local clocks and local conditional clocks, which together improve performance and power at the cost of verification complexity. Performance is increased with a windowpane arrangement of global clock drivers for lowering skew and employing local clocks for time borrowing. Power is reduced by using major clocks and local conditional clocks. Complexity is managed by partitioning the analysis depending on the type of clock. Design and characterization of global and major clocks use both an AWEsim-based computer-aided design (CAD) tool and SPICE. Design verification of local clocks relies on SPICE along with a timing-based methodology CAD tool that includes datadependent coupling, data-dependent gate loads, and resistance effects.
Wave-pipelining: A tutorial and research survey
- IEEE TRANS. ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS
, 1998
"... Wave-pipelining is a method of high-performance circuit design which implements pipelining in logic without the use of intermediate latches or registers. The combination of high-performance integrated circuit (IC) technologies, pipelined architectures, and sophisticated computer-aided design (CAD) t ..."
Abstract
-
Cited by 38 (0 self)
- Add to MetaCart
Wave-pipelining is a method of high-performance circuit design which implements pipelining in logic without the use of intermediate latches or registers. The combination of high-performance integrated circuit (IC) technologies, pipelined architectures, and sophisticated computer-aided design (CAD) tools has converted wave-pipelining from a theoretical oddity into a realistic, although challenging, VLSI design method. This paper presents a tutorial of the principles of wave-pipelining and a survey of wave-pipelined VLSI chips and CAD tools for the synthesis and analysis of wave-pipelined circuits.
A Graph-theoretic Approach to ClockSkew Optimization
- In Proc. International Symp. on Circuits and Systems
, 1994
"... This paper addresses the problem of minimizing the clock period of a circuit by optimizing the clockskews. We incorporate uncertainty factors and present a formulation that ensures that the optimization will be safe. In (1), the problem of clock period optimization is formulated as a linear program. ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
This paper addresses the problem of minimizing the clock period of a circuit by optimizing the clockskews. We incorporate uncertainty factors and present a formulation that ensures that the optimization will be safe. In (1), the problem of clock period optimization is formulated as a linear program. We first propose an efficient graph-based solution that takes advantage of the structure of the problem. We also show that the results of (1) may result in exceedingly large skews, and propose a method to reduce these skews without sacrificing the optimality of the clock period. Experimental results on several ISCAS89 benchmark circuits are provided.
Optimal Clock Skew Scheduling Tolerant to Process Variations
, 1996
"... A methodology is presented in this paper for determining an optimal set of clock path delays for designing high performance VLSI/ULSI-based clock distribution networks. This methodology emphasizes the use of non-zero clock skew to reduce the system-wide minimum clock period. Although choosing (or sc ..."
Abstract
-
Cited by 35 (9 self)
- Add to MetaCart
A methodology is presented in this paper for determining an optimal set of clock path delays for designing high performance VLSI/ULSI-based clock distribution networks. This methodology emphasizes the use of non-zero clock skew to reduce the system-wide minimum clock period. Although choosing (or scheduling) clock skew values has been previously recognized as an optimization technique for reducing the minimum clock period, difficulty in controlling the delays of the clock paths due to process parameter variations has limited its effectiveness. In this paper the minimum clock period is reduced using intentional clock skew by calculating a permissible clock skew range for each local data path while incorporating process dependent delay values of the clock signal paths.
Clock Distribution Networks in Synchronous Digital Integrated Circuits
- Proc. IEEE
, 2001
"... this paper, bears separate focus. The paper is organized as follows. In Section II, an overview of the operation of a synchronous system is provided. In Section III, fundamental definitions and the timing characteristics of clock skew are discussed. The timing relationships between a local data path ..."
Abstract
-
Cited by 34 (2 self)
- Add to MetaCart
this paper, bears separate focus. The paper is organized as follows. In Section II, an overview of the operation of a synchronous system is provided. In Section III, fundamental definitions and the timing characteristics of clock skew are discussed. The timing relationships between a local data path and the clock skew of that path are described in Section IV. The interplay among the aforementioned three subsystems making up a synchronous digital system is described in Section V; particularly, how the timing characteristics of the memory and logic elements constrain the design and synthesis of clock distribution networks. Different forms of clock distribution networks, such as buffered trees and H-trees, are discussed. The automated layout and synthesis of clock distribution networks are described in Section VI. Techniques for making clock distribution networks less sensitive to process parameter variations are discussed in Section VII. Localized scheduling of the clock delays is useful in optimizing the performance of high-speed synchronous circuits. The process for determining the optimal timing characteristics of a clock distribution network is reviewed in Section VIII. The application of clock distribution networks to high-speed circuits has existed for many years. The design of the clock distribution network of certain important VLSI-based systems has been described in the literature, and some examples of these circuits are described in Section IX. In an effort to provide some insight into future and evolving areas of research relevant to high-performance clock distribution networks, some potentially important topics for future research are discussed in Section X. Finally, a summary of this paper with some concluding remarks is provided in Section XI
Scheduling And Behavioral Transformations For Parallel Systems
, 1993
"... In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data-flow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intra-iteration and inter-iteration precedence relat...
DelaY: An Efficient Tool for Retiming with Realistic Delay Modeling
- In Proc. 32nd ACM/IEEE Design Automation Conf
, 1995
"... The retiming transformation can be used to optimize synchronous circuits for maximum speed of operation by relocating their storage elements. In this paper, we describe DelaY, a tool for retiming edge-triggered circuits under a realistic delay model that handles load-dependent gate delays, variable ..."
Abstract
-
Cited by 27 (7 self)
- Add to MetaCart
The retiming transformation can be used to optimize synchronous circuits for maximum speed of operation by relocating their storage elements. In this paper, we describe DelaY, a tool for retiming edge-triggered circuits under a realistic delay model that handles load-dependent gate delays, variable register setup times, interconnect delays, and clock skew. The operation of DelaY relies on a novel linear programming formulation of the retiming problem in this model. For the special case where clock skew is monotonic and all registers have equal propagation delays, the retiming algorithm in our tool runs in polynomial time and can transform any given edgetriggered circuit to achieve a specified clock period in O(V 3 F ) steps, where V is the number of logic gates in the circuit and F is bounded by the number of registers in the circuit.

