Results 1 - 10
of
97
High-Level Power Modeling, Estimation, and Optimization
- IEEE Trans. On Computer Aided Design
, 1998
"... Abstract—Silicon area, performance, and testability have been, so far, the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the othe ..."
Abstract
-
Cited by 74 (10 self)
- Add to MetaCart
Abstract—Silicon area, performance, and testability have been, so far, the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computations with low power consumption. In addition, there exists a strong pressure for manufacturers of high-end products to keep power under control, due to the increased costs of packaging and cooling this type of devices. Last, the need of ensuring high circuit reliability has turned out to be more stringent. The availability of tools for the automatic design of low-power VLSI systems has thus become necessary. More specifically, following a natural trend, the interests of the researchers have lately shifted to the investigation of power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature. Index Terms — Behavioral and logic synthesis, low power design, power management. I.
Constructive Methods for Scheduling Uniform Loop Nests
- IEEE Transactions on Parallel and Distributed Systems
, 1994
"... This paper surveys scheduling techniques for loop nests with uniform dependences. First we introduce the hyperplane method and related variants. Then we extend it by using a different affine scheduling for each statement within the nest. In both cases we present a new, constructive and efficient met ..."
Abstract
-
Cited by 60 (3 self)
- Add to MetaCart
This paper surveys scheduling techniques for loop nests with uniform dependences. First we introduce the hyperplane method and related variants. Then we extend it by using a different affine scheduling for each statement within the nest. In both cases we present a new, constructive and efficient method to determine optimal solutions, i.e. schedules whose total execution time is minimum. 1 Introduction Loop nests lie in the heart of supercompilers-parallelizers for supercomputers. On one hand their importance in terms of applications is evident: in many scientific programs, the time spent in the execution of a small number of loops represents a large fraction of the total execution time, while the potential parallelism of these loops is very important. On the other hand, the regular and repetitive structure of loop nests greatly facilitates the use of dependence analysis techniques and of scheduling and allocation strategies. The general problem of finding the optimal scheduling for a ...
Optimizing Two-Phase, Level-Clocked Circuitry (Extended Abstract)
"... We investigate two strategies for reducing the clock period of a two-phase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster ..."
Abstract
-
Cited by 53 (16 self)
- Add to MetaCart
We investigate two strategies for reducing the clock period of a two-phase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph whose vertex set V is a collection of combinational logic blocks, and whose edge set E is a set of interconnections. Each interconnection passes through 0 or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for ffl verifyi...
Logic Decomposition during Technology Mapping
- IEEE Trans. CAD
, 1995
"... This paper presents a procedure which performs logic decomposition during technology mapping. A problem in technology mapping is that quality of the final implementation depends significantly on the initially provided circuit structure. This problem is critical especially for mapping with tight and ..."
Abstract
-
Cited by 52 (0 self)
- Add to MetaCart
This paper presents a procedure which performs logic decomposition during technology mapping. A problem in technology mapping is that quality of the final implementation depends significantly on the initially provided circuit structure. This problem is critical especially for mapping with tight and complicated constraints. Conventional techniques iteratively apply technology independent transformations and technology mapping, so that the implementation obtained by technology mapping is restructured and remapped. Although some progress can be made, the effectiveness of these techniques is limited, since when a circuit is restructured, it is not clear how it is implemented eventually. The central problem is that technology independent transformations and technology mapping are applied separately. In this paper, we propose a procedure which simultaneously applies technology mapping and algebraic logic decomposition, a key technology independent operation for changing circuit structures....
Efficient Implementation of Retiming
- In Proc. Intl. Conf. on Computer-Aided Design
, 1994
"... Retiming is a technique for optimizing sequential circuits. It repositions the registers in a circuit leaving the combinational cells untouched. The objective of retiming is to find a circuit with the minimum number of registers for a specified clock period. More than ten years have elapsed since Le ..."
Abstract
-
Cited by 43 (0 self)
- Add to MetaCart
Retiming is a technique for optimizing sequential circuits. It repositions the registers in a circuit leaving the combinational cells untouched. The objective of retiming is to find a circuit with the minimum number of registers for a specified clock period. More than ten years have elapsed since Leiserson and Saxe first presented a theoretical formulation to solve this problem for single-clock edge-triggered sequential circuits. Their proposed algorithms have polynomial complexity; however naive implementations of these algorithms exhibit O(n 3 ) time complexity and O(n 2 ) space complexity when applied to digital circuits with n combinational cells. This renders retiming ineffective for circuits with more than 500 combinational cells. This paper addresses the implementation issues required to exploit the sparsity of circuit graphs to allow min-period retiming and constrained min-area retiming to be applied to circuits with as many as 10,000 combinational cells. We believe this is...
Pipeline Vectorization
, 2001
"... This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers. The method improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience. We propose several loop tran ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers. The method improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints, while maximizing available parallelism. For run-time reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in DSP and multimedia applications. Case studies using FPGA-based platforms are presented to demonstrate the benefits of our approach and to evaluate trade-offs between alternative implementations. The loop tiling transformation, for instance, has been found to improve vectorization performance by 30 to 40 times above a PC-based software implementation, depending on whether run-time reconfiguration is used.
Transformation-Based Verification Using Generalized Retiming
"... In this paper we present the application of generalized retiming for temporal property checking. Retiming is a structural transformation that relocates registers in a circuit-based design representation without changing its actual input-output behavior. We discuss the application of retiming to mini ..."
Abstract
-
Cited by 31 (16 self)
- Add to MetaCart
In this paper we present the application of generalized retiming for temporal property checking. Retiming is a structural transformation that relocates registers in a circuit-based design representation without changing its actual input-output behavior. We discuss the application of retiming to minimize the number of registers with the goal of increasing the capacity of symbolic state traversal. In particular, we demonstrate that the classical definition of retiming can be generalized for verification by relaxing the notion of design equivalence and physical implementability. This includes (1) omitting the need for equivalent reset states by using an initialization stump, (2) supporting negative registers, handled by a general functional relation to future time frames, and (3) eliminating peripheral registers by converting them into simple temporal offsets. The presented results demonstrate that the application of retiming in verification can significantly increase the capacity of symbolic state traversal. Our experiments also demonstrate that the repeated use of retiming interleaved with other structural simplifications can yield reductions beyond those possible with single applications of the individual approaches. This result suggests that a tool architecture based on re-entrant transformation engines can potentially decompose and solve verification problems that otherwise would be infeasible.
Understanding Retiming through Maximum Average-Delay Cycles
- Mathematical Systems Theory
, 1994
"... A synchronous circuit built of functional elements and registers is a simple implementation of the semisystolic model of computation that can be used to design parallel algorithms. Retiming is a well-known technique that transforms a given circuit into a faster circuit by relocating its registers. W ..."
Abstract
-
Cited by 30 (8 self)
- Add to MetaCart
A synchronous circuit built of functional elements and registers is a simple implementation of the semisystolic model of computation that can be used to design parallel algorithms. Retiming is a well-known technique that transforms a given circuit into a faster circuit by relocating its registers. We give tight bounds on the minimum clock period that can be achieved by retiming a synchronous circuit. These bounds are expressed in terms of the maximum delay-to-register ratio of the cycles in the circuit graph and the maximum propagation delay d max of the circuit components. Our bounds do not depend on the size of the circuit, and they are of theoretical as well as practical interest. They characterize exactly the minimum clock period that can be achieved by retiming a unit-delay circuit, and they lead to more efficient algorithms for several important problems related to retiming. Specifically, we give an O(V 1=2 E lg V ) algorithm for minimum clock period retiming of unitdelay circu...
Pipeline Vectorization for Reconfigurable Systems
- PROC FCCM'99
, 1999
"... This paper presents pipeline vectorization, a method for synthesizing hardware pipelines in reconfigurable systems based on software vectorizing compilers. The method improves efficiency and ease of development of reconfigurable designs, particularly for users with little electronics design experien ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
This paper presents pipeline vectorization, a method for synthesizing hardware pipelines in reconfigurable systems based on software vectorizing compilers. The method improves efficiency and ease of development of reconfigurable designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints, while maximizing available parallelism. For run-time reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in DSP and multimedia applications. Case studies using FPGA-based platforms are presented to demonstrate the benets of our approach and to evaluate trade-offs between alternative implementations. The loop tiling transformation, for instance, has been found to improve performance by 30 to 40 times above a PC-based software implementation, depending on whether run-time reconfiguration i...
Scheduling And Behavioral Transformations For Parallel Systems
, 1993
"... In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data-flow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intra-iteration and inter-iteration precedence relat...

