High-Level Power Modeling, Estimation, and Optimization
IEEE Trans. on Computer-Aided Design, 1998
Cited by 87 (11 self)
Abstract—Silicon area, performance, and testability have so far been the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computation with low power consumption. In addition, manufacturers of high-end products are under strong pressure to keep power under control because of the increased cost of packaging and cooling such devices. Lastly, the need to ensure high circuit reliability has become more stringent. Tools for the automatic design of low-power VLSI systems have thus become necessary. More specifically, following a natural trend, researchers' interest has lately shifted to power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature. Index Terms—Behavioral and logic synthesis, low-power design, power management.
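The following is a minimal illustration of what a power model estimates. It is not taken from the survey. The standard dynamic-power expression for a CMOS node is P = 0.5 * C * Vdd^2 * f * N, where N is the average number of output transitions (rising plus falling) per clock cycle. The sketch below assumes that formula and sums it over a hypothetical list of nodes. The function names and example values are illustrative only.

```python
def node_dynamic_power(c_load, vdd, freq, transitions_per_cycle):
    """Dynamic (switching) power of one CMOS node, in watts.

    c_load                -- switched load capacitance in farads
    vdd                   -- supply voltage in volts
    freq                  -- clock frequency in hertz
    transitions_per_cycle -- average output transitions (rising plus
                             falling) per clock cycle
    """
    # A rising transition draws C*Vdd^2 from the supply. That energy is
    # dissipated across one rising and one falling transition, so each
    # transition accounts for 0.5*C*Vdd^2.
    return 0.5 * c_load * vdd ** 2 * freq * transitions_per_cycle


def circuit_dynamic_power(nodes, vdd, freq):
    """Sum node power over (c_load, transitions_per_cycle) pairs.

    This is a hypothetical gate-level model with one shared Vdd and clock.
    """
    return sum(node_dynamic_power(c, vdd, freq, n) for c, n in nodes)
```

The estimate scales with Vdd squared. Halving the supply voltage at fixed activity and frequency therefore cuts it by a factor of four.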
Constructive Methods for Scheduling Uniform Loop Nests
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 63 (3 self)
This paper surveys scheduling techniques for loop nests with uniform dependences. First we introduce the hyperplane method and related variants. Then we extend it by using a different affine schedule for each statement within the nest. In both cases we present a new, constructive, and efficient method to determine optimal solutions, i.e., schedules whose total execution time is minimum.
1 Introduction
Loop nests lie at the heart of supercompilers/parallelizers for supercomputers. On the one hand, their importance in terms of applications is evident: in many scientific programs, the time spent executing a small number of loops represents a large fraction of the total execution time, while the potential parallelism of these loops is considerable. On the other hand, the regular and repetitive structure of loop nests greatly facilitates the use of dependence analysis techniques and of scheduling and allocation strategies. The general problem of finding the optimal schedule for a ...
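The hyperplane idea can be made concrete. A linear schedule assigns iteration vector I the time step t(I) = pi . I. It is valid when pi . d >= 1 for every uniform dependence vector d. The sketch below finds the best integer pi by brute force over a small coefficient range on a rectangular iteration domain. It is a hypothetical illustration, not the constructive method the paper proposes. The names `hyperplane_schedule` and `search` are invented.

```python
from itertools import product


def hyperplane_schedule(deps, bounds, search=3):
    """Brute-force a linear schedule t(I) = pi . I for a uniform loop nest.

    deps   -- uniform dependence vectors d; each must satisfy pi . d >= 1
    bounds -- upper bounds N_k of a rectangular domain 0 <= I_k < N_k
    search -- coefficients of pi are tried in [-search, search]
    Returns (pi, makespan) with the fewest time steps, or None if no
    valid pi exists in the searched range.
    """
    best = None
    for pi in product(range(-search, search + 1), repeat=len(bounds)):
        if any(sum(p * d for p, d in zip(pi, dep)) < 1 for dep in deps):
            continue  # some dependence would run backwards or concurrently
        # Over the box, pi . I spans sum(|p_k| * (N_k - 1)) time units.
        span = sum(abs(p) * (n - 1) for p, n in zip(pi, bounds))
        makespan = span + 1
        if best is None or makespan < best[1]:
            best = (pi, makespan)
    return best
```

For example, the dependences (1, 0) and (0, 1) on a 10 x 10 domain give pi = (1, 1) and 19 time steps. This is the classic wavefront schedule.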
Optimizing Two-Phase, Level-Clocked Circuitry (Extended Abstract)
Cited by 55 (16 self)
We investigate two strategies for reducing the clock period of a two-phase, level-clocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph whose vertex set V is a collection of combinational logic blocks and whose edge set E is a set of interconnections. Each interconnection passes through zero or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for verifyi...
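The model above assumes the two phases are periodic and nonoverlapping. A small hypothetical helper can check that property for a given pair of waveforms. Each phase is described by an assumed (start, width) pair giving its high interval within the clock period. This sketch illustrates the constraint only and is not one of the paper's algorithms.

```python
def phases_nonoverlapping(period, phi1, phi2):
    """Return True if two periodic clock phases are never high simultaneously.

    Each phase is (start, width): high on [start, start + width) modulo
    period, with 0 < width < period. Hypothetical encoding.
    """
    s1, w1 = phi1[0] % period, phi1[1]
    s2, w2 = phi2[0] % period, phi2[1]
    a, b = s1, s1 + w1  # one copy of phase 1's high interval
    # Both intervals lie within [0, 2 * period), so comparing phase 1
    # against phase 2 shifted by -1, 0, +1 periods covers every overlap.
    for shift in (-period, 0, period):
        c, d = s2 + shift, s2 + shift + w2
        if a < d and c < b:  # half-open intervals intersect
            return False
    return True
```

For instance, with period 10, phases (0, 4) and (5, 4) leave a one-unit gap on each side and pass the check. Phases (0, 4) and (3, 4) overlap on [3, 4) and fail it.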
Logic Decomposition during Technology Mapping
Submitted to IEEE Trans. CAD, 1995
Cited by 55 (0 self)
A problem in technology mapping is that the quality of the final implementation depends significantly on the initially provided circuit structure. To resolve this problem, conventional techniques iteratively, but separately, apply technology-independent transformations and technology mapping. In this paper, we propose a procedure that performs logic decomposition and technology mapping simultaneously. We show that the procedure effectively explores all possible algebraic decompositions. It finds an optimal tree implementation over all the circuit structures examined, while the run time is typically logarithmic in the number of decompositions.
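To give a sense of the search space, consider an n-input AND, or any associative and commutative operator. It can be decomposed into a tree of two-input gates in many different shapes. The hypothetical `decompositions` function below enumerates those trees explicitly. It illustrates the space the abstract refers to, not the paper's procedure, whose reported run time is logarithmic in the number of decompositions.

```python
from itertools import combinations


def decompositions(leaves):
    """Enumerate every two-input tree decomposition of an associative,
    commutative gate over distinct leaf signals.

    Trees are nested 2-tuples; each unordered tree is returned once.
    """
    leaves = tuple(leaves)
    if len(leaves) == 1:
        return [leaves[0]]
    first, rest = leaves[0], leaves[1:]
    trees = []
    # The left subtree always contains `first`, so each split into two
    # nonempty groups is generated exactly once.
    for k in range(len(rest)):
        for picked in combinations(rest, k):
            left = (first,) + picked
            right = tuple(x for x in rest if x not in picked)
            for lt in decompositions(left):
                for rt in decompositions(right):
                    trees.append((lt, rt))
    return trees
```

For distinct leaves the count is (2n-3)!!. That gives 3, 15, and 105 trees for n = 3, 4, and 5, so explicit enumeration like this scales poorly.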
Efficient Implementation of Retiming
In Proc. Intl. Conf. on Computer-Aided Design, 1994
Cited by 46 (0 self)
Retiming is a technique for optimizing sequential circuits. It repositions the registers in a circuit, leaving the combinational cells untouched. The objective of retiming is to find a circuit with the minimum number of registers for a specified clock period. More than ten years have elapsed since Leiserson and Saxe first presented a theoretical formulation to solve this problem for single-clock, edge-triggered sequential circuits. Their proposed algorithms have polynomial complexity; however, naive implementations of these algorithms exhibit O(n^3) time complexity and O(n^2) space complexity when applied to digital circuits with n combinational cells. This renders retiming ineffective for circuits with more than 500 combinational cells. This paper addresses the implementation issues required to exploit the sparsity of circuit graphs so that min-period retiming and constrained min-area retiming can be applied to circuits with as many as 10,000 combinational cells. We believe this is...
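For reference, the Leiserson and Saxe formulation that the paper builds on can be stated in a few lines:

- A retiming r assigns an integer lag to each vertex.
- Edge u -> v then carries w_r(e) = w(e) + r(v) - r(u) registers.
- The retiming is legal when every w_r(e) >= 0.
- The clock period is the largest total delay along a register-free path.

The sketch below is a direct, definition-level rendering with hypothetical function names. It does not include the sparsity-exploiting techniques that are the subject of the paper.

```python
from functools import lru_cache


def clock_period(delay, edges):
    """Largest total delay along a register-free (weight-0) path.

    delay -- dict vertex -> combinational delay d(v)
    edges -- list of (u, v, w): w registers on edge u -> v
    Assumes every cycle carries at least one register (a legal circuit),
    so the weight-0 edges form a DAG.
    """
    preds = {v: [] for v in delay}
    for u, v, w in edges:
        if w == 0:
            preds[v].append(u)

    @lru_cache(maxsize=None)
    def arrival(v):
        # Longest register-free path ending at v, including d(v) itself.
        return delay[v] + max((arrival(u) for u in preds[v]), default=0)

    return max(arrival(v) for v in delay)


def retime(edges, r):
    """Apply lags r: w_r(u -> v) = w + r[v] - r[u]; reject illegal retimings."""
    new_edges = [(u, v, w + r[v] - r[u]) for u, v, w in edges]
    if any(w < 0 for _, _, w in new_edges):
        raise ValueError("illegal retiming: negative register count")
    return new_edges
```

Consider a ring A -> B -> C -> A in which each vertex has delay 3 and the edge C -> A holds two registers. The ring has period 9. The lags {A: 0, B: 1, C: 1} move one register onto A -> B, which lowers the period to 6 while the ring still carries two registers.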
Pipeline vectorization
IEEE Trans. Comput.-Aided Des.
Cited by 41 (8 self)
Abstract—This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers. The method improves the efficiency and ease of development of hardware designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints while maximizing available parallelism. For run-time reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in digital signal processor (DSP) and multimedia applications. Case studies using field-programmable gate array (FPGA)-based platforms are presented to demonstrate the benefits of our approach and to evaluate tradeoffs between alternative implementations. For instance, the loop-tiling transformation has been found to improve vectorization performance by 30–40 times over a PC-based software implementation, depending on whether run-time reconfiguration (RTR) is used. Index Terms—High-level synthesis, parallelization, pipelining, reconfigurable computing, vectorization.
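Loop tiling is one of the transformations the abstract names. The sketch below is hypothetical and software-only; it does not model the paper's hardware flow. It computes an FIR filter twice: once directly, and once with the output loop tiled into blocks of `tile` samples. The tile size stands in for the resource constraint that would bound how much of the loop a pipeline handles at once. The point is only that tiling reorders the iterations without changing the result; the sketch says nothing about the speedups reported above.

```python
def fir(x, h):
    """Reference FIR filter: y[n] = sum_k h[k] * x[n - k], with x = 0 before x[0]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_tiled(x, h, tile):
    """The same filter with its output loop tiled (strip-mined) into blocks.

    Each block of `tile` outputs is a unit that could, hypothetically,
    be streamed through one pipeline pass.
    """
    y = [0] * len(x)
    for start in range(0, len(x), tile):                   # loop over tiles
        for n in range(start, min(start + tile, len(x))):  # within one tile
            acc = 0
            for k in range(len(h)):                        # filter taps
                if n - k >= 0:
                    acc += h[k] * x[n - k]
            y[n] = acc
    return y
```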
Scheduling And Behavioral Transformations For Parallel Systems
1993
Cited by 39 (3 self)
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data-flow graph is used to model the iterative part of an application, e.g., a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed, and edges represent both intra-iteration and inter-iteration precedence relat...
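As a hypothetical baseline for this model, the sketch below computes an as-soon-as-possible (ASAP) schedule for a single iteration. Edges carrying one or more delays are inter-iteration dependences, so they are ignored when ordering operations within the iteration. The edge encoding `(u, v, delays)` and the function name are assumptions. The thesis is concerned with exposing parallelism across iterations, which this baseline does not attempt.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def asap_schedule(exec_time, edges):
    """ASAP start times for one iteration of a data-flow graph.

    exec_time -- dict node -> execution time in control steps
    edges     -- list of (u, v, delays); delays == 0 is an intra-iteration
                 edge, delays > 0 an inter-iteration edge that does not
                 constrain ordering within a single iteration
    Returns (start_times, iteration_length).
    """
    preds = {v: set() for v in exec_time}
    for u, v, d in edges:
        if d == 0:
            preds[v].add(u)
    start = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        start[v] = max((start[u] + exec_time[u] for u in preds[v]), default=0)
    length = max(start[v] + exec_time[v] for v in exec_time)
    return start, length
```

Consider an add A (1 step) feeding a multiply M (2 steps), with M feeding back to A through one delay. Within one iteration, A starts at step 0 and M at step 1, for a length of 3.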
Transformation-Based Verification Using Generalized Retiming
Cited by 34 (17 self)
In this paper we present the application of generalized retiming to temporal property checking. Retiming is a structural transformation that relocates registers in a circuit-based design representation without changing its actual input-output behavior. We discuss the application of retiming to minimize the number of registers, with the goal of increasing the capacity of symbolic state traversal. In particular, we demonstrate that the classical definition of retiming can be generalized for verification by relaxing the notions of design equivalence and physical implementability. This includes (1) omitting the need for equivalent reset states by using an initialization stump, (2) supporting negative registers, handled by a general functional relation to future time frames, and (3) eliminating peripheral registers by converting them into simple temporal offsets. The presented results demonstrate that the application of retiming in verification can significantly increase the capacity of symbolic state traversal. Our experiments also demonstrate that repeated use of retiming, interleaved with other structural simplifications, can yield reductions beyond those possible with single applications of the individual approaches. This result suggests that a tool architecture based on re-entrant transformation engines can potentially decompose and solve verification problems that would otherwise be infeasible.
Pipeline Vectorization for Reconfigurable Systems
Proc. FCCM'99, 1999
Cited by 34 (3 self)
This paper presents pipeline vectorization, a method for synthesizing hardware pipelines in reconfigurable systems based on software vectorizing compilers. The method improves the efficiency and ease of development of reconfigurable designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints while maximizing available parallelism. For run-time reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in DSP and multimedia applications. Case studies using FPGA-based platforms are presented to demonstrate the benefits of our approach and to evaluate tradeoffs between alternative implementations. The loop tiling transformation, for instance, has been found to improve performance by 30 to 40 times over a PC-based software implementation, depending on whether run-time reconfiguration i...
Understanding Retiming through Maximum Average-Delay Cycles
Mathematical Systems Theory, 1994
Cited by 31 (8 self)
A synchronous circuit built of functional elements and registers is a simple implementation of the semi-systolic model of computation that can be used to design parallel algorithms. Retiming is a well-known technique that transforms a given circuit into a faster circuit by relocating its registers. We give tight bounds on the minimum clock period that can be achieved by retiming a synchronous circuit. These bounds are expressed in terms of the maximum delay-to-register ratio of the cycles in the circuit graph and the maximum propagation delay d_max of the circuit components. Our bounds do not depend on the size of the circuit, and they are of theoretical as well as practical interest. They characterize exactly the minimum clock period that can be achieved by retiming a unit-delay circuit, and they lead to more efficient algorithms for several important problems related to retiming. Specifically, we give an O(V^{1/2} E lg V) algorithm for minimum clock period retiming of unit-delay circu...
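The sketch below computes the two quantities the bounds are expressed in: the maximum delay-to-register cycle ratio and d_max. It finds the ratio by exhaustive cycle enumeration and combines the two into only an elementary lower bound, max(ratio, d_max). The bound holds for two reasons:

- Retiming preserves the number of registers on every cycle. A cycle with total delay D and W registers must therefore contain a register-free stretch of delay at least D/W.
- Every single vertex is itself a register-free path, so the period is at least d_max.

This is a hypothetical illustration for small graphs. It does not reproduce the paper's tight bounds or its O(V^{1/2} E lg V) algorithm.

```python
from fractions import Fraction


def max_cycle_ratio(delay, edges):
    """Max over simple cycles of (sum of vertex delays) / (registers on cycle).

    delay -- dict vertex -> integer delay; edges -- list of (u, v, registers)
    Exhaustive enumeration, suitable only for small graphs. Assumes every
    cycle carries at least one register. Returns None for acyclic graphs.
    """
    order = {v: i for i, v in enumerate(delay)}
    succ = {v: [] for v in delay}
    for u, v, w in edges:
        succ[u].append((v, w))
    best = None

    def dfs(start, v, d_sum, w_sum, visited):
        nonlocal best
        for nxt, w in succ[v]:
            if nxt == start:  # closed a cycle whose smallest vertex is start
                ratio = Fraction(d_sum, w_sum + w)
                best = ratio if best is None else max(best, ratio)
            elif order[nxt] > order[start] and nxt not in visited:
                visited.add(nxt)
                dfs(start, nxt, d_sum + delay[nxt], w_sum + w, visited)
                visited.remove(nxt)

    for s in delay:
        dfs(s, s, delay[s], 0, {s})
    return best


def period_lower_bound(delay, edges):
    """Elementary lower bound on any retimed clock period: max(ratio, d_max)."""
    ratio = max_cycle_ratio(delay, edges)
    d_max = max(delay.values())
    return d_max if ratio is None else max(ratio, d_max)
```

Consider a three-vertex ring with delay 3 per vertex and two registers. The ratio is 9/2, so no retiming can reach a period below 4.5.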