Energy Minimization Using Multiple Supply Voltages - IEEE Trans. on VLSI Systems, 1997
Cited by 110 (4 self)
We present a dynamic programming technique for solving the multiple supply voltage scheduling problem in both non-pipelined and functionally pipelined data paths. The scheduling problem refers to the assignment of a supply voltage level (selected from a fixed and known number of voltage levels) to each operation in a data flow graph so as to minimize the average energy consumption under given computation time or throughput constraints, or both. The energy model is accurate and accounts for input pattern dependencies, re-convergent fanout induced dependencies, and the energy cost of level shifters. Experimental results on a number of standard benchmarks show that using three supply voltage levels yields an average energy saving of 40.19% (with a computation time constraint of 1.5 times the critical path delay) compared to using a single supply voltage level.
Keywords: Energy Minimization, Multiple Supply Voltages, Scheduling, Dynamic Programming, Functional Pipelining.
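The dynamic-programming idea can be illustrated on the simplest possible case: a linear chain of operations. This is a hypothetical simplification, not the paper's algorithm. It assumes each operation has a small table of (delay, energy) options, one per supply voltage level, with integer delays in control steps. It ignores level-shifter energy, re-convergent fanout, and pipelining, all of which the paper's model handles.

```python
import math
from typing import List, Optional, Tuple

# One option per supply voltage level: (delay in control steps, energy).
# These tables are hypothetical inputs, not values from the paper.
Option = Tuple[int, float]


def min_energy_chain(options: List[List[Option]], deadline: int) -> Optional[float]:
    """Minimum total energy for a chain of operations whose summed delay fits the deadline.

    Simplified sketch: operations execute one after another; level-shifter
    costs and re-convergent fanout are ignored.
    """
    # best[t] = minimum energy of the operations processed so far, finishing at exactly step t
    best = [math.inf] * (deadline + 1)
    best[0] = 0.0
    for op_options in options:
        nxt = [math.inf] * (deadline + 1)
        for t, energy in enumerate(best):
            if energy == math.inf:
                continue  # no schedule of the earlier operations finishes at step t
            for delay, cost in op_options:
                if t + delay <= deadline:
                    nxt[t + delay] = min(nxt[t + delay], energy + cost)
        best = nxt
    result = min(best)
    return None if result == math.inf else result


# Two operations, each runnable at a high voltage (fast, costly) or a low voltage (slow, cheap).
chain = [[(1, 4.0), (2, 1.5)], [(1, 4.0), (2, 1.5)]]
print(min_energy_chain(chain, 3))  # 5.5: only one of the two operations can be slowed down
```

Relaxing the deadline from 3 to 4 steps lets both operations run at the low voltage, and the energy drops to 3.0. This is the energy-versus-time trade-off that the scheduling exploits.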
Behavioral Transformation for Algorithmic Level IC Design - IEEE Transactions on Computer-Aided Design, 1989
Cited by 48 (0 self)
Now that the field of automated synthesis for register-transfer-level integrated circuit design is beginning to mature, it is appropriate to begin developing tools for higher levels of design. At the next higher level, it is appropriate to explore behavioral and structural partitioning, answering such questions about the design as:
- Should the design be implemented on a single VLSI chip, or partitioned into two or more chips, and if it is to be partitioned, where should the behavior be divided?
- Should the design be implemented as a single process, with a single data path and controller, or should it be split into two or more processes, each hopefully smaller and faster than the single-process design and with more potential concurrency?
- Should the design be pipelined or left unpipelined, and if pipelined, how many stage divisions should there be, and where should they be placed?
The goal of this research was to define the Algorithmic Level of design (also known as the Behavioral Le...
Scheduling and Behavioral Transformations for Parallel Systems, 1993
Cited by 27 (3 self)
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most time-critical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of data-flow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intra-iteration and inter-iteration precedence relat...
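For data-flow graphs with inter-iteration edges, a standard quantity that bounds the achievable execution rate is the iteration bound. It is the maximum, over all cycles, of the cycle's total computation time divided by its total number of delays (inter-iteration distance). This sketch computes it by enumerating simple cycles, which is only practical for small graphs. It illustrates the graph model described in the abstract and is not claimed to be the thesis's own algorithm. Node names, times, and delays are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# (source, destination, delays): 0 for an intra-iteration edge,
# k > 0 for a precedence relation spanning k iterations.
Edge = Tuple[str, str, int]


def iteration_bound(times: Dict[str, int], edges: List[Edge]) -> Fraction:
    """Max over simple cycles of (sum of node times) / (sum of edge delays)."""
    order = {node: i for i, node in enumerate(times)}
    adj: Dict[str, List[Tuple[str, int]]] = {node: [] for node in times}
    for u, v, d in edges:
        adj[u].append((v, d))
    best = Fraction(0)

    def dfs(start: str, node: str, t_sum: int, d_sum: int, visited: Set[str]) -> None:
        nonlocal best
        for nxt, d in adj[node]:
            if nxt == start:  # closed a cycle back to its lowest-ordered node
                if d_sum + d == 0:
                    raise ValueError("zero-delay cycle: no valid iterative schedule")
                best = max(best, Fraction(t_sum, d_sum + d))
            elif order[nxt] > order[start] and nxt not in visited:
                # only extend through higher-ordered nodes so each cycle is found from one start
                visited.add(nxt)
                dfs(start, nxt, t_sum + times[nxt], d_sum + d, visited)
                visited.remove(nxt)

    for start in times:
        dfs(start, start, times[start], 0, {start})
    return best


# Cycle A -> B -> A carries one delay; cycle A -> B -> C -> A carries two.
times = {"A": 1, "B": 2, "C": 1}
edges = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2)]
print(iteration_bound(times, edges))  # 3 (cycle A-B-A: 3 time units over 1 delay)
```

No schedule of this graph can start iterations faster than one every 3 time units, however many processors are available. Transformations that expose parallelism can at best approach this bound.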
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001
Cited by 23 (3 self)
PICO is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when designing such accelerators is the optimization of hardware by exploiting what is known about the varying number of bits required to represent and process operands. In this paper, we describe the handling and exploitation of integer bitwidth in PICO. A bitwidth analysis procedure is used to determine bitwidth requirements for all integer variables and operations in a C application. Given known bitwidths for all variables, complex problems arise when determining a program schedule that specifies on which function unit and at what time each operation executes. If operations are assigned to function units with no knowledge of bitwidth, the bitwidth-related cost benefit is lost, because each unit is built to accommodate the widest operation assigned to it. By carefully placing operations of similar width on the same unit, hardware costs are decreased. This problem is addressed using a preliminary clustering of operations that is based jointly on width and implementation cost. These clusters are then honored during resource allocation and operation scheduling to create an efficient width-conscious design. Experimental results show that exploiting integer bitwidth substantially reduces the gate count of PICO-synthesized hardware accelerators across a range of applications.
Keywords: application-specific design, architecture synthesis, bitwidth, clustering, embedded system, hardware accelerator, operation scheduling, resource allocation.
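The cost argument can be made concrete with a toy model in which a function unit costs as much as the width of the widest operation bound to it. This sketch is hypothetical. It ignores scheduling (which operations may share a unit in the same cycle) and PICO's joint width/implementation-cost metric. It simply compares a width-oblivious round-robin assignment with a width-sorted grouping.

```python
from typing import List


def unit_cost(units: List[List[int]]) -> int:
    """Toy cost model: each unit is built as wide as its widest operation."""
    return sum(max(widths) for widths in units if widths)


def round_robin(widths: List[int], num_units: int) -> List[List[int]]:
    """Width-oblivious baseline: deal operations to units in program order."""
    return [widths[i::num_units] for i in range(num_units)]


def cluster_by_width(widths: List[int], num_units: int) -> List[List[int]]:
    """Greedy sketch: sort by width so each unit receives operations of similar width."""
    if not widths:
        return []
    ordered = sorted(widths, reverse=True)
    per_unit = -(-len(ordered) // num_units)  # ceiling division
    return [ordered[i:i + per_unit] for i in range(0, len(ordered), per_unit)]


op_widths = [32, 8, 32, 8, 16, 16]  # hypothetical results of a bitwidth analysis
print(unit_cost(round_robin(op_widths, 3)))       # 80
print(unit_cost(cluster_by_width(op_widths, 3)))  # 56
```

In the round-robin case, two 8-bit operations land on 32-bit or 16-bit units and inherit their width. Grouping similar widths avoids that waste, which is the effect the clustering step in the abstract targets.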
Achieving Full Parallelism using Multi-Dimensional Retiming, 1996
Cited by 22 (14 self)
Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to obtain optimal execution rates in parallel and/or pipelined systems. Retiming is a common and valuable transformation tool for one-dimensional problems, in which loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multi-dimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executing in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism, and other existing optimization techniques for nested loops cannot always achieve it either. This paper proves an important and counter-intuitive result: full parallelism can always be obtained for MDFGs with more than one dimension. This result is obtained by transforming the MDFG in...
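Multi-dimensional retiming generalizes the one-dimensional rule d_r(e) = d(e) + r(u) - r(v) to vectors. Each node u gets a retiming vector r(u), and each edge u -> v carries a delay vector d(e). This sketch applies that rule and checks a simplified full-parallelism condition: every retimed delay vector is nonzero and lexicographically positive, so no dependence remains inside a single iteration of the loop body. The two-node graph and the retiming vectors are hypothetical and chosen by hand. The paper's procedure for constructing the retiming is not reproduced here.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]
Edge = Tuple[str, str, Vec]  # (u, v, delay vector d(e)) for an edge u -> v


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply d_r(e) = d(e) + r(u) - r(v) componentwise to every edge u -> v."""
    return [
        (u, v, tuple(d_i + ru_i - rv_i for d_i, ru_i, rv_i in zip(d, r[u], r[v])))
        for u, v, d in edges
    ]


def fully_parallel(edges: List[Edge]) -> bool:
    """True if every delay vector is lexicographically positive (hence nonzero)."""
    # Python compares tuples lexicographically, so d > (0, ..., 0) is the lexicographic test.
    return all(d > (0,) * len(d) for _, _, d in edges)


# A cycle A -> B -> A whose only delay is one outer-loop iteration.
mdfg = [("A", "B", (0, 0)), ("B", "A", (1, 0))]
retimed = retime(mdfg, {"A": (0, 1), "B": (0, 0)})
print(retimed)                                        # [('A', 'B', (0, 1)), ('B', 'A', (1, -1))]
print(fully_parallel(mdfg), fully_parallel(retimed))  # False True
```

In one dimension, a cycle with a total of one delay cannot have a delay placed on both of its edges. With two dimensions, the same total (1, 0) can be split as (0, 1) + (1, -1). That split is one way to see why the abstract's counter-intuitive result is possible.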
HYPER: An Interactive Synthesis Environment for High Performance Real Time Applications - In Proceedings of the International Conference on Computer Design, 1989
Cited by 19 (1 self)
A synthesis system called HYPER is proposed for real-time applications. HYPER takes a flow graph description of an algorithm as input and performs scheduling, resource allocation, optimizations, and transformations. The system generates a dedicated bit-sliced data path cluster, and layouts can then be generated through the LAGER IV system.
A transformation-based method for loop folding - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1994
Cited by 16 (1 self)
We propose a transformation-based scheduling algorithm for the following problem: given a loop construct, a target initiation interval, and a set of resource constraints, schedule the loop in a pipelined fashion such that the iteration time (the time needed to execute one iteration of the loop) is minimized. The iteration time is an important quality measure of a data path design because it affects both storage and control costs. Our algorithm first performs an As Soon As Possible Pipelined (ASAPp) schedule regardless of the resource constraints. It then resolves resource constraint violations by rescheduling some operations. The software system implementing the proposed algorithm, called Theda.Fold, can handle behavioral loop descriptions that contain chained, multicycle, and/or structurally pipelined operations, as well as those with data dependencies across iteration boundaries. Experiments on a number of benchmarks are reported.
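The first two steps of the algorithm are an ASAP schedule that ignores resources, followed by detection of resource violations. They can be sketched for a simplified, hypothetical case with these assumptions:
- the loop body is acyclic;
- each operation occupies its function-unit type for a single cycle, as with structurally pipelined units;
- the initiation interval II is fixed, so an operation starting at step t competes for resources in slot t mod II.

Theda.Fold's rescheduling step, chaining, and inter-iteration dependencies are not modeled here.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Set, Tuple


def asap(preds: Dict[str, Set[str]], latency: Dict[str, int]) -> Dict[str, int]:
    """ASAP start step of each operation, ignoring resource limits."""
    start: Dict[str, int] = {}
    for op in TopologicalSorter(preds).static_order():  # predecessors come first
        start[op] = max((start[p] + latency[p] for p in preds.get(op, ())), default=0)
    return start


def modulo_violations(start: Dict[str, int], unit_of: Dict[str, str],
                      capacity: Dict[str, int], ii: int) -> Dict[Tuple[str, int], int]:
    """Over-subscribed (unit type, step mod II) slots when a new iteration starts every II steps."""
    usage = Counter((unit_of[op], t % ii) for op, t in start.items())
    return {slot: n for slot, n in usage.items() if n > capacity[slot[0]]}


# Hypothetical loop body: additions a and b feed a two-cycle multiply c, which feeds addition d.
preds = {"c": {"a", "b"}, "d": {"c"}}
latency = {"a": 1, "b": 1, "c": 2, "d": 1}
unit_of = {"a": "add", "b": "add", "c": "mul", "d": "add"}
start = asap(preds, latency)
iteration_time = max(start[op] + latency[op] for op in start)
print(start["c"], start["d"], iteration_time)                        # 1 3 4
print(modulo_violations(start, unit_of, {"add": 1, "mul": 1}, ii=2))  # {('add', 0): 2}
```

With one adder and II = 2, additions a and b collide in slot 0. A rescheduling step must then move one of them, possibly lengthening the iteration time. That trade-off is the one the abstract's algorithm manages.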
Schedule-Based Multi-Dimensional Retiming on Data Flow Graphs - Proceedings of 8th International Parallel Processing Symposium, 1994
Cited by 15 (13 self)
Transformation techniques are usually applied to obtain optimal execution rates in parallel and/or pipelined systems. Retiming is a common and valuable tool for one-dimensional problems represented by Data Flow Graphs (DFGs), such as DSP filters, where it can maximize the parallelism of the loop body. Since most scientific and DSP applications are recursive or iterative, increasing the parallelism of the loop body can substantially decrease the overall computation time. Few results on retiming have been obtained for multi-dimensional problems. The previous multi-dimensional retiming result applies only to a restricted class of Data Flow Graphs in which every total delay vector in a cycle has to be strictly non-negative. This paper develops a novel retiming technique that considers the final schedule as part of the process. To the authors' knowledge, this is the first retiming algorithm for general multi-dimensional Data Flow Graphs. The paper describes the algorithm and proves its correctness. Experimental results show that the algorithm runs efficiently. Several DSP filters are used in the paper as examples of its application.
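Schedule-based retiming treats the final schedule as part of the problem. For a two-dimensional MDFG, a common form of schedule is a schedule vector s: iteration (i, j) starts at time s · (i, j). Such a schedule respects a dependence with a nonzero delay vector d only if s · d > 0. This brute-force search over a small, hypothetical range of integer vectors checks whether such an s exists for a given set of nonzero delay vectors. It illustrates the constraint and is not the paper's algorithm.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def find_schedule_vector(delays: Sequence[Vec], bound: int = 3) -> Optional[Vec]:
    """Return an integer s with s . d > 0 for every delay vector d, searching
    components in [-bound, bound]; None if no such s exists in that range."""
    dims = len(delays[0])
    for s in product(range(-bound, bound + 1), repeat=dims):
        if all(sum(s_i * d_i for s_i, d_i in zip(s, d)) > 0 for d in delays):
            return s
    return None


print(find_schedule_vector([(0, 1), (1, -1)]))  # (2, 1)
print(find_schedule_vector([(1, 0), (-1, 0)]))  # None: the two dependencies point in opposite directions
```

A `None` result only means that no vector exists within the searched range. For the second example, none exists at all, because s · (1, 0) > 0 and s · (-1, 0) > 0 contradict each other.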
Module Assignment for Low Power, 1996
Cited by 14 (1 self)
In this paper, we investigate the problem of minimizing the total power consumption during the binding of operations to functional units in a scheduled data path with functional pipelining and conditional branching for data-intensive applications. We first present a technique to estimate the power consumption in a functionally pipelined data path, and then formulate the power optimization problem as a max-cost multi-commodity flow problem and solve it optimally. Our proposed method can augment most high-level synthesis algorithms as a post-processing step that reduces power after the optimizations for area or speed have been completed. An average power saving of 28% was observed after applying our method to pipelined designs that had been optimized using conventional techniques.
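The paper solves the binding optimally as a max-cost multi-commodity flow problem. This sketch takes a much simpler route to the same objective. It binds the operations of each control step greedily, choosing the unit permutation that minimizes a hypothetical switching-activity proxy: the number of bits that toggle at a unit's input between consecutive operations. The operand values are made up, and the step-by-step greedy choice is not guaranteed to be optimal over the whole schedule.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, int]  # (operation name, operand value seen at the unit's input)


def toggles(a: int, b: int) -> int:
    """Hamming distance: number of input bits that switch from a to b."""
    return bin(a ^ b).count("1")


def bind_low_power(steps: List[List[Op]], num_units: int) -> Tuple[Dict[str, int], int]:
    """Greedy per-step binding of operations to identical units, minimising bit toggles."""
    last: List[Optional[int]] = [None] * num_units  # last operand each unit processed
    binding: Dict[str, int] = {}
    total = 0
    for ops in steps:
        if len(ops) > num_units:
            raise ValueError("more operations in a control step than units")
        best_cost, best_units = None, ()
        for units in permutations(range(num_units), len(ops)):
            cost = sum(toggles(last[u], v) for u, (_, v) in zip(units, ops) if last[u] is not None)
            if best_cost is None or cost < best_cost:
                best_cost, best_units = cost, units
        total += best_cost
        for u, (name, value) in zip(best_units, ops):
            binding[name] = u
            last[u] = value
    return binding, total


steps = [[("a", 0b0000), ("b", 0b1111)], [("c", 0b1110), ("d", 0b0001)]]
binding, total = bind_low_power(steps, 2)
print(binding, total)  # {'a': 0, 'b': 1, 'c': 1, 'd': 0} 2
```

Binding c to the unit that last saw 0b1111 toggles one bit instead of three. The flow formulation in the abstract captures such pairwise costs across all control steps at once, rather than one step at a time.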
Hierarchical design space exploration for a class of digital systems - IEEE Transactions on VLSI, v, 1993
Cited by 12 (0 self)
This paper presents an architectural synthesis approach for a widely used class of digital systems characterized by inherent regularity in their description. The approach relies on a novel modeling, or abstraction, of the problem domain that facilitates a hierarchical solution method. The modeling exploits the inherent regularity in the system description to cluster its behavioral operations. The method emphasizes prudent postponement of design decisions until enough physical design information is available to estimate layout effects such as wiring; we use well-known area-delay estimators for this purpose. The approach has the advantage that it keeps track of a set of potentially good candidate solutions, rather than narrowing down to a single solution very early in the design process. Through an extensive set of experiments on well-known DSP design examples, we demonstrate the advantages these distinctive features offer, and we present the impact of hierarchy on several important issues, such as interconnection area and the extent of the design space explored.
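One common way to keep a set of potentially good candidate solutions is to retain only the candidates that are not dominated in every cost dimension. This sketch keeps the area-delay Pareto front of a list of candidate designs. The candidate names and numbers are hypothetical, and the paper's own pruning criteria and estimators may differ.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (name, estimated area, estimated delay)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Candidates not dominated by any other (lower area and lower delay are better);
    candidates with identical (area, delay) are kept once."""
    front: List[Candidate] = []
    # Scan by increasing area (ties broken by delay); keep a design only if it is
    # strictly faster than every smaller design kept so far.
    for cand in sorted(candidates, key=lambda c: (c[1], c[2])):
        if not front or cand[2] < front[-1][2]:
            front.append(cand)
    return front


designs = [("d1", 10, 5), ("d2", 8, 7), ("d3", 12, 4), ("d4", 9, 8), ("d5", 11, 6)]
print([name for name, _, _ in pareto_front(designs)])  # ['d2', 'd1', 'd3']
```

Design d4 is dropped because d2 is both smaller and faster, and d5 is dropped because d1 dominates it. The three survivors are the trade-off points worth carrying forward until better layout estimates are available.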

