## Behavioral Level Guidance Using Property-Based Design Characterization (1996)

Citations: 2 (0 self)

### BibTeX

@TECHREPORT{Guerra96behaviorallevel,
    author = {Lisa Marie Guerra},
    title = {Behavioral Level Guidance Using Property-Based Design Characterization},
    institution = {},
    year = {1996}
}

### Abstract

The growing importance of optimization, short time-to-market windows, and exponentially growing design complexity are just a few of the factors shaping the state-of-the-art synthesis process. In particular, optimization at the early stages of design is crucial: at the system and behavioral levels, orders-of-magnitude improvements in key design metrics such as throughput, power, and area can be attained. Attaining them, however, requires strategic and coordinated application of the design techniques best suited to a target design. The difficulty is that the number of options currently available is overwhelming, and as a result, design exploration is often conducted in a qualitative, ad hoc manner. To address

### Citations

8563 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...plexity is of interest for this work only to the extent that it provides intuitive motivation for the development of a heuristic measure of graph complexity. The following quote from Cover and Thomas [Cov91] nicely sums this up: "One does not use the shortest computer program in practice because it may take infinitely long to find such a minimal program. But one can use very short, not necessarily minima... |

8530 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context ...re also topological in nature. One such metric is the flowgraph longest path. The longest path can be found using a modified shortest-paths algorithm in linear time (in the number of nodes and edges) [Cor90], and is a rough indicator of overall timing. Another such metric is the ratio of the number of operations to the length of the longest path, which is a rough indicator of the average concurrency. A n... |

3973 |
Computer Architecture: A Quantitative Approach, 3rd ed.
- Hennessy, Patterson, et al.
- 2002
Citation Context ...iler areas, the concept of locality has been heavily studied and utilized during the last three decades [e.g., Kuc78, Pat90]. In particular, it has greatly influenced the design of memory hierarchies [Pat90]. Locality is the qualitative property of typical programs that 90% of execution time is spent on 10% of the instructions [Knu71]. In those domains, temporal locality is described as the tendency for ... |

1682 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitányi
- 1997
Citation Context ...e main idea is that "the amount of information in a finite object such as a string, is the size, in bits, of the smallest program that, starting with blank memory, outputs the string then terminates" [Li90]. Although there are many programs which can generate a given finite sequence, the shortest one gives a good indication of the information content of that sequence. A more formal definition is: the Ko... |

870 |
The Art of Computer Systems Performance Analysis
- Jain
- 1991
Citation Context ...linear regression, a linear model based on the metric was built to predict the actual number of units. The model is: predicted number of units = 1.15 × concurrency metric + 1.55. The model had an $R^2$ [Jai91] of 0.87 and an F-ratio [Box78] of 741.8. The $R^2$ is the ratio of the variability explained by the model to the variability of the data itself. If the model explains all the variability of data, the... |

480 | Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
- Lee, Messerschmitt
- 1987
Citation Context ...ay operators as representing registers holding state, and the other operators as combinational logic. The semantics of the graphs are most similar in function to the synchronous dataflow model of Lee [Lee87]. Under this model, operators consume at every input, and produce at every output, a fixed number of samples on every execution. Extension to handle dynamic dataflow graphs [Lee93] can also be done by... |

418 |
Combinatorial Algorithms for Integrated Circuit Layout
- Lengauer
Citation Context ...pairs of nodes to form a clique but differ in the weight assigned to the resulting edges. The first is the uniformly weighted clique model [Len90], which has been widely used for layout partitioning. This model assigns a weight of 1/(k-1) to all edges in the clique, where k is the number of nodes in the clique (Figure 24a and Figure 24b). Addit... |

410 | Introduction to VLSI systems - Mead, Conway - 1980 |

378 |
A loop transformation theory and an algorithm to maximize parallelism
- Wolf, Lam
- 1991
Citation Context ...plicit enumeration search techniques. For linear loop transformations, several research groups specializing in optimizing compilers have used mathematical theory to develop efficient ordering schemes [Wol91]. Generic probabilistic techniques can often provide a good trade-off between design time and solution quality [Fox89]. Bottleneck identification and elimination approaches have been proposed for a li... |

365 | Compiler transformations for high-performance computing - Bacon, Graham, et al. - 1994 |

360 | Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers - Brayton, Hachtel, McMullen, Sangiovanni-Vincentelli - 1984 |

309 | High-level synthesis: introduction to chip and system design - Gajski, Dutt, et al. - 1992 |

308 | Constant Propagation with Conditional Branches - Wegman, Zadeck - 1991 |

236 |
Force-directed scheduling for the behavioral synthesis of ASICs
- Paulin, Knight
- 1989
Citation Context ...f a distribution graph, which quantifies an expected number of operations to be executed in each clock cycle. Paulin first proposed and used information from such graphs for force-directed scheduling [Pau89]. Consider the computation of Figure 13a, in which each operator is assumed to require 1 clock cycle to execute. In order to complete the computation in 4 clock cycles, two multiplications must be don... |

234 |
Statistics for experimenters: An introduction to design, data analysis and model building
- Box, Hunter, et al.
- 1978
Citation Context ...el based on the metric was built to predict the actual number of units. The model is: predicted number of units = 1.15 × concurrency metric + 1.55. The model had an $R^2$ [Jai91] of 0.87 and an F-ratio [Box78] of 741.8. The $R^2$ is the ratio of the variability explained by the model to the variability of the data itself. If the model explains all the variability of data, the $R^2$ would be 1. On the other h... |

194 | Available instruction-level parallelism for superscalar and superpipelined machines - Jouppi, Wall - 1989 |

193 | Scheduling dynamic dataflow graphs with bounded memory using the token flow model
- Buck, Lee
- 1993
Citation Context ...taflow model of Lee [Lee87]. Under this model, operators consume at every input, and produce at every output, a fixed number of samples on every execution. Extension to handle dynamic dataflow graphs [Lee93] can also be done by using maximum and expected numbers of iterations for non-deterministic loops and using expected probabilities for branches. Hierarchy within flowgraphs is used to represent loop... |

179 | Global Optimizations by Suppression of Partial Redundancies - Morel, Renvoise - 1979 |

177 | Optimizing Power Using Transformations
- Chandrakasan, Potkonjak, et al.
- 1995
Citation Context ...ion types, $N_i$ the number of times the operation of type $i$ is performed per sample period, and $C_i$ is the average capacitance per execution of operation type $i$. Sample capacitance models are shown in [Cha95]. 4.3.2 Execution Unit Area Modeling: Operator and Register Access Concurrency For estimating execution unit area, concurrency metrics can be used. An empirical study was conducted to detect the trend... |

159 |
An r-dimensional quadratic placement algorithm
- Hall
- 1970
Citation Context ... [Fig. 22. Example of locality metrics.] Hall's derivation: The derivation presented by Hall [Hal70], and reproduced here, showed that the eigenvectors of the Laplacian matrix solve the one-dimensional quadratic placement problem. The problem involves finding the vector $x = (x_1, x_2, \ldots, x_n)$ ... |

158 |
Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits
- Veendrick
- 1984
Citation Context ...composed of dynamic, short-circuit, and leakage power. The latter two, however, are influenced mainly by the circuit design style used, and can be designed to be less than 15% of the total chip power [Vee84]. For analysis at the algorithm and architectural levels, therefore, power can be adequately described by its dynamic component: $\text{Power} = C_{eff} \cdot V_{SW} \cdot V_{DD} \cdot f$, where f is the frequency of operation, V... |

134 |
An empirical study of FORTRAN programs
- Knuth
- 1971
Citation Context ...particular, it has greatly influenced the design of memory hierarchies [Pat90]. Locality is the qualitative property of typical programs that 90% of execution time is spent on 10% of the instructions [Knu71]. In those domains, temporal locality is described as the tendency for a program to reuse data or instructions which have been recently used. Spatial locality is the tendency for a program to use data... |

127 | Principles of Compiler Design - Aho, Ullman - 1977 |

116 | A hardware/software codesign methodology for DSP applications - Kalavade, Lee - 1993 |

109 | The Structure of Computers and Computation - Kuck - 1978 |

108 |
Fast prototyping of data path intensive architectures
- Rabaey, Chu, et al.
- 1991
Citation Context ... thesis. Manual mapping was performed during exploration of new ideas regarding improvements to the existing synthesis tools (Section 4.4). Synthesis, with the Hyper behavioral-level synthesis system [Rab91a], was used during the development of new estimators (Section 4.3) and for general experimentation. The Hyper system, developed at the University of California at Berkeley, provides an automated path f... |

90 | Global communication and memory optimizing transformations for low power signal processing systems - Catthoor, Franssen, et al. - 1994 |

88 | Superoptimizer: a look at the smallest program
- Massalin
- 1987
Citation Context ...] is another common approach which works well when the targeted example has similar characteristics to those used in experimentally developing the scripts. In the enumeration-based "generate and test" [Mas87] approach, all combinations of transformations are considered for a particular compilation, and the best one is selected using implicit enumeration search techniques. For linear loop transformations, ... |

85 | Measuring the parallelism available for very long instruction word architecture - Nicolau, Fisher - 1984 |

72 |
VHDL: Hardware Description and Design
- Lipsett, Schaefer, et al.
- 1989
Citation Context ..., and is repeated for the subsequent iterations. The last step, hardware mapping [Ben93], generates a finite state machine to control the datapath, and emits a description of the architecture in VHDL [Lip89] or SDL [Bro92]. The SDL format is suitable for silicon compilation to layout using the Lager system [Bro92]. More details on the Hyper system and architectural synthesis in general can be found in [R... |

62 | Measuring parallelism in computation-intensive scientific/engineering applications - Kumar - 1988 |

58 | Behavioral level power estimation and exploration
- Mehra, Rabaey
- 1994
Citation Context ...cution units, registers, and interconnect counts. This is supplemented with additional empirical models to provide an estimate of total chip area, which takes into account the control and wiring area [Meh94]. Power dissipated in digital CMOS circuits is composed of dynamic, short-circuit, and leakage power. The latter two, however, are influenced mainly by the circuit design style used, and can be design... |

56 |
Optimizing resource utilization using transformations
- Potkonjak, Rabaey
- 1991
Citation Context ...riance in these graphs, the less uniform the structure is, and hence the more susceptible it is to improvement by transformations which alter ASAP and ALAP times. The retiming-for-area transformation [Pot91] is one such optimization. Section 4.1 shows examples of using this metric in this way. 3.6 Locality In the architecture and compiler areas, the concept of locality has been heavily studied and utiliz... |

48 |
An approach to ordering optimizing transformations
- Whitfield, Soffa
- 1990
Citation Context ... of transformations [Iqb93, Hua96]. Finally, the idea of enabling and disabling transformations, again restricted to a specific set of transformations, has been explored in a number of compilation [Whi90] and behavioral level synthesis efforts [Pot92, Sri95a, Hua94]. 5.2 Global Overview Figure 42 depicts the proposed methodology for guided optimization. This methodology is based on the idea that to fu... |

45 | Estimating Architectural Resources and Performance for High-level Synthesis Applications - Sharma, Jain - 1993 |

36 |
Timing optimization of combinational logic
- Singh, Wang, et al.
- 1988
Citation Context ...en throughput and latency [Sri95b]. Several timing metrics give a measure of how constrained the system is, such as the critical path to sample period ratio, metrics related to the ε-critical network [Sin88], and metrics related to the scheduling slack. The ε-critical network is a subset of the flowgraph containing all nodes and edges on paths which are "almost" critical. More precisely, paths of length ... |

35 |
Scheduling synchronous dataflow graphs for efficient looping
- Bhattacharyya, Lee
- 1993
Citation Context ...ors [Kun84, Kun88] can be largely attributed to efforts to effectively exploit computation regularity. Vector processors [Hwa84] are often used to exploit loop level regularity. Bhattacharyya and Lee [Bha93] exploited regularity in their development of a looped scheduler, which synthesizes code for programmable DSPs with reduced program and memory requirements. In the high-level synthesis literature, Not... |

34 | Techniques for area estimation of VLSI layouts - Kurdahi - 1989 |

34 | High-level synthesis techniques for reducing the activity of functional units - Musoll, Cortadella - 1995 |

33 |
High-Level Algorithm and Architecture Transformations for DSP Synthesis
- Parhi
- 1995
Citation Context ...zations used in CAD have been adopted from compiler technology. An overview of compiler optimizations as well as an extensive bibliography can be found in [All75, Aho77, Rob87, Bac94]. Further, Parhi [Par95] presented a survey on numerous transformations for DSP systems. Behavioral-level transformations include algebraic transformations (using the associative, distributive, and commutative identities), c... |

32 |
Peephole optimization
- McKeeman
- 1965
Citation Context ...y, and flow management include [Kle94, Har90]. There have also been a number of other approaches to optimization ordering. These, however, only address fixed sets of techniques. Peephole optimization [McK65], for example, is a simple and popular technique for combining transformations where the compiler considers only a limited section of code upon which it applies the available transformations one by on... |

32 | Maximally fast and arbitrarily fast implementation of linear computations
- Potkonjak, Rabaey
- 1992
Citation Context ...however. For example, systems can exhibit linearity over min or max as the additive operator, and arithmetic addition as the multiplicative operator [Fet90]. The class of feedback linear computations [Pot92] is a more general class including not only all linear computations, but also a subset of non-linear computations. Feedback linear computations can be defined by the following equations: s[n+1] = A . ... |

30 | Optimum and Heuristic Transformation Techniques for Simultaneous Optimization of Latency and Throughput
- Srivastava, Potkonjak
- 1995
Citation Context ...eline stages that have been added and $T_s$ is the sample period. Srivastava and Potkonjak have presented a detailed treatment of the definitions of and the relationships between throughput and latency [Sri95b]. Several timing metrics give a measure of how constrained the system is, such as the critical path to sample period ratio, metrics related to the ε-critical network [Sin88], and metrics related to th... |

25 |
A high-level language and silicon compiler for digital signal processing
- Hilfinger, BA
- 1985
Citation Context ...and 7 more complex systems (DFE, echo cancellation, dynamic programming, speech coding, etc.). 2.1.2 Flowgraph Representation The user specifies an algorithm in either the Silage applicative language [Hil85] or a C++ subset [Wan94]. This description is parsed and compiled into a hierarchical dataflow graph representation. This representation is central to this work, as the design characterization is extr... |

25 |
Cathedral-III: Architecture-driven high-level synthesis for high throughput DSP applications
- Note, Geurts, et al.
- 1991
Citation Context ...d regularity in their development of a looped scheduler, which synthesizes code for programmable DSPs with reduced program and memory requirements. In the high-level synthesis literature, Note et al. [Not91] and [Fig. 24. Hyperedge models: (a) initial hyperedge, (b) uniformly weighted clique model, (c) clique model with lower weights ...] |

24 |
Performance optimization using template mapping for datapath-intensive high-level synthesis
- Corazao, Khalaf, et al.
- 1996
Citation Context ...zation and Exploration After estimation, optimizations can be applied to improve the design's area, power, and throughput. An optimization may involve changing a parameter such as the clock frequency [Cor96] or voltage [Cha95, Rag95], performing chaining [Cor96], selecting new hardware resources from the hardware library [Jai90], or applying a transformation. Transformations involve changes in the struct... |

23 | Partitioning by regularity extraction - Rao, Kurdahi - 1992 |

22 | On Supercomputing with Systolic/Wavefront Array - Kung - 1984 |

21 |
System-level design guidance using algorithm properties
- Guerra, Potkonjak, et al.
- 1994
Citation Context ...operator delays, but also the control decode delay, the register access delay, and the clock duty cycle. Estimation of the chip's area is performed through a combination of techniques. Both empirical [Gue94] (Section 4.3) and deterministic min-bound [Rab91b] techniques are used for the estimation of execution units, registers, and interconnect counts. This is supplemented with additional empirical models... |

19 | Determining the minimum iteration period of an algorithm
- Ito, Parhi
- 1995
Citation Context ...edictors of the potential improvement in throughput by application of transformations such as pipelining, retiming, and time-loop unfolding. An efficient algorithm for its calculation can be found in [Ito95]. A final timing metric is the maximum operator delay. This metric is useful, for example, when performing clock selection. 3.5 Concurrency and Uniformity In both the general-purpose and scientific-co... |
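### Illustrative sketch: longest path and average concurrency

The citation context for Introduction to Algorithms describes two topological metrics: the flowgraph longest path, computable in time linear in the number of nodes and edges, and the ratio of the number of operations to the longest-path length as a rough indicator of average concurrency. The following is a minimal sketch of both, not the thesis's implementation. It assumes an acyclic flowgraph (any feedback edges through delay operators already removed), nodes numbered from 0, and unit operator delays unless a `delay` list is supplied; the function names `longest_path_length` and `average_concurrency` are hypothetical.

```python
from collections import deque


def longest_path_length(num_nodes, edges, delay=None):
    """Length of the longest path in a DAG, measured as summed node delays.

    Assumption: `edges` is a list of (src, dst) pairs over nodes 0..num_nodes-1
    and the graph is acyclic. Runs in O(nodes + edges) via a topological sweep.
    """
    delay = delay or [1] * num_nodes  # unit delay per operator by default
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # finish[n] = longest path length ending at (and including) node n
    finish = list(delay)
    ready = deque(n for n in range(num_nodes) if in_degree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in successors[node]:
            finish[nxt] = max(finish[nxt], finish[node] + delay[nxt])
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if visited != num_nodes:
        raise ValueError("graph contains a cycle; remove feedback edges first")
    return max(finish, default=0)


def average_concurrency(num_nodes, edges):
    """Number of operations divided by the unit-delay longest path."""
    return num_nodes / longest_path_length(num_nodes, edges)
```

As a usage example under these assumptions, a tree of four multiplications feeding two additions feeding one final addition has seven operations and a unit-delay longest path of three, so the concurrency ratio is 7/3.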