## Optimum and Heuristic Transformation Techniques for Simultaneous Optimization of Latency and Throughput (1995)

Venue: IEEE Trans. on VLSI Systems

Citations: 30 (3 self)

### BibTeX

@ARTICLE{Srivastava95optimumand,

author = {Mani B. Srivastava and Miodrag Potkonjak},

title = {Optimum and Heuristic Transformation Techniques for Simultaneous Optimization of Latency and Throughput},

journal = {IEEE Trans. on VLSI Systems},

year = {1995},

volume = {3},

pages = {2--19}

}

### Abstract

A common metric of speed for DSP systems is their throughput. Algorithm transformations are the key to obtaining high-throughput ASIC as well as software implementations. However, DSP subsystems are increasingly being used in systems such as "signal processing servers" and embedded controllers where both throughput and latency are important, and independent, metrics of speed. For example, the subsystem implementing the control law in a robot controller is part of a feedback loop, so that not only does it have to process the inputs arriving at a rate determined by the sample period of the control loop, but it also has to produce the output corresponding to an input sample within a specified latency constraint. Although throughput alone can be arbitrarily improved for several classes of systems using previously published techniques, none of those approaches is effective when latency constraints are considered. After formally establishing the relationship between latency and throughput in...

### Citations

3993 |
Computer Architecture: A Quantitative Approach
- Hennessy, Patterson
- 1990
Citation Context ...ce optimization technique, as well as one of the most often addressed transformation in high level synthesis research. For extensive coverage of pipelining in general computing literature we refer to [33, 40], and in high level synthesis to [30, 41]. We will use the following definition of pipelining: Pipelining with k pipeline stages on a CDFG is a special form of retiming where on each primary output (o... |

786 |
The semantics of a simple language for parallel programming
- Kahn
- 1974
Citation Context ...he ratio between any two data sample rates will be a statically known rational number [35]. Mathematically, such a synchronous CDFG is equivalent to a continuous function over streams of data samples [36, 37]. Since CDFGs are of course causal, this means that they are equivalent to a function that expresses the i-th set of output samples in terms of the i-th and earlier sets of input samples. The system s... |

482 |
Static scheduling of synchronous data flow programs for digital signal processing
- Lee, Messerschmitt
- 1987
Citation Context ...s well behaved in that the data sample rate at any given data edge in the CDFG is independent of the inputs, and the ratio between any two data sample rates will be a statically known rational number [35]. Mathematically, such a synchronous CDFG is equivalent to a continuous function over streams of data samples [36, 37]. Since CDFGs are of course causal, this means that they are equivalent to a funct... |

168 |
Coroutines and Networks of Parallel Processes
- Kahn, MacQueen
- 1977
Citation Context ...he ratio between any two data sample rates will be a statically known rational number [35]. Mathematically, such a synchronous CDFG is equivalent to a continuous function over streams of data samples [36, 37]. Since CDFGs are of course causal, this means that they are equivalent to a function that expresses the i-th set of output samples in terms of the i-th and earlier sets of input samples. The system s... |

144 |
The Architecture of Pipelined Computers
- Kogge
- 1981
Citation Context ...e coefficients. Until now all approaches which were able to improve throughput to an arbitrary extent were based on a combination of unfolding of the computation with block processing or interleaving [33, 21, 34]. However, this comes at the expense of a proportional degradation in latency. This long-standing Latency and Sample Period Bottleneck is broken by employing a novel combination of unfolding with "On-... |

45 |
Control System Design: An Introduction to State-Space Methods
- Friedland
Citation Context ...SP and Linear System Theory that a special case of these LTI CDFGs, namely LTI CDFGs with single-input and zero initial state can always be transformed to certain standard (canonical) CDFG structures [44]. Since throughput and implementation cost (area, number of operations) have been the more popular metrics in traditional DSP, these standard CDFG structures have been developed and analyzed with thos... |

32 |
Maximally fast and arbitrarily fast implementation of linear computations
- Potkonjak, Rabaey
- 1992
Citation Context ...e coefficients. Until now all approaches which were able to improve throughput to an arbitrary extent were based on a combination of unfolding of the computation with block processing or interleaving [33, 21, 34]. However, this comes at the expense of a proportional degradation in latency. This long-standing Latency and Sample Period Bottleneck is broken by employing a novel combination of unfolding with "On-... |

14 |
Anatomy of a silicon compiler
- Brodersen
- 1992
Citation Context ...ating word-parallel dedicated datapaths that are controlled by a central finite-state machine controller. HYPER generates the physical layout of the chips by using the LAGER silicon compilation system [46] at the backend - datapaths are generated using the datapath compiler in LAGER, the FSM controller using the logic synthesis and tiling tools in LAGER, the datapath control logic using standard cells,... |

11 |
Optimal automatic periodic multiprocessor scheduler for fully specified flow graphs
- Gelabert, Barnwell
- 1993
Citation Context ... Throughput and Latency There are two independent metrics of speed, and a user may specify constraints on both of them as part of the system specification. The two metrics are Throughput, and Latency [39, 2]. For n ∈ {0, 1, 2, 3, ...}, with A ∈ ℝ^{R×R}, B ∈ ℝ^{R×P}, C ∈ ℝ^{Q×R}, D ∈ ℝ^{Q×P}: S[n] = A S[n-1] + B X[n], Y[n] = C S[n-1] + D X[n... |

4 |
Optimizing Resource Utilization Using Transformations
- Potkonjak, Rabaey
- 1991
Citation Context ... performance [22, 23, 24, 25, 26, 27], but also for newer metrics such as power [28], and fault tolerance [29]. Sophisticated transformations have also been presented for functional [30] and software [31, 32] pipelining. 1.3 What is New? We have developed rigorous graph theoretic definitions of latency and throughput for general synchronous computation that also expose the interdependence between these tw... |

4 |
Pipelining: Just another transformation
- Potkonjak, Rabaey
- 1992
Citation Context ... of the most often addressed transformation in high level synthesis research. For extensive coverage of pipelining in general computing literature we refer to [33, 40], and in high level synthesis to [30, 41]. We will use the following definition of pipelining: Pipelining with k pipeline stages on a CDFG is a special form of retiming where on each primary output (or input) k new delays are introduced, and... |

4 |
Architectural Transformation Program for Optimization of Digital Systems by Multi-level Decomposition
- Chatterjee, Roy, et al.
- 1993
Citation Context ...re this structure provides an effective answer to the important problem of minimizing the number of shifts in LTI systems (the number of shifts dominates the implementation cost in bit-serial systems [45]) which has recently received significant attention [45]. Even when the number of states is arbitrarily high, the LTI system can be transformed by our approach such that the number of shifts is a smal... |

1 |
State-Space and Input-Output Linear Systems
- Delchamps
- 1988
Citation Context ...common case of real valued data. A P-input, Q-output, R-state real-valued CDFG with real-valued data can be equivalently expressed by the following discrete-time finite-dimensional state-space system [38] where is the input vector, is the state vector, is the output vector, is the state-transition mapping, is the output mapping, and is the time index. , the initial state when the system starts operati... |
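
The citation contexts above refer to a discrete-time state-space form in which the state and output at index n depend on the previous state and the current input. As a rough illustration only, the following Python sketch iterates a linear recurrence of that shape. It assumes the truncated output equation completes as Y[n] = C S[n-1] + D X[n]; the function names `mat_vec`, `vec_add`, and `simulate`, and the example coefficients, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def simulate(
    A: Matrix, B: Matrix, C: Matrix, D: Matrix,
    inputs: Sequence[Sequence[float]],
    initial_state: Sequence[float],
) -> Tuple[List[Vector], Vector]:
    """Iterate S[n] = A S[n-1] + B X[n] and Y[n] = C S[n-1] + D X[n].

    The output equation is an assumed completion of the truncated
    text; A is R x R, B is R x P, C is Q x R, D is Q x P.
    """
    state = list(initial_state)  # plays the role of S[n-1]
    outputs: List[Vector] = []
    for x in inputs:
        # The output uses the previous state and the current input.
        outputs.append(vec_add(mat_vec(C, state), mat_vec(D, x)))
        # The state update uses the same previous state.
        state = vec_add(mat_vec(A, state), mat_vec(B, x))
    return outputs, state


if __name__ == "__main__":
    # Hypothetical 1-input, 1-output, 1-state system.
    ys, final_state = simulate(
        A=[[0.5]], B=[[1.0]], C=[[1.0]], D=[[0.0]],
        inputs=[[1.0], [1.0], [1.0]],
        initial_state=[0.0],
    )
    print(ys, final_state)
```

In this sketch each loop iteration consumes one input sample, and the dependence of the new state on the old one is the feedback that algorithm transformations have to work around.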