## Divide-and-Conquer Techniques for Global Throughput Optimization (1996)

Venue: Proc. IEEE VLSI Signal Processing Workshop

Citations: 3 (2 self)

### BibTeX

@INPROCEEDINGS{Guerra96divide-and-conquertechniques,

author = {Lisa Guerra and Miodrag Potkonjak and Jan Rabaey},

title = {Divide-and-Conquer Techniques for Global Throughput Optimization},

booktitle = {Proc. IEEE VLSI Signal Processing Workshop},

year = {1996},

pages = {137--146}

}

### Abstract

This paper proposes a divide-and-conquer approach to global throughput optimization that not only leverages existing techniques but also enables their more effective and coordinated use. The "divide" step logically partitions the computation into subparts, each falling into one of a set of preclassified computation types. The subparts are then "conquered" through coordinated application of existing optimization techniques. We have characterized a set of techniques in terms of their expected effect on throughput, and can thus select the most promising techniques for each unique situation. The approach is not limited to a specific class of computations and gives higher, or at worst equal, improvement than previously reported techniques on all examples.

1.0 Introduction

Throughput optimization techniques remain important for meeting the sampling rate requirements of modern DSP and communication applications. Though clock rates for ASICs and general purpose computing devi...

### Citations

992 | Depth first search and linear graph algorithms
- Tarjan
- 1972
Citation Context ...inside feedback cycles, and those outside of feedback cycles. This is done by identifying the computation's strongly connected components (SCCs), using the standard depth-first search-based algorithm [Tar72]. For any pair of operations A and B within a SCC, there exist both a path from A to B, and one from B to A. All operations in non-trivial SCCs (those with more than one operation), are thus part of a...

480 | Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
- Lee, Messerschmitt
- 1987
Citation Context ...ng precedences. The computations operate on periodic semi-infinite streams of inputs to produce semi-infinite streams of outputs. The underlying computation model is homogeneous synchronous data flow [Lee87], a model widely used in application domains such as DSP, video and image processing, communications, and control. Under this model, the operators consume a single sample from each input and produce a...

378 | A loop transformation theory and an algorithm to maximize parallelism
- Wolf, Lam
- 1991
Citation Context ...ization [McK65], the use of static scripts [Bra84], exhaustive implicit search-based "generate and test" methods [Mas87], linear algebra-based ordering of subsets of loop-control flow transformations [Wol91], generic probabilistic iterative improvement techniques [Cha92], bottleneck removal methods [Hua94], and microscopic and special-domain enabling-effect based techniques [Whi90, Sri95a]. The proposed ...

360 | Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers
- Brayton, Hachtel, et al.
- 1984
Citation Context ...echniques for throughput improvement, but also effectively ordering and coordinating sets of individual techniques. Popular techniques include peephole optimization [McK65], the use of static scripts [Bra84], exhaustive implicit search-based "generate and test" methods [Mas87], linear algebra-based ordering of subsets of loop-control flow transformations [Wol91], generic probabilistic iterative improveme...

88 | Superoptimizer: a look at the smallest program
- Massalin
- 1987
Citation Context ...d coordinating sets of individual techniques. Popular techniques include peephole optimization [McK65], the use of static scripts [Bra84], exhaustive implicit search-based "generate and test" methods [Mas87], linear algebra-based ordering of subsets of loop-control flow transformations [Wol91], generic probabilistic iterative improvement techniques [Cha92], bottleneck removal methods [Hua94], and microsc...

64 | Pipeline interleaving and parallelism in recursive digital filters, Part I: Pipelining using scattered look-ahead and decomposition
- Parhi, Messerschmitt
- 1989

48 | An approach to ordering optimizing transformations
- Whitfield, Soffa
- 1990

32 | Peephole optimization
- McKeeman
- 1965
Citation Context ...s involve not only using isolated techniques for throughput improvement, but also effectively ordering and coordinating sets of individual techniques. Popular techniques include peephole optimization [McK65], the use of static scripts [Bra84], exhaustive implicit search-based "generate and test" methods [Mas87], linear algebra-based ordering of subsets of loop-control flow transformations [Wol91], generi...

32 | Maximally fast and arbitrarily fast implementation of linear computations
- Potkonjak, Rabaey
- 1992
Citation Context ...ffective critical path remains 3 after unfolding). As illustrated in Figure 2, the critical path in the unfolded structure can be reduced from 6 to 3, however, using the "maximally fast" technique of [Pot92]. The technique applies a static script of algebraic and redundancy manipulation transformations to reduce the critical path to , where NumStates is the number of internal states that have to be compu...

30 | Optimum and Heuristic Transformation Techniques for Simultaneous Optimization of Latency and Throughput
- Srivastava, Potkonjak
- 1995
Citation Context ...umber of states and k is the unfolding factor. This technique can also be used for feedback linear computations. For linear computations that are also time-invariant and single input, the technique of [Sri95a] is an alternate option that often results in lower area overhead. Note that if an entire computation is linear (as compared to just a sub-part), the mentioned techniques can be applied directly, with...

17 | New algorithms and lower bounds for the parallel evaluation of certain rational expressions and recurrences
- Kung
- 1976
Citation Context ...d in Figure 6d to give a critical path of 4 (Figure 6f). Overall the critical path is reduced from 6 to 4. Arbitrary speed up is not known to be achievable for this class of computation. Kung's proof [Kun76] that speed up is limited to a constant factor for some classes of computations, while made under a slightly different set of assumptions, suggests that there indeed exist computations that cannot be ...

14 | Maximizing the throughput of high performance DSP applications using behavioral transformations
- Huang, Rabaey
- 1994
Citation Context ...est" methods [Mas87], linear algebra-based ordering of subsets of loop-control flow transformations [Wol91], generic probabilistic iterative improvement techniques [Cha92], bottleneck removal methods [Hua94], and microscopic and special-domain enabling-effect based techniques [Whi90, Sri95a]. The proposed approach differs from previous methods in several key aspects. Firstly, it uses logical partitioning...

12 | Instruction set mapping for performance optimization
- Corazao, Khalaf, et al.
- 1993
Citation Context ...s Numerous techniques for throughput optimization have been proposed, many based on transformations [Mes88, Par89, Fet90], but some also based on template-matching [Not91, Cor93], and clock selection [Cor93]. Previous approaches involve not only using isolated techniques for throughput improvement, but also effectively ordering and coordinating sets of individual techniques. Popular techniques include pe...

9 | "Software's chronic crisis," Scientific American
- Gibbs
- 1994
Citation Context ...nd communication applications. Though clock rates for ASICs and general purpose computing devices have been doubling every two years, required sampling rates have been increasing at even higher rates [Gib94]. Techniques for reducing throughput are also widely used during the optimization of other design metrics such as area and power. For example, a highly effective power minimization technique involves...

6 | HyperLP: A Design System for Power Minimization using Architectural Transformations
- Chandrakasan, Potkonjak, et al.
- 1992
Citation Context ... during the optimization of other design metrics such as area and power. For example, a highly effective power minimization technique involves using throughput optimization to enable voltage scaling [Cha92]. Numerous approaches have been proposed for throughput optimization. However, most have limited scope with respect to either application domain or the set of techniques utilized. As a consequence, wh...

5 | "Algorithm transformations for unlimited parallelism," submitted to
- Fettweis, Thiele, et al.
- 1990

4 | "Breaking the recursive bottleneck," Performance Limits in Communication: Theory and Practice
- Messerschmitt
- 1988

3 | Energy efficient implementation of linear systems on programmable processors
- Srivastava, Potkonjak
- 1994

1 | Cathedral-III: Architectural-driven high level synthesis for high throughput DSP applications
- Note, Geurts, et al.
- 1991
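
The citation context for [Tar72] above describes splitting a computation into operations inside feedback cycles and operations outside them by finding strongly connected components with a depth-first search. The following is a minimal sketch of that step, not the authors' implementation. The graph encoding (a dictionary from each operation to its successors), the function names `strongly_connected_components` and `split_by_feedback`, and the example graph are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Assumed encoding: each operation maps to the operations that consume its output.
Graph = Dict[Hashable, List[Hashable]]


def strongly_connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Tarjan's depth-first-search SCC algorithm (recursive form)."""
    index: Dict[Hashable, int] = {}    # DFS discovery order of each node
    lowlink: Dict[Hashable, int] = {}  # smallest index reachable from the node
    stack: List[Hashable] = []         # nodes of components still being built
    on_stack: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    counter = 0

    def visit(v: Hashable) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component: Set[Hashable] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return components


def split_by_feedback(graph: Graph) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (operations in non-trivial SCCs, all remaining operations).

    Following the excerpt, "non-trivial" means more than one operation;
    a single operation with a self-loop is not treated specially here.
    """
    inside: Set[Hashable] = set()
    outside: Set[Hashable] = set()
    for component in strongly_connected_components(graph):
        (inside if len(component) > 1 else outside).update(component)
    return inside, outside


# Hypothetical example: "add" and "mul" form a feedback cycle.
example: Graph = {"in": ["add"], "add": ["mul", "out"], "mul": ["add"], "out": []}
inside, outside = split_by_feedback(example)
```

For the example graph, `inside` holds `add` and `mul`, and `outside` holds `in` and `out`. The recursive form is chosen for readability; very large graphs would need an iterative traversal to stay within Python's recursion limit.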