## Linear Analysis and Optimization of Stream Programs (2003)

Venue: | PLDI |

Citations: | 24 - 8 self |

### BibTeX

@INPROCEEDINGS{Lamb03linearanalysis,

author = {Andrew A. Lamb and William Thies and Saman Amarasinghe},

title = {Linear Analysis and Optimization of Stream Programs},

booktitle = {PLDI},

year = {2003}

}

### Abstract

As more complex DSP algorithms are realized in practice, there is an increasing need for high-level stream abstractions that can be compiled without sacrificing efficiency. Toward this end, we present a set of aggressive optimizations that target linear sections of a stream program. Our input language is StreamIt, which represents programs as a hierarchical graph of autonomous filters. A filter is linear if each of its outputs can be represented as an affine combination of its inputs. Linearity is common in DSP components; examples include FIR filters, expanders, compressors, FFTs and DCTs.
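
The definition of a linear filter in the abstract can be illustrated with a minimal Python sketch. It is not the StreamIt compiler's representation: the helper names (`apply_linear_filter`, `run`), the row-per-input layout of the matrix `A`, and the sample weights are assumptions made for this illustration only. Each firing reads a window of inputs and produces outputs of the form `y[j] = sum_i window[i] * A[i][j] + b[j]`, which is an affine combination of the inputs.

```python
def apply_linear_filter(A, b, window):
    """One firing: y[j] = sum_i window[i] * A[i][j] + b[j] (an affine map)."""
    return [
        sum(window[i] * A[i][j] for i in range(len(window))) + b[j]
        for j in range(len(b))
    ]


def run(A, b, stream, pop):
    """Fire the filter over a finite stream.

    The filter peeks len(A) items per firing and consumes `pop` of them.
    Both conventions are assumptions of this sketch.
    """
    peek = len(A)
    out = []
    pos = 0
    while pos + peek <= len(stream):
        out.extend(apply_linear_filter(A, b, stream[pos:pos + peek]))
        pos += pop
    return out


# A 3-tap FIR filter: peeks 3 items, pops 1, pushes 1 (hypothetical weights).
fir_A = [[0.5], [0.25], [0.25]]
fir_b = [0.0]
fir_out = run(fir_A, fir_b, [4.0, 8.0, 0.0, 4.0], pop=1)

# An expander: pops 1 item, pushes it followed by a zero.
exp_A = [[1.0, 0.0]]
exp_b = [0.0, 0.0]
exp_out = run(exp_A, exp_b, [3.0, 7.0], pop=1)

print(fir_out)  # [4.0, 5.0]
print(exp_out)  # [3.0, 0.0, 7.0, 0.0]
```

Both example filters are described entirely by a matrix and a constant vector, which is the property that lets a compiler treat a linear section of a stream graph as ordinary linear algebra.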

### Citations

5266 | Design Patterns: Elements of Reusable Object-Oriented Software
- Gamma, Helm, et al.
- 1994
Citation Context ..., and push. At compile time, the structure of the SIR mirrors the hierarchical stream structure present in the original program. Compiler passes in KOPI are implemented using the visitor design pattern [9] to visit each IR node. There are two types of visitors. One type iterates over a program’s SIR representation (the stream structure) and the other type iterates over the IR nodes in a given function ... |

572 | Automatic discovery of linear restraints among variables of a program
- Cousot, Halbwachs
- 1978
Citation Context ... while StreamIt includes support for code generation and whole-program development. In addition to ADE, other work on DSP algorithm development is surveyed in [14]. Karr [11] and Cousot and Halbwachs [2] describe general methods for detecting linear relationships among program variables. Karr maintains an affine representation (similar to ours) for each program variable, while Cousot and Halbwachs us... |

488 | The synchronous dataflow programming language LUSTRE
- Halbwachs, Caspi, et al.
- 1991
Citation Context ...output data in its init function * and the only work that occurs in the work function * is pushing the data on to the tape and doing some * buffer management. **/ void->float filter FloatSource { float[16] inputs; int idx; init { for(int i=0; i<16; i++) { inputs[i] = i; } idx = 0; } work push 1 { push(inputs[idx]); idx = (idx + 1) % 16; } } Figure A-3: Source code for the FIR benchmark. /** * This filt... |

449 | FFTW: an adaptive software architecture for the FFT
- Frigo, Johnson
- 1998
Citation Context ...FT. There are myriads of other ways to optimize the computation of the DFT and a wide body of literature devoted to the subject. For a very fast runtime implementation, the author suggests using FFTW [6, 7, 8], which is a library for calculating the FFT which uses runtime tuning to maximize performance. Chapter 3 Linear Analysis In this chapter, we first describe a matrix framework which describes li... |

387 | Dynamo: a transparent dynamic optimization system
- Bala, Duesterwald, et al.
- 2000

296 | StreamIt: A language for streaming applications
- Thies, Karczmarek, et al.
Citation Context ...led without any performance penalty. In this paper, we develop a set of optimizations that lower the entry barrier for high-performance stream programming. Our work is done in the context of StreamIt [7, 19], which is a high-level language for signal processing applications. A program in StreamIt is comprised of a set of concurrently executing filters, each of which contains its own address space and com... |

291 | Automated empirical optimizations of software and the ATLAS project
- Whaley, Petitet, et al.
- 2000
Citation Context ...e matrix multiplication routine, as well as auxiliary effects on the runtime overhead in the StreamIt library. We have experimented with using the machine-tuned ATLAS library for the matrix multiply [20], but performance varies widely: linear replacement with ATLAS performs anywhere from -36% (on FMRadio) to 58% (on Oversampler) better than it does with our own matrix multiply routine, and average per... |

178 | The RAW microprocessor: a computational fabric for software circuits and general-purpose programs
- Taylor, Kim, et al.
- 2002
Citation Context ...o code generation backends. The uniprocessor backend generates sequential C code that is compiled and linked against a supporting library. The second backend generates code for the Raw microprocessor [35, 24], which features a grid of processors interconnected via various communication structures. We chose to use the uniprocessor backend for our measurements to control for the variability inherent in mapp... |

160 | Affine Relationships Among Variables of a Program
- Karr
- 1976
Citation Context ...ded for algorithm exploration, while StreamIt includes support for code generation and whole-program development. In addition to ADE, other work on DSP algorithm development is surveyed in [14]. Karr [11] and Cousot and Halbwachs [2] describe general methods for detecting linear relationships among program variables. Karr maintains an affine representation (similar to ours) for each program variable, ... |

153 | A fast Fourier transform compiler
- Frigo
- 1999
Citation Context ...FT. There are myriads of other ways to optimize the computation of the DFT and a wide body of literature devoted to the subject. For a very fast runtime implementation, the author suggests using FFTW [6, 7, 8], which is a library for calculating the FFT which uses runtime tuning to maximize performance. Chapter 3 Linear Analysis In this chapter, we first describe a matrix framework which describes li... |

139 | The ESTEREL Synchronous Programming Language
- Berry, Gonthier
- 1992
Citation Context ...taining more precise linearity information. A number of other programming languages are oriented around a notion of a stream; see [31] for a survey. Synchronous languages such as LUSTRE [16], Esterel [2], and Signal [12] target the embedded domain, while languages such as Occam [17], SISAL [11] and StreamC [28] target parallel and vector targets. However, none of the compilers for these languages hav... |

123 | A bandwidth-efficient architecture for media processing
- Rixner, Dally, et al.
- 1998
Citation Context ...tion of a stream; see [31] for a survey. Synchronous languages such as LUSTRE [16], Esterel [2], and Signal [12] target the embedded domain, while languages such as Occam [17], SISAL [11] and StreamC [28] target parallel and vector targets. However, none of the compilers for these languages have coarse-grained, DSP-specific analyses such as linear filter detection. Also note that the “linear data flow... |

85 | A Survey of Stream Processing
- Stephens
- 1997
Citation Context ...ctions makes it feasible to symbolically execute all loops, thereby obtaining more precise linearity information. A number of other programming languages are oriented around a notion of a stream; see [17] for a survey. Also note that the "linear data flow analysis" of Ryan [16] is completely unrelated to our work; it aims to do program analysis in linear time. 9. CONCLUSION This paper presents a set o... |

82 | A Stream Compiler for Communication-Exposed Architectures
- Gordon, Thies, et al.
- 2002
Citation Context ...mation. (a) A pipeline. (b) A splitjoin. (c) A feedbackloop. Figure 6: Stream structures supported by StreamIt. 1.2 StreamIt StreamIt is a language and compiler for high-performance signal processing [6, 7, 19]. In a streaming application, each data item is in the system for only a small amount of time, as opposed to scientific applications where the data set is used extensively over the entire execution. A... |

82 | SPL: A language and compiler for DSP algorithms
- Xiong, Johnson, et al.
Citation Context ...nerates libraries for signal processing algorithms [8, 9, 4]. Using a feedback-directed search process, DSP transforms are optimized for the underlying architecture. The input language to SPIRAL is SPL [22, 21], which provides a parameterizable way of expressing matrix computations. Given a matrix representation in SPL, SPIRAL generates formulas that correspond to different factorizations of the matrix. It ... |

49 | StreamIt: A Language for Streaming Applications
- Thies, Karczmarek, Amarasinghe
- 2002
Citation Context ...age programming as the only option. In this thesis, we develop a set of optimizations that lower the entry barrier for high-performance stream programming. Our work is done in the context of StreamIt [15, 33], which is a high-level language for high performance signal processing applications. A program in StreamIt is comprised of a set of concurrently executing filters, each of which contains its own addr... |

31 | Baring it all to Software: Raw Machines
- Waingold, Taylor, et al.
- 1997
Citation Context ...o code generation backends. The uniprocessor backend generates sequential C code that is compiled and linked against a supporting library. The second backend generates code for the Raw microprocessor [35, 24], which features a grid of processors interconnected via various communication structures. We chose to use the uniprocessor backend for our measurements to control for the variability inherent in mapp... |

28 | A stream compiler for communication-exposed architectures
- Gordon, Thies, et al.
- 2002
Citation Context ...age programming as the only option. In this thesis, we develop a set of optimizations that lower the entry barrier for high-performance stream programming. Our work is done in the context of StreamIt [15, 33], which is a high-level language for high performance signal processing applications. A program in StreamIt is comprised of a set of concurrently executing filters, each of which contains its own addr... |

26 | The Sisal model of functional programming and its implementation
- Gaudiot, DeBoni, et al.
- 1997
Citation Context ...ented around a notion of a stream; see [31] for a survey. Synchronous languages such as LUSTRE [16], Esterel [2], and Signal [12] target the embedded domain, while languages such as Occam [17], SISAL [11] and StreamC [28] target parallel and vector targets. However, none of the compilers for these languages have coarse-grained, DSP-specific analyses such as linear filter detection. Also note that the ... |

17 | Signal: A declarative language for synchronous programming of real-time systems
- Gautier, Le Guernic, Besnard
- 1987
Citation Context ...cise linearity information. A number of other programming languages are oriented around a notion of a stream; see [31] for a survey. Synchronous languages such as LUSTRE [16], Esterel [2], and Signal [12] target the embedded domain, while languages such as Occam [17], SISAL [11] and StreamC [28] target parallel and vector targets. However, none of the compilers for these languages have coarse-grained,... |

14 | Occam 2 Reference Manual
- Corporation
- 1988
Citation Context ...ages are oriented around a notion of a stream; see [31] for a survey. Synchronous languages such as LUSTRE [16], Esterel [2], and Signal [12] target the embedded domain, while languages such as Occam [17], SISAL [11] and StreamC [28] target parallel and vector targets. However, none of the compilers for these languages have coarse-grained, DSP-specific analyses such as linear filter detection. Also no... |

10 | Searching for the best FFT formulas with the SPL compiler
- Johnson, Johnson, et al.
- 2001
Citation Context ...ormance gain. 8. RELATED WORK Several groups are researching strategies for efficient code generation for DSP applications. SPIRAL is a system that generates libraries for signal processing algorithms [8, 9, 4]. Using a feedback-directed search process, DSP transforms are optimized for the underlying architecture. The input language to SPIRAL is SPL [22, 21], which provides a parameterizable way of expressin... |

10 | Polymorphous Computing Architecture (PCA) Example Applications and Description
- Lebak
- 2001
Citation Context ...ld target detection; 4) FMRadio, an FM software radio with equalizer; 5) Radar, the core functionality in modern radar signal processors, based on a system from the Polymorphic Computing Architecture [12]; 6) FilterBank, a multi-rate signal decomposition processing block common in communications and image processing; 7) Vocoder, a channel voice coder, commonly used for speech analysis and compression;... |

9 | An Algorithm Design Environment for Signal Processing
- Covell
- 1989
Citation Context ...entary to these packages: it allows programmers to interface with them using general user-level code. ADE (A Design Environment) is a system for specifying, analyzing, and manipulating DSP algorithms [3]. ADE includes a rule-based system that can search for improved arrangements of stream algorithms using extensible transformation rules. However, the system uses predefined signal processing blocks th... |

9 | Constrained and Phased Scheduling of Synchronous Data Flow Graphs for the StreamIt Language
- Karczmarek
- 2002
Citation Context ...raph will grow without bound in the steady state. [Figure 9: Expanding a linear node to rates (e', o', u').] A general method for scheduling StreamIt programs is given by Karczmarek [10]. A fundamental aspect of the steady-state schedule is that neighboring nodes might need to be fired at different frequencies. For example, if there are two filters F and Fs in a pipeline and F produc... |

7 | Linear analysis and optimization of stream programs
- Lamb, Thies, Amarasinghe
- 2003
Citation Context ...rithm was both conceived and implemented by William Thies. We include this section in the interest of completeness; many of the results in Chapter 5 rely on it. For more information, please refer to [22] and [34]. // types of transformations we consider for each stream enum Transform { ANY, LINEAR, FREQ, NONE } // a tuple representing a cost and a stream struct Config { int cost : cost of the conf... |

6 | Kopi Reference manual. http://www.dms.at/kopi/docs/kopi.html
- Gay-Para, Graf, et al.
- 2001
Citation Context ...sult Figure 4-6: Algorithm for optimization selection (part three). 4.4 Implementation Notes This section presents some notes about our implementation. The StreamIt compiler is built upon the KOPI [13] Java compiler infrastructure. The stream intermediate representation (SIR) is used as the internal representation of StreamIt programs. Each node of the SIR represents a stream construct: filter, pip... |

6 | Computational Methods of Linear Algebra
- Sewell
- 2005
Citation Context ...tails are well beyond the scope of this thesis. We present a derivation for the FFT which works for DFTs of sizes which are powers of two. The derivation follows closely the derivation given by Sewell [30]. 2.3.1 Notation, Definitions and Identities In this section, we will explain the notation and prove the key identities of complex exponentials used in the FFT derivation. The N-point DFT, X[k], for a... |

4 | Automatic optimization of DSP algorithms
- Xiong
- 2001
Citation Context ...nerates libraries for signal processing algorithms [8, 9, 4]. Using a feedback-directed search process, DSP transforms are optimized for the underlying architecture. The input language to SPIRAL is SPL [22, 21], which provides a parameterizable way of expressing matrix computations. Given a matrix representation in SPL, SPIRAL generates formulas that correspond to different factorizations of the matrix. It ... |

3 | The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs
- Taylor, et al.
- 2002
Citation Context ...ion, frequency replacement, and optimization selection algorithms described in the previous sections. The implementation is part of the StreamIt compiler, and works for both the uniprocessor and Raw [13] backends. In this section, we evaluate three configurations of linear optimizations for the uniprocessor backend: Linear replacement, which transforms maximal linear sections of the stream graph into... |

3 | Partitioning a structured stream graph using dynamic programming
- Thies, Lin, Amarasinghe
- 2003
Citation Context ... both conceived and implemented by William Thies. We include this section in the interest of completeness; many of the results in Chapter 5 rely on it. For more information, please refer to [22] and [34]. // types of transformations we consider for each stream enum Transform { ANY, LINEAR, FREQ, NONE } // a tuple representing a cost and a stream struct Config { int cost : cost of the configuration... |

2 | Automatic derivation and implementation of signal processing algorithms
- Egner, Johnson, et al.
Citation Context ...ormance gain. 8. RELATED WORK Several groups are researching strategies for efficient code generation for DSP applications. SPIRAL is a system that generates libraries for signal processing algorithms [8, 9, 4]. Using a feedback-directed search process, DSP transforms are optimized for the underlying architecture. The input language to SPIRAL is SPL [22, 21], which provides a parameterizable way of expressin... |

2 | Linear data flow analysis
- Ryan
- 1992
Citation Context ...ing more precise linearity information. A number of other programming languages are oriented around a notion of a stream; see [17] for a survey. Also note that the "linear data flow analysis" of Ryan [16] is completely unrelated to our work; it aims to do program analysis in linear time. 9. CONCLUSION This paper presents a set of automated analyses for detecting, analyzing, and optimizing linear filte... |

1 | A Fast Fourier Transform Compiler
- Frigo
- 1999
Citation Context ... each column. Frequency replacement, which transforms maximal linear sections of the stream graph into a single node in the frequency domain. To implement the necessary basis conversions, we use FFTW [5], which is an adaptive and high-performance FFT library. Automatic selection, which employs both of the previous transformations judiciously in order to obtain the maximal benefit. This works accordin... |

1 | TMS320C54x DSP Reference Set, volume 2: Mnemonic Instruction Set
- Texas Instruments
- 2001
Citation Context ...be tailored to a specific architecture and code generation strategy. For example, if there is architecture-level support for convolution operations (such as the FIRS instruction in the TMS320C54x [18]), then this would affect the cost for certain dimensions of matrices; similarly, if a matrix multiplication algorithm is available that exploits symmetry or sparsity in a matrix, then this benefit co... |

1 | Homepage of FFTW. http://www.fftw.org
- Frigo, Johnson
Citation Context ...FT. There are myriads of other ways to optimize the computation of the DFT and a wide body of literature devoted to the subject. For a very fast runtime implementation, the author suggests using FFTW [6, 7, 8], which is a library for calculating the FFT which uses runtime tuning to maximize performance. Chapter 3 Linear Analysis In this chapter, we first describe a matrix framework which describes li... |

1 | Digital signal processors: Past, present, and future
- Gass
Citation Context ... in complexity, these factors will become unmanageable. There is a pressing need for high-level DSP abstractions that can be compiled without any performance penalty. According to Texas Instruments [10], more than fifty percent of the code that runs the DSPs in a modern cell phone is written in assembly (the rest is written in annotated C). Even provided the best available C compilers, programmers m... |