## Background Memory Area Estimation for Multi-dimensional Signal Processing Systems (1995)

Venue: IEEE Trans. on VLSI Systems

Citations: 45 (18 self)

### BibTeX

@ARTICLE{Balasa95backgroundmemory,

author = {Florin Balasa and Francky Catthoor and Hugo De Man},

title = {Background Memory Area Estimation for Multi-dimensional Signal Processing Systems},

journal = {IEEE Trans. on VLSI Systems},

year = {1995},

volume = {3},

pages = {157--172}

}

### Abstract

Memory cost is responsible for a large part of the chip and/or board area of customized video and image processing system realizations. In this paper, we present a novel technique -- founded on data-flow analysis -- which addresses the problem of background memory size evaluation for a given non-procedural algorithm specification, operating on multi-dimensional signals with affine indices. Most of the target applications are characterized by a huge number of signals, so a new polyhedral data-flow model operating on groups of scalar signals is proposed. These groups are obtained by a novel analytical partitioning technique, which allows a desired granularity to be selected, depending on the application complexity. The method incorporates a way to trade off memory size against computational and controller complexity.

1 Introduction

Speech, image and video processing applications involve a large number of multi-dimensional signals, which lead to large memory units. These result in significa...

### Citations

11248 |
Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co.
- Garey, Johnson
- 1979
Citation Context ... to deal with cyclic graphs have been proposed also in [15, 29]. Unfortunately, the same problem becomes significantly harder when the operation ordering is still not fixed, belonging to the NP class [14] even in the absence of conditionals. A straightforward way of estimating the number of memory locations when the operation scheduling is still not decided is by means of an accurate data-flow analysi...

1508 |
Theory of linear and integer programming
- Schrijver
- 1986
Citation Context ... $\cdot\, i \geq b \,\}$, where $T = T_1 V_1$, $u = T_1 v_1 + u_1$, $A = \begin{bmatrix} A_1 V_1 \\ A_2 V_2 \end{bmatrix}$, $b = \begin{bmatrix} b_1 - A_1 v_1 \\ b_2 - A_2 v_2 \end{bmatrix}$. Solving a linear Diophantine system was proven to be of polynomial complexity [27]. All the known methods are based on bringing the system matrix to the Hermite Normal Form [27]. They only differ from each other basically by the modality in which this representation is obtained. Nu...

464 | The omega test: A fast and practical integer programming algorithm for dependence analysis
- Pugh
- 1992
Citation Context ...blem in the non-procedural case, data-flow analysis has been consistently employed by us as the main exploration strategy [3]. In a recent work, data dependence information provided by the Omega test [25] has also been applied in memory size estimation [35]. Although the method is very appealing in terms of speed, the assumptions regarding loop hierarchy and control flow -- e.g. a nest of loops is exe...

386 |
A loop transformation theory and an algorithm to maximize parallelism
- Wolf, Lam
- 1991
Citation Context ...mming [12], and Fourier-Motzkin algorithm [9] are extensively used [25]. Data dependence has been employed to maximize the fine- or coarse-grain parallelism in loop nests, or to improve data locality [38]. As RMSP algorithms contain usually a huge amount of scalars, a data-flow analysis operating with groups of signals (rather than a flattened one operating with individual signals) is compulsory. On t...

315 |
High-Level Synthesis: Introduction to Chip and System Design
- Gajski, Dutt, et al.
- 1992
Citation Context ... e.g. [1, 2, 19, 29, 31]) where the control steps of production/consumption for each individual signal are determined beforehand. This applies also for memory/register estimation techniques (see e.g. [18, 13] and their references). This strategy is mainly due to the fact that applications targeted in conventional high-level synthesis contain a relatively small number of signals (at most of the order of ma...

217 | Dataflow analysis of array and scalar references
- Feautrier
- 1991
Citation Context ...Determining whether a dependence exists between two array references (see [5] for a good overview) is a crucial issue for doing code transformations. Methods based on parametrized integer programming [12], and Fourier-Motzkin algorithm [9] are extensively used [25]. Data dependence has been employed to maximize the fine- or coarse-grain parallelism in loop nests, or to improve data locality [38]. As R...

138 |
Diophantine equations
- Mordell
- 1969
Citation Context ...rmation reduces the coefficients of an equation (except the smallest one) to at most 1/2 of the smallest coefficient, therefore employing usually a shorter sequence of unimodular transformations than [22] -- where the reduction factor is 1, or [25] -- where the reduction factor is 2/3. The basic idea is the Published in: IEEE Trans. on Comp.-aided Design, Vol.CAD-14, No., pp., 1995. See IEEE copyrig...
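The halving property quoted in this context can be illustrated with a small sketch. It is not the paper's algorithm (the excerpt is truncated before the method is described); it only shows, under the assumption that the reduction uses a centered remainder, how every coefficient other than the smallest can be brought to at most half of the smallest one in each round. The function names `centered_remainder` and `reduce_coefficients` are hypothetical.

```python
def centered_remainder(a: int, m: int) -> int:
    """Remainder of a modulo m (m > 0), chosen in the range (-m/2, m/2]."""
    r = a % m              # Python's % yields a value in [0, m) for m > 0
    if 2 * r > m:
        r -= m             # shift into the centered range
    return r


def reduce_coefficients(coeffs):
    """Reduce all coefficients against the smallest non-zero one, repeatedly.

    Returns (g, rounds): g is the gcd of the coefficients, rounds is the
    number of reduction rounds performed. Illustrative sketch only.
    """
    values = [abs(c) for c in coeffs if c != 0]
    rounds = 0
    while len(values) > 1:
        smallest = min(values)
        reduced = [smallest]
        skipped_pivot = False
        for v in values:
            if v == smallest and not skipped_pivot:
                skipped_pivot = True      # keep one copy of the pivot
                continue
            r = abs(centered_remainder(v, smallest))
            assert 2 * r <= smallest      # the "at most 1/2" property
            if r != 0:
                reduced.append(r)
        values = reduced
        rounds += 1
    return (values[0] if values else 0), rounds


if __name__ == "__main__":
    print(reduce_coefficients([12, 18, 30]))
```

Because each round at least halves the smallest remaining coefficient, the number of rounds grows only logarithmically with the coefficient size, which is the reason the context gives for expecting shorter transformation sequences.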

134 |
Automated synthesis of data paths in digital systems. Computer-Aided Design of Integrated Circuits and Systems
- Tseng, Siewiorek
- 1986
Citation Context ...l signals in our CATHEDRAL script [36]. Note that this high-level memory management stage is fully complementary to the traditional high-level synthesis step known as "register allocation/assignment" [31, 19, 15, 1, 29] which deals with individual storage places for scalars, after scheduling. Part of this effort is also needed in the CATHEDRAL context, but this decision on scalar memory management is postponed to ou...

126 |
Fourier-Motzkin elimination and its dual
- Dantzig, Eaves
- 1973
Citation Context ...sts between two array references (see [5] for a good overview) is a crucial issue for doing code transformations. Methods based on parametrized integer programming [12], and Fourier-Motzkin algorithm [9] are extensively used [25]. Data dependence has been employed to maximize the fine- or coarse-grain parallelism in loop nests, or to improve data locality [38]. As RMSP algorithms contain usually a hu...

115 |
Loop Transformations for Restructuring Compilers – The Foundations
- Banerjee
- 1993
Citation Context ...es like that of Fig. 1. It is partly capable of handling non-procedural applications, and the stream model is very well suited and yields good results for applications where the dependence vectors [5] have constant elements, as in many front-end video applications. By means of a hierarchical stream model, it is possible to handle multi-dimensional signals with complex affine indices, but only in n...

104 |
An area model for on-chip memories and its application
- Mulder, Quach, et al.
- 1991
Citation Context ... to 1, for different cycle budgets allocated for read/write operations. According to the layout model presented in [23], the actual silicon area occupied by the background memory is: $A = \mathit{TechnologyFactor} \cdot \mathit{bits} \cdot (1 + \alpha\,\mathit{Ports}) \cdot (N + \beta) \cdot [1 + 0.25\,(\mathit{Ports} + \mathit{Ports}_{rw} - 2)]$ The evaluation of ...
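As a worked illustration of the area expression quoted in this context, the sketch below evaluates it directly. It assumes the garbled symbols read as A = TechnologyFactor · bits · (1 + α·Ports) · (N + β) · [1 + 0.25·(Ports + Ports_rw − 2)]; the excerpt does not define the symbols, so the parameter names and all numeric values used here are hypothetical placeholders, not figures from the paper or from [23].

```python
def background_memory_area(technology_factor, bits, n, ports, ports_rw,
                           alpha, beta):
    """Evaluate the quoted area expression for one memory.

    All arguments are taken as plain numbers; their physical meaning and
    units are not specified in the excerpt (assumption: alpha and beta are
    technology-dependent constants of the layout model).
    """
    port_overhead = 1 + alpha * ports
    size_term = n + beta
    multiport_factor = 1 + 0.25 * (ports + ports_rw - 2)
    return technology_factor * bits * port_overhead * size_term * multiport_factor


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formula.
    single = background_memory_area(1.0, 8, 100, 1, 1, alpha=0.0, beta=0.0)
    dual = background_memory_area(1.0, 8, 100, 2, 2, alpha=0.5, beta=0.0)
    print(single, dual)
```

With one port and one read/write port the bracketed factor equals 1, so the expression reduces to TechnologyFactor · bits · (1 + α) · (N + β); each additional port adds 25% through the bracket on top of the α term.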

91 |
REAL: A Program for REgister ALlocation
- Kurdahi, Parker
- 1987
Citation Context ...l signals in our CATHEDRAL script [36]. Note that this high-level memory management stage is fully complementary to the traditional high-level synthesis step known as "register allocation/assignment" [31, 19, 15, 1, 29] which deals with individual storage places for scalars, after scheduling. Part of this effort is also needed in the CATHEDRAL context, but this decision on scalar memory management is postponed to ou...

87 | Complete register allocation problems
- Sethi
- 1973
Citation Context ...th the resulting data-flow graph. Even the simpler problem of finding the minimum number of memory locations necessary to compute a directed acyclic graph has been proven to be an NP-complete problem [28]. Structurally, the DFGs determined as in Section 4 can be more complex: e.g. they may contain cycles, as 2 groups of signals may contain 2 subsets with opposite dependences to the ...

59 |
Asymptotically fast triangularization of matrices over rings
- Hafner, McCurley
- 1991
Citation Context ... recent results have been obtained, overcoming the main shortcoming of the classical methods, namely the "intermediate expression swell". A pathological example substantiating this effect is given in [16]. Therefore, several algorithms with provable polynomial worst-case complexity have been proposed (e.g. [16, 27]). The size of Diophantine systems in most of our practical cases does not justify the o...

39 |
An algorithm for array variable clustering
- Ramachandran, Gajski, et al.
- 1994
Citation Context ...se of the huge size of the ILP formulations. Exceptions to this are PHIDEO [20] -- where streams are used during the memory allocation but which is still done after scheduling, and recently also MeSA [26] -- where memory allocation cost is based both on a layout model and on the expected performance, but where the possibility of signals to share common storage locations when their life times are disjo...

32 |
Memory Estimation for High Level Synthesis
- Verbauwhede, Scheers, et al.
- 1994
Citation Context ...as been consistently employed by us as the main exploration strategy [3]. In a recent work, data dependence information provided by the Omega test [25] has also been applied in memory size estimation [35]. Although the method is very appealing in terms of speed, the assumptions regarding loop hierarchy and control flow -- e.g. a nest of loops is executed in the sequence as specified in the source code...

31 | Dataflow-driven memory allocation for multi-dimensional signal processing systems
- Balasa, Catthoor, et al.
- 1994
Citation Context ...developed here, is useful in two different contexts. This research was done as part of solving, under the same application domain assumptions, the more general problem of background memory allocation [4]. The estimation model is targeted to (multi-port) random access memories. Allocation aspects like memory hierarchy, bandwidth, latency, are beyond the scope of this paper. However, the method propose...

30 |
Algorithm for Discovering the Set of All the Solutions of a Linear Programming Problem
- Chernikova
- 1968
Citation Context ...utational effort of e.g. counting the lattice points of the resulting LBLs (see subsection 3.4). The technique employed for minimizing the set of linear constraints is based on Chernikova's algorithm [8] -- a nonpivoting method for finding all vertices of convex polytopes. The implementation in use stems from the polyhedral library developed at IRISA [37]. 3.4 Computation of the affine image size of ...

25 |
Foreground memory management in data path synthesis
- Stok, Jess
- 1992
Citation Context ...l signals in our CATHEDRAL script [36]. Note that this high-level memory management stage is fully complementary to the traditional high-level synthesis step known as "register allocation/assignment" [31, 19, 15, 1, 29] which deals with individual storage places for scalars, after scheduling. Part of this effort is also needed in the CATHEDRAL context, but this decision on scalar memory management is postponed to ou...

22 |
DSP specification using the Silage language
- Genin, Hilfinger, et al.
- 1990
Citation Context ... approaches entail a loss of the code regularity. According to our experience, this leads to an unacceptable growth of the controller size. In addition, a behavioural description language like SILAGE [17] is by definition non-procedural: besides the natural production-consumption order imposed by the dependence relations existent in the code, there is much freedom left in the execution ordering. This ...

21 |
Background memory management for the synthesis of algebraic algorithms on multi-processor dsp chips
- Verbauwhede, Catthoor, et al.
- 1989
Citation Context ...a and power consumption cost in (application-specific) architectures for real-time multi-dimensional signal processing (RMSP), in most cases dominating the data-path contribution for complete systems [34, 20]. A 128 kbit embedded SRAM takes about 115 mm² in a 1.2 µm CMOS technology, as compared to 5 to 15 mm² for an entire (very) complex data-path. The difference in importance would increase further if t...

19 | transformation methodology for fixed-rate video, image and telecom processing applications
- Catthoor, De Man
- 1994
Citation Context ...own that memory management decisions, taken before scheduling, can reduce significantly the background memory cost, while the freedom for data-path allocation and scheduling remains almost unaffected [6]. Furthermore, within the scheduling-directed view, many examples are untractable because of the huge size of the ILP formulations. Exceptions to this are PHIDEO [20] -- where streams are used during ...

18 |
Post-processor for data path synthesis using multiport memories
- Ahmad, Chen
- 1991

18 |
Compiling multidimensional data streams into distributed DSP ASIC memory
- Vanhoof, Bolsens, et al.
- 1991
Citation Context ...be exploited by a designer in order to meet his goals -- e.g. to take profit of the parallelism "hidden" in the code. The current in-place mapping and address generation tools of the CATHEDRAL system [33] cannot handle the example of Fig. 1 if e.g. lines (1) and (5) are interchanged, as they...

16 |
Area-Time Model for Synthesis of Non-Pipelined Designs
- Jain, Mlinar, et al.
- 1988
Citation Context ... e.g. [1, 2, 19, 29, 31]) where the control steps of production/consumption for each individual signal are determined beforehand. This applies also for memory/register estimation techniques (see e.g. [18, 13] and their references). This strategy is mainly due to the fact that applications targeted in conventional high-level synthesis contain a relatively small number of signals (at most of the order of ma...

14 |
Compiler techniques for massive parallel architectures
- Thiele
- 1992
Citation Context ...rized by the collection of definition (left-hand side) and operand (right-hand side occurrences) domains for that signal [36]. Each domain has an index space which is a linearly bounded lattice (LBL) [30] -- the image of an affine mapping over a set of linear inequalities representing a polytope: $\{\, x = T \cdot i + u \mid A \cdot i \geq b \,\}$ (1) where $x \in \mathbb{Z}^m$ is the coordinate vector of an m-dimensional sig...
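The linearly bounded lattice described in this context can be made concrete with a brute-force sketch. It assumes the constraint in the garbled formula reads A·i ≥ b, and it enumerates iterator vectors inside a caller-supplied bounding box rather than using any polyhedral machinery. The function name `lbl_points` and the example signal `a[2*i + j][j + 1]` are hypothetical, chosen only for illustration.

```python
from itertools import product


def lbl_points(T, u, A, b, box):
    """Enumerate the set {x = T.i + u | A.i >= b} by exhaustive search.

    T, A are lists of rows; u, b are lists; box is a list of inclusive
    (low, high) bounds for each iterator, assumed to contain the polytope.
    """
    points = set()
    for i in product(*(range(lo, hi + 1) for lo, hi in box)):
        inside = all(
            sum(a_k * i_k for a_k, i_k in zip(row, i)) >= rhs
            for row, rhs in zip(A, b)
        )
        if inside:
            x = tuple(
                sum(t_k * i_k for t_k, i_k in zip(row, i)) + offset
                for row, offset in zip(T, u)
            )
            points.add(x)
    return points


# Index space of a hypothetical operand a[2*i + j][j + 1],
# with iterators 0 <= i <= 2 and 0 <= j <= 1.
T = [[2, 1], [0, 1]]
u = [0, 1]
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]   # i >= 0, -i >= -2, j >= 0, -j >= -1
b = [0, -2, 0, -1]
POINTS = lbl_points(T, u, A, b, box=[(0, 2), (0, 1)])

if __name__ == "__main__":
    print(sorted(POINTS))
```

Exhaustive enumeration is only workable for tiny iteration spaces; it is shown here to fix the meaning of T, u, A and b, not as a practical counting method.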

10 |
Modeling multidimensional data and control flow
- Franssen, Balasa
- 1993
Citation Context ...l requirement, tackling the problem under more general conditions. The motivation for this is given in Section 2. 1 However, extensions to an even more general modelling have been described by us too [11]. The memory size estimation kernel as developed here, is useful in two different contex...

7 |
An efficient microcode compiler for application-specific DSP processors
- Goossens, Rabaey, et al.
- 1990

7 |
Allocation of multiport memories for hierarchical data streams
- Lippens, Verhaegh, van der Werf
- 1993
Citation Context ...a and power consumption cost in (application-specific) architectures for real-time multi-dimensional signal processing (RMSP), in most cases dominating the data-path contribution for complete systems [34, 20]. A 128 kbit embedded SRAM takes about 115 mm² in a 1.2 µm CMOS technology, as compared to 5 to 15 mm² for an entire (very) complex data-path. The difference in importance would increase further if t...

6 |
On counting lattice points in polyhedra
- Dyer
- 1991
Citation Context ...ber of lattice points of the image of a polytope is equal to the number of points of integer coordinates inside the polytope. The latter problem was tackled long ago, but only recently it was proven [10] the existence of a polynomial-time solution up to the 4-D case. As a more general solution -- able to handle signals of any dimension -- was needed, a novel technique based on the Fourier-Motzkin eli...

5 |
Exact Evaluation of Memory Area for Multi-dimensional Processing Systems
- Balasa, De Man
- 1993
Citation Context ...r an ordering vector in that space. For solving the memory size estimation problem in the non-procedural case, data-flow analysis has been consistently employed by us as the main exploration strategy [3]. In a recent work, data dependence information provided by the Omega test [25] has also been applied in memory size estimation [35]. Although the method is very appealing in terms of speed, the assum...

5 |
Motion estimation architecture for video compression
- Chan, Panchanathan
- 1993
Citation Context ...ee RMSP applications: (1) a singular value decomposition (SVD) updating algorithm [21] -- an important algebraic kernel used e.g. in beamforming and Kalman filtering; (2) a motion detection algorithm [7] from a video coding application (see Fig. 1 for part of the code); (3) the kernel of a complex voice coding application -- essential component of a mobile radio terminal [32]. Table 1 shows the resul...

4 |
Allocation of multiport memories in data path synthesis
- Balakrishnan, et al.
- 1988
Citation Context ...e estimation: context and state-of-the-art To our knowledge, almost all techniques for dealing with the allocation of storage units are scalar-oriented and employ a scheduling-directed view (see e.g. [1, 2, 19, 29, 31]) where the control steps of production/consumption for each individual signal are determined beforehand. This applies also for memory/register estimation techniques (see e.g. [18, 13] and their refer...

4 |
SVD updating for tracking slowly time-varying systems
- Moonen, Van Dooren, et al.
- 1989
Citation Context ...ATHEDRAL framework. It was tested on an HP 9000/735 workstation. The novel estimation method has been evaluated on three RMSP applications: (1) a singular value decomposition (SVD) updating algorithm [21] -- an important algebraic kernel used e.g. in beamforming and Kalman filtering; (2) a motion detection algorithm [7] from a video coding application (see Fig. 1 for part of the code); (3) the kernel ...

4 |
High-level modeling of data and control flow for signal processing systems
- van Swaaij, Franssen, et al.
- 1993
Citation Context ... high-level storage organization for the multi-dimensional signals in our CATHEDRAL script [36]. Note that this high-level memory management stage is fully complementary to the traditional high-level synthesis step known as "register allocation/assignment" [31, 19, 15, 1, 29] which deals with i...

3 |
A specification and simulation front-end for hardware synthesis of digital signal processing applications
- Nachtergaele, Bolsens, et al.
- 1992
Citation Context ...n in high-level synthesis. The first estimation methods are based on symbolic evaluation -- a scalar-oriented technique -- which consists in enumerating all indexed signals for all index combinations [24]. More recently, novel results have been obtained for the case when the algorithm specification is non-procedural. Modifications of the loop hierarchy and the sequence of execution as specified in the...

3 |
A Library for Doing Polyhedral Operations, M.Sc
- Wilde
- 1993
Citation Context ...constraints is based on Chernikova's algorithm [8] -- a nonpivoting method for finding all vertices of convex polytopes. The implementation in use stems from the polyhedral library developed at IRISA [37]. 3.4 Computation of the affine image size of a polytope When the linear function $t : \mathbb{Z}^n \to \mathbb{Z}^m$, defined by $t(i) = T i$, is injective, the number of lattice points of the image of a polytope is equal ...
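The role of injectivity in this context can be checked on a small case. The sketch below is a brute-force illustration, not the counting technique of the paper: it maps every integer point of a small rectangular iteration space through t(i) = T·i and compares the number of distinct images with the number of source points. The helper name `image_size` and both example matrices are assumptions made for the illustration.

```python
from itertools import product


def image_size(T, points):
    """Number of distinct images T.i over the given integer points."""
    images = {
        tuple(sum(t_k * i_k for t_k, i_k in zip(row, p)) for row in T)
        for p in points
    }
    return len(images)


# Integer points of the rectangle 0 <= i <= 2, 0 <= j <= 1 (6 points).
SOURCE = list(product(range(3), range(2)))

# Injective mapping (non-zero determinant): (i, j) -> (2i + j, j).
INJECTIVE = [[2, 1], [0, 1]]

# Non-injective mapping: (i, j) -> (i + j); several points collide.
COLLAPSING = [[1, 1]]

if __name__ == "__main__":
    print(len(SOURCE), image_size(INJECTIVE, SOURCE), image_size(COLLAPSING, SOURCE))
```

In the injective case the image has as many points as the source rectangle, so counting integer points of the polytope suffices. In the collapsing case the image is smaller, which is why the non-injective situation needs separate treatment.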

1 |
Design of a voice coding ASIC with the CATHEDRAL II silicon compiler
- Vanhoof, et al.
- 1993
Citation Context ...otion detection algorithm [7] from a video coding application (see Fig. 1 for part of the code); (3) the kernel of a complex voice coding application -- essential component of a mobile radio terminal [32]. Table 1 shows the results obtained by the presented approach concerning the storage size, in comparison with those yielded by s2p/agora -- the memory tool currently employed by the CATHEDRAL system ...