## Mapping Regular Recursive Algorithms To Fine-Grained Processor Arrays (1994)

Citations: 3 (2 self)

### BibTeX

@TECHREPORT{Ganapathy94mappingregular,
  author      = {Kumar Nanjunda Ganapathy},
  title       = {Mapping Regular Recursive Algorithms To Fine-Grained Processor Arrays},
  institution = {},
  year        = {1994}
}

### Abstract

With the continuing growth of VLSI technology, special-purpose parallel processors have become a promising approach in the quest for high performance. Fine-grained processor arrays have become popular as they are suitable for solving problems with a high degree of parallelism, and can be inexpensively built using custom designs or commercially available field programmable gate arrays (FPGAs). Such specialized designs are often required in portable computing and communication systems with real-time constraints, as software-controlled processors often fail to provide the necessary throughput. This thesis addresses many issues in designing such application-specific systems built with fine-grained processor arrays for regular recursive uniform dependence algorithms. A uniform dependence algorithm consists of a set of indexed computations and a set of uniform dependence vectors which are independent of the indices of computations. Many important applications in signal/image processing, commun...

### Citations

2564 | The Design and Analysis of Computer Algorithms
- Aho, Hopcroft, Ullman
- 1974

Citation Context: ...ms. Typical examples include computing the transitive closure and the shortest paths of a graph. Other problems in the class of APPs include the generation of regular expressions from finite automata [45], matrix inversion, and Gauss-Jordan elimination. For an overview of applications of APPs see references [46, 47, 48]. A common algebraic framework for graph algorithms and numerical algorithms in ter...

472 | Low-power CMOS digital design
- Chandrakasan, Sheng, et al.
- 1992

Citation Context: ...roportional to the square of the number of PEs (Section 5.4.2). 5.5.3 Power There are three major sources of power dissipation in digital CMOS circuits, which are summarized in the following equation [75]: $P = p_t (C_L \cdot V \cdot V_{dd} \cdot f_{clk}) + I_{sc} \cdot V_{dd} + I_{leakage} \cdot V_{dd}$ (5.17). The first term represents the switching component of power, where $C_L$ is the loading capacitance, $f_{clk}$ is the clock frequency, V...
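
The three-term power model quoted in this context can be sketched numerically. The function below is a plain restatement of that model; the input values are illustrative assumptions, not measurements from the thesis:

```python
def cmos_power(p_t, c_load, v_swing, v_dd, f_clk, i_sc, i_leak):
    """Three-term CMOS power model (after Chandrakasan et al.):
    switching p_t*C_L*V*V_dd*f_clk, short-circuit I_sc*V_dd, leakage I_leak*V_dd."""
    switching = p_t * c_load * v_swing * v_dd * f_clk
    short_circuit = i_sc * v_dd
    leakage = i_leak * v_dd
    return switching + short_circuit + leakage

# Illustrative numbers only: 0.3 switching activity, 1 nF switched capacitance,
# full-swing 3.3 V logic at 50 MHz, 1 mA short-circuit current, 10 uA leakage.
p = cmos_power(0.3, 1e-9, 3.3, 3.3, 50e6, 1e-3, 10e-6)
```

With these made-up values the switching term dominates, which is the usual motivation for the low-power design techniques the cited paper surveys.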

359 | Dependence analysis for supercomputing
- Banerjee
- 1988

Citation Context: ...nment statements of the form of Eq. (1.3), i.e., $Z_i(y(\vec{J})) = \phi\,[\,Z_1(x_1(\vec{J})), \ldots, Z_r(x_r(\vec{J}))\,]$, $1 \le i \le r$. Each appearance of a variable on the right-hand side may cause a dependence [22]. If all loop bounds $l_i$ and $u_i$, $i = 1, \ldots, n$, are linear functions of index variables $j_1, \ldots, j_{i-1}$, then the set of all iteration vectors $\vec{J}$ of the loop can be described by a conv...
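
The loop nests described in this context have bounds that are linear in the outer indices, so the iteration vectors form a convex polyhedron intersected with the integer lattice. A minimal sketch, with bounds invented for the example:

```python
# Enumerate iteration vectors J = (j1, j2) of a triangular loop nest
# where the inner upper bound u2 = j1 is a linear function of j1.
def iteration_vectors(n):
    return [(j1, j2) for j1 in range(1, n + 1)
                     for j2 in range(1, j1 + 1)]

# The index set {(j1, j2) : 1 <= j2 <= j1 <= n} is the set of lattice
# points of a triangle, i.e., a convex polyhedron as in the cited analysis.
vecs = iteration_vectors(3)
```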

249 | Supernode partitioning
- Irigoin, Triolet
- 1988

Citation Context: ...cted component (which is the case for the algorithms we consider), independent partitioning results only in one block, i.e., the entire DG. A technique called supernode partitioning has been proposed [72], in which the goal is to partition the nodes that depend on each other and reduce communication between supernodes by propagating results inside the supernode. However, a systematic way to find such ...

229 | Why systolic architectures?
- Kung
- 1982

Citation Context: ...pment of VLSI computing techniques has had a significant impact on the development of novel computer architectures. One class of architectures, the so-called systolic arrays, first introduced by Kung [8, 9], has gained popularity because of its ability to exploit massive parallelism and pipelining to achieve high performance. Informally, a systolic system can be envisaged as an array of synchronized pro...

191 | Graphs and Algorithms
- Gondran, Minoux
- 1984

Citation Context: ...ms in the class of APPs include the generation of regular expressions from finite automata [45], matrix inversion, and Gauss-Jordan elimination. For an overview of applications of APPs see references [46, 47, 48]. A common algebraic framework for graph algorithms and numerical algorithms in terms of APPs was first achieved by Lehmann [49]. The application of APPs in global flow analysis of programs useful for...

129 | iWarp: an integrated solution to high-speed parallel computing
- Borkar, Cohen, et al.
- 1988

Citation Context: ...of minimizing the number of accesses over the limited bandwidth connection is considered in the mapping process. This is in contrast to other approaches of building general-purpose systolic computers [24, 25, 26, 27, 7]. Thus, Chapter 5 discusses design methods under constraints of fixed bandwidth and area, and objectives of yield (clock frequency) or speedup, and number of accesses. The mapping process incorporates...

102 | Highly Parallel Computing
- Almasi, Gottlieb
- 1989

Citation Context: ...ed-loop programs Affine dependence algorithms are common in image processing, digital signal processing, and other scientific applications in which regular compute-intensive operations are required [12, 20, 21]. In practice, many of the algorithms to be executed by processor arrays are described in a procedural high-level language such as FORTRAN. Nested loops are often the most time consuming kernels of th...

92 | Partitioning and mapping algorithms into fixed size systolic arrays
- Moldovan, Fortes
- 1986

Citation Context: ...e parameters. The proposed approach employs an efficient search technique to explore the design space and arrive at the optimal designs. Equivalence between the parameter and dependence-based methods [1, 2, 3] can be used to find optimal designs in the dependence-based approaches. The GPM has also been extended to derive optimal two-level pipelined algorithm-specific processor arrays. Such two-level pi...

90 | Systolic Arrays for VLSI
- Kung, Leiserson
- 1987

Citation Context: ...pment of VLSI computing techniques has had a significant impact on the development of novel computer architectures. One class of architectures, the so-called systolic arrays, first introduced by Kung [8, 9], has gained popularity because of its ability to exploit massive parallelism and pipelining to achieve high performance. Informally, a systolic system can be envisaged as an array of synchronized pro...

72 | A Unified Approach to Path Problems
- Tarjan
- 1981

Citation Context: ...rithms and numerical algorithms in terms of APPs was first achieved by Lehmann [49]. The application of APPs in global flow analysis of programs useful for code optimization is discussed in reference [50]. Two-dimensional processor arrays for finding transitive closures have been presented before [51, 52]. In this section we synthesize a one-pass linear processor array for the transitive-closure probl...

68 | The Mapping of Linear Recurrence Equations on Regular Arrays
- Quinton, Van Dongen
- 1989

Citation Context: ...ct variables (which may have identical values) are used to compute distinct C(i, j, k). Procedures for uniformization and broadcast removal share many similarities and are discussed in the references [14, 15, 16, 17, 18, 19]. 1.2.1 Relation to nested-loop programs Affine dependence algorithms are common in image processing, digital signal processing, and other scientific applications in which regular compute-intensive op...

68 | Architecture and applications of the Connection Machine
- Tucker, Robertson
- 1988

Citation Context: ...ed-loop programs Affine dependence algorithms are common in image processing, digital signal processing, and other scientific applications in which regular compute-intensive operations are required [12, 20, 21]. In practice, many of the algorithms to be executed by processor arrays are described in a procedural high-level language such as FORTRAN. Nested loops are often the most time consuming kernels of th...

66 | Linear and Combinatorial Optimization in Ordered Algebraic Structures
- Zimmermann
- 1981

Citation Context: ...ms in the class of APPs include the generation of regular expressions from finite automata [45], matrix inversion, and Gauss-Jordan elimination. For an overview of applications of APPs see references [46, 47, 48]. A common algebraic framework for graph algorithms and numerical algorithms in terms of APPs was first achieved by Lehmann [49]. The application of APPs in global flow analysis of programs useful for...

61 | Graphs and Networks
- Carré
- 1979

Citation Context: ...ms in the class of APPs include the generation of regular expressions from finite automata [45], matrix inversion, and Gauss-Jordan elimination. For an overview of applications of APPs see references [46, 47, 48]. A common algebraic framework for graph algorithms and numerical algorithms in terms of APPs was first achieved by Lehmann [49]. The application of APPs in global flow analysis of programs useful for...

46 | The Warp Computer: Architecture, Implementation and Performance
- Annaratone
- 1987

Citation Context: ...of minimizing the number of accesses over the limited bandwidth connection is considered in the mapping process. This is in contrast to other approaches of building general-purpose systolic computers [24, 25, 26, 27, 7]. Thus, Chapter 5 discusses design methods under constraints of fixed bandwidth and area, and objectives of yield (clock frequency) or speedup, and number of accesses. The mapping process incorporates...

45 | Algorithms for VLSI Processor Arrays (in: Introduction to VLSI Systems)
- Kung, Leiserson
- 1980

Citation Context: ...e under the constraint of a low-bandwidth interconnect to main memory in our array processor. This work also differs from the traditional systolic array mapping/partitioning on fixed processor arrays [1, 60, 61, 62, 63, 64, 65, 66] by assuming only a limited storage in the processor array and by considering the effect of main-memory latency due to low-bandwidth interconnection to main memory. The goals of our design that are di...

39 | Systolic Algorithms and Architectures
- Quinton, Robert
- 1991

Citation Context: ...ossible earlier. Examples of these applications include interactive language (or speech) recognition, text recognition, virtual reality, database operations, and real-time image and signal processing [12, 13]. These applications require massive, repetitive parallel processing, and hence, systolic computing. A number of implementation issues determine a systolic array's performance and efficiency. Designer...

39 | Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
- Peir, Cytron
- 1987

Citation Context: ...is broken down into blocks of maximum size that can be executed efficiently in the PA. Methods to find independent partitions in which the communication between blocks is zero have been proposed before [71, 2]. However, when the original DG has only one connected component (which is the case for the algorithms we consider), independent partitioning results only in one block, i.e., the entire DG. A techniqu...

35 | On the analysis and synthesis of VLSI algorithms
- Moldovan
- 1982

Citation Context: ...ort has been devoted by numerous researchers to mapping uniform dependence algorithms to processor arrays systematically. Most of these methods are based on or derived from the dependency method (DM) [36, 37, 38]. An overview of the different methods can be found in the references [39, 12]. In the dependency method (denoted as DM), an algorithm A is represented as a 5-tuple $(J^n, C, D, X, Y)$, where $J^n$ is a...

35 | Direct VLSI implementation of combinatorial algorithms
- Guibas, Kung, et al.
- 1979

Citation Context: ...ph for a variety of recurrences in three variables (e.g., finding the longest common subsequence over three strings). Other computations include L-U factorization [6], a three-pass transitive closure [54], matrix triangularization, matrix inversion [52], and two-dimensional tuple comparison [42]. A special case of the matrix-product is the matrix-vector product, which models FIR-filtering, convolut...

33 | Synthesizing systolic arrays with control signals from recurrence equations
- Rajopadhye
- 1989

Citation Context: ...ct variables (which may have identical values) are used to compute distinct C(i, j, k). Procedures for uniformization and broadcast removal share many similarities and are discussed in the references [14, 15, 16, 17, 18, 19]. 1.2.1 Relation to nested-loop programs Affine dependence algorithms are common in image processing, digital signal processing, and other scientific applications in which regular compute-intensive op...

32 | Mapping nested loop algorithms into multidimensional systolic arrays
- Lee, Kedem
- 1990

Citation Context: ...rm recurrences. However, in DM, the generality in representation leads to large search spaces for optimal designs, as the problem of finding optimal designs is posed as an integer programming problem [2, 40]. In contrast, the method presented in this chapter, the General Parameter Method (GPM), is restricted to uniform recurrences, but can be used to generate optimal designs for user-specified objectives...

31 | Synthesizing Linear Array Algorithms from Nested For Loop Algorithms
- Lee, Kedem
- 1988

Citation Context: ...e parameters. The proposed approach employs an efficient search technique to explore the design space and arrive at the optimal designs. Equivalence between the parameter and dependence-based methods [1, 2, 3] can be used to find optimal designs in the dependence-based approaches. The GPM has also been extended to derive optimal two-level pipelined algorithm-specific processor arrays. Such two-level pi...

30 | The design of optimal systolic arrays
- Li, Wah
- 1985

Citation Context: ...hat the number of choices for matrix S could be very large or even infinite, making it difficult (or impossible) to enumerate over them. Initial work on parameter-based methods was done by Li and Wah [42] (denoted as OPM or Original Parameter Method) for a restricted set of uniform recurrences. They considered specifically 3-D and 2-D recurrences and mapped them to 2-D and 1-D processor arrays, respec...

24 | Algebraic structures for transitive closure
- Lehmann
- 1977

Citation Context: ...ination. For an overview of applications of APPs see references [46, 47, 48]. A common algebraic framework for graph algorithms and numerical algorithms in terms of APPs was first achieved by Lehmann [49]. The application of APPs in global flow analysis of programs useful for code optimization is discussed in reference [50]. Two-dimensional processor arrays for finding transitive closures have been pr...

23 | On uniformization of affine dependence algorithms
- Chen, Shang
- 1992

Citation Context: ...ct variables (which may have identical values) are used to compute distinct C(i, j, k). Procedures for uniformization and broadcast removal share many similarities and are discussed in the references [14, 15, 16, 17, 18, 19]. 1.2.1 Relation to nested-loop programs Affine dependence algorithms are common in image processing, digital signal processing, and other scientific applications in which regular compute-intensive op...

22 | On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
- Shang, Fortes
- 1992

Citation Context: ...is broken down into blocks of maximum size that can be executed efficiently in the PA. Methods to find independent partitions in which the communication between blocks is zero have been proposed before [71, 2]. However, when the original DG has only one connected component (which is the case for the algorithms we consider), independent partitioning results only in one block, i.e., the entire DG. A techniqu...

21 | I/O complexity: the red-blue pebble game
- Hong, Kung
- 1981

Citation Context: ...$(n-1)\,V^n/\sqrt[n-1]{p} + V N^n$ (5.5). Lemma 5.3 establishes that the above sequencing scheme for n-D meshes is asymptotically optimal with respect to the number of accesses to the MM. Lemma 5.3 [74]: For n-dimensional meshes, $Q = \Omega\bigl(V^n/\sqrt[n-1]{S}\bigr)$, where S is the size of the limited memory and Q is the I/O complexity. [Figure: "storage tile" layout, drawn for N = 4m, p = 4] ...

14 | Obtaining dependence vectors for nested-loop computations
- Ribas
- 1990

Citation Context: ...nd (iii) branch statements are defined in terms of the loop variables $j_1, \ldots, j_{i-1}$, and do not go outside the loop containing the branch statement. Given a nested loop program, reference [23] describes how to obtain the set of uniform dependencies using the techniques of uniformization. Example 1.3 It is easy to see that the following nested-loop program corresponds to the pipelined versi...

14 | Systematic design approaches for algorithmically specified systolic arrays
- Fortes, Fu, et al.
- 1988

Citation Context: ...hms to processor arrays systematically. Most of these methods are based on or derived from the dependency method (DM) [36, 37, 38]. An overview of the different methods can be found in the references [39, 12]. In the dependency method (denoted as DM), an algorithm A is represented as a 5-tuple $(J^n, C, D, X, Y)$, where $J^n$ is a finite n-dimensional index set of A; C is the set of triples that represents ...

13 | Optimization and Interconnection Complexity for: Parallel Processors, Single-Stage Networks and Decision Trees
- Kuhn
- 1980

Citation Context: ...ort has been devoted by numerous researchers to mapping uniform dependence algorithms to processor arrays systematically. Most of these methods are based on or derived from the dependency method (DM) [36, 37, 38]. An overview of the different methods can be found in the references [39, 12]. In the dependency method (denoted as DM), an algorithm A is represented as a 5-tuple $(J^n, C, D, X, Y)$, where $J^n$ is a...

13 | Optimal Systolic Design for the Transitive Closure and the Shortest Path Problems
- Kung, Lo, et al.
- 1987

Citation Context: ...on of APPs in global flow analysis of programs useful for code optimization is discussed in reference [50]. Two-dimensional processor arrays for finding transitive closures have been presented before [51, 52]. In this section we synthesize a one-pass linear processor array for the transitive-closure problem using the Floyd-Warshall path-finding algorithm. The transitive-closure problem is defined as fo...
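
The Floyd-Warshall path-finding algorithm named in this context computes the transitive closure by a triple loop over a pivot index. A minimal sequential sketch of that computation (not the thesis's processor-array mapping of it):

```python
def transitive_closure(adj):
    """Floyd-Warshall-style transitive closure of a 0/1 adjacency matrix."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):            # pivot vertex
        for i in range(n):
            for j in range(n):
                # i reaches j directly, or via the pivot k
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

# Chain 0 -> 1 -> 2: the closure also records reachability 0 -> 2.
closure = transitive_closure([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])
```

The uniform triple-loop structure is exactly what makes the algorithm a natural candidate for the cubical-mesh dependence graphs and systolic mappings discussed elsewhere in the thesis.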

12 | Optimal Partitioning Scheme for Wavefront/Systolic Array Processors
- Jainandunsing
- 1986

Citation Context: ...e under the constraint of a low-bandwidth interconnect to main memory in our array processor. This work also differs from the traditional systolic array mapping/partitioning on fixed processor arrays [1, 60, 61, 62, 63, 64, 65, 66] by assuming only a limited storage in the processor array and by considering the effect of main-memory latency due to low-bandwidth interconnection to main memory. The goals of our design that are di...

11 | Synthesizing optimal lower dimensional processor arrays
- Ganapathy, Wah
- 1992

Citation Context: ...e summarize the thesis, and discuss some future avenues of research. 1.4 Contributions of This Thesis The following are the main contributions of this thesis: • General Parameter Method (Chapter 2) [28, 29, 30, 31]: A systematic array synthesis technique for uniform dependence algorithms that can optimize a given user-specified objective subject to some resource constraints. Chapter 3 presents the application of...

10 | Systolic Arrays: From Concept to Implementation
- Fortes, Wah
- 1987

Citation Context: ...ossible earlier. Examples of these applications include interactive language (or speech) recognition, text recognition, virtual reality, database operations, and real-time image and signal processing [12, 13]. These applications require massive, repetitive parallel processing, and hence, systolic computing. A number of implementation issues determine a systolic array's performance and efficiency. Designer...

8 | Wafer-Scale Integration and Two-Level Pipelined Implementations of Systolic Arrays, Journal of Parallel and Distributed Computing, 1(1), 1984, pp. 32-63 (a preliminary version appeared ...)
- Kung, Lam
- 1984

Citation Context: ...the curves indicates that there are more attractive alternatives than the time-optimal or processor-optimal designs. 4.5 Comparisons with Existing Work There have been earlier efforts by Kung and Lam [4] and recently by Valero-Garcia et al. [5] to obtain two-level pipelined PAs. They used a common approach that retimes a PA in order to include additional delays for pipelining. Their approach, howeve...

7 | Optimal synthesis of algorithm-specific lower-dimensional processor arrays
- Ganapathy, Wah
- 1996

Citation Context: ...e summarize the thesis, and discuss some future avenues of research. 1.4 Contributions of This Thesis The following are the main contributions of this thesis: • General Parameter Method (Chapter 2) [28, 29, 30, 31]: A systematic array synthesis technique for uniform dependence algorithms that can optimize a given user-specified objective subject to some resource constraints. Chapter 3 presents the application of...

7 | Subspace scheduling and parallel implementation of non-systolic regular iterative algorithms
- Roychowdhury, Kailath
- 1989

Citation Context: ...ves (including nonmonotonic and nonlinear ones) using efficient search techniques of polynomial complexity. There have been several earlier attempts to map algorithms onto lower-dimensional arrays [3, 40, 41]. Important steps towards a formal solution were first made by Lee and Kedem [3]. They presented the concept of data-link collisions (two data tokens contending for the same link simultaneously) and c...

6 | Scheduling a system of affine recurrence equations onto a systolic array
- Yaacoby, Cappello
- 1988

5 | Systematic hardware adaptation of systolic algorithms
- Valero-Garcia, Navarro, et al.

Citation Context: ...(fragment from the list of figures) 4.8: Processor-time trade-offs for transitive closure: variation in #PE with T_comp. 4.9: Relationship between proposed and existing methods [4, 5] of synthesizing PAs with PFUs. 5.1: Coprocessor architecture proposed to solve a class of algorithms modeled by uniform recurrences. ...

5 | A new formulation of mapping conditions for the synthesis of linear systolic arrays
- Xue
- 1993

Citation Context: ...$= \vec{0}$, where $\vec{\alpha} = \vec{\beta} - \vec{\gamma} \ne \vec{0}$, $\alpha_i \in [(L_i - U_i), \ldots, (L_i + U_i)]$. Note that in Theorem 2.3, we have defined conservative bounds on $\alpha_i$. Better estimates can be obtained [44] and will result in less overhead when the conditions in Theorem 2.3 are checked in the design process. Example 2.5 For the recurrence in Eq. (1.7), if the array sought is 1-D, then the spacing parame...

5 | Partitioning: an essential step in mapping algorithms into systolic array processors
- Navarro, Llaberia, et al.
- 1987

Citation Context: ...e under the constraint of a low-bandwidth interconnect to main memory in our array processor. This work also differs from the traditional systolic array mapping/partitioning on fixed processor arrays [1, 60, 61, 62, 63, 64, 65, 66] by assuming only a limited storage in the processor array and by considering the effect of main-memory latency due to low-bandwidth interconnection to main memory. The goals of our design that are di...

4 | Optimal design of lower dimensional processor arrays for uniform recurrences
- Ganapathy, Wah
- 1992

Citation Context: ...e summarize the thesis, and discuss some future avenues of research. 1.4 Contributions of This Thesis The following are the main contributions of this thesis: • General Parameter Method (Chapter 2) [28, 29, 30, 31]: A systematic array synthesis technique for uniform dependence algorithms that can optimize a given user-specified objective subject to some resource constraints. Chapter 3 presents the application of...

4 | A systolic array for the algebraic path problem
- Rote
- 1985

Citation Context: ...on of APPs in global flow analysis of programs useful for code optimization is discussed in reference [50]. Two-dimensional processor arrays for finding transitive closures have been presented before [51, 52]. In this section we synthesize a one-pass linear processor array for the transitive-closure problem using the Floyd-Warshall path-finding algorithm. The transitive-closure problem is defined as fo...

4 | VLSI algorithms for solving recurrence equations and applications
- Ibarra, Palis
- 1987

Citation Context: ...ortant fundamental class of problems in signal and image processing. The dependence graph for 3-D cube graph algorithms is an $N \times N \times N$ cubical mesh. Beyond matrix product, Ibarra and Palis [53] point out that the cubical mesh is the dependence graph for a variety of recurrences in three variables (e.g., finding the longest common subsequence over three strings). Other computations include L...

4 | The Organization of Computations for Uniform Recurrence Equations
- Karp, Miller, et al.
- 1967

Citation Context: ...ed loops, where the loop-carried dependencies correspond to the dependencies in the recurrence equation. Recurrences can be classified as uniform or nonuniform based on the nature of the dependencies [70]. A recurrence equation, $Z(\vec{p}) = \phi[Z(\vec{q}_1), Z(\vec{q}_2), \cdots, Z(\vec{q}_r)]$, is called uniform if $\vec{q}_i = \vec{p} + \vec{d}_i$, where $\vec{d}_i$ is a constant n-dimensional vector independent of $\vec{p}$ and ...
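
A uniform recurrence in this sense has constant dependence vectors. The small check below is an illustration, not code from the thesis: it classifies the matrix-product dependence (each point (i, j, k) reads (i, j, k-1), so d = (0, 0, -1) everywhere) as uniform:

```python
def is_uniform(points, dep):
    """dep maps an index point p to the point q it depends on;
    the dependence is uniform iff q - p is the same vector d for all p."""
    ds = {tuple(b - a for a, b in zip(pt, dep(pt))) for pt in points}
    return len(ds) == 1

pts = [(i, j, k) for i in range(2) for j in range(2) for k in range(1, 3)]

# C(i,j,k) = C(i,j,k-1) + A(i,k)*B(k,j): constant d = (0,0,-1), hence uniform.
uniform = is_uniform(pts, lambda p: (p[0], p[1], p[2] - 1))
# A broadcast that always reads the origin is not uniform: d varies with p.
broadcast_uniform = is_uniform(pts, lambda p: (0, 0, 0))
```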

3 | Matrix computations on systolic-type meshes: An introduction to multi-mesh graph (MMG) method
- Moreno, Lang
- 1990

Citation Context: ...ctors along the axes as dependence vectors. [Figure 5.6: Cubical-mesh dependence graph, with delay nodes, for computing the transitive closure of an $N \times N$ matrix [6, 7]; drawn for N = 4.] Example 5.2 Figure 5.6 shows the cubical-mesh dependence graph [6, 7] for a 4×4 transitive-closure problem shown in Example 1.3. Figure 5.7 shows a pictorial view ...

3 | The Saxpy Matrix-1: A General Purpose Systolic Computer
- Foulser, Schreiber
- 1987

Citation Context: ...of minimizing the number of accesses over the limited bandwidth connection is considered in the mapping process. This is in contrast to other approaches of building general-purpose systolic computers [24, 25, 26, 27, 7]. Thus, Chapter 5 discusses design methods under constraints of fixed bandwidth and area, and objectives of yield (clock frequency) or speedup, and number of accesses. The mapping process incorporates...