Results 1–8 of 8
Scheduling a system of nonsingular affine recurrence equations onto a processor array
Journal of VLSI Signal Processing, 1989
Abstract

Cited by 5 (0 self)
This material is based upon work supported by the Office of Naval Research under contract nos. N00014 ...
Affine Dependence Classification for Communications Minimization
IJPP, 1996
Abstract

Cited by 4 (2 self)
This paper introduces results on placement and communications minimization for systems of affine recurrence equations. We show how to classify the dependences according to the number and nature of communications they may result in. We give both communication-free conditions and conditions for an efficient use of broadcast or neighbor-to-neighbor communication primitives. Since the dependences of a problem generally cannot all be communication-free, we finally introduce a heuristic to globally minimize the communications based on the classification of dependences.

Keywords: parallelization techniques, localization optimization, communications minimization, systems of recurrence equations, loops.

1. Introduction
Over the past years, many research efforts have dealt with the construction of efficient parallelizing compilers ([11], [21], [3], [1]). In this context, loop parallelization techniques have been developed. The parallelization of a loop nest requires the determination of a schedu...
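The classification the abstract describes can be illustrated with a small sketch. Assuming a dependence f(z) = Az + b and a linear processor allocation pi(z) = Pz (a simplified model; the function and its names are ours, not the paper's), the communication vector P z - P f(z) is constant exactly when P(I - A) = 0, which separates the communication-free, uniform, and general affine cases:

```python
def mat_vec(M, v):
    """Integer matrix-vector product (helper for this sketch)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def classify_dependence(A, b, P):
    """Classify an affine dependence f(z) = A z + b under a linear
    processor allocation pi(z) = P z (hypothetical simplified model).

    The communication vector from the processor computing z to the one
    holding f(z) is  P z - P (A z + b) = P (I - A) z - P b:
      * P (I - A) == 0 and P b == 0  -> communication-free,
      * P (I - A) == 0 and P b != 0  -> uniform (neighbor-to-neighbor),
      * otherwise                    -> affine, distance depends on z.
    """
    n = len(A)
    I_minus_A = [[(1 if i == j else 0) - A[i][j] for j in range(n)]
                 for i in range(n)]
    # M = P (I - A): the z-dependent part of the communication vector.
    M = [[sum(P_row[k] * I_minus_A[k][j] for k in range(n))
          for j in range(n)] for P_row in P]
    if all(x == 0 for row in M for x in row):
        v = mat_vec(P, b)
        if all(x == 0 for x in v):
            return "communication-free"
        return "uniform, vector " + str([-x for x in v])
    return "affine (distance varies with the iteration point)"

# z -> z + (1, 0) under the allocation pi(z) = z0: a uniform shift.
print(classify_dependence([[1, 0], [0, 1]], [1, 0], [[1, 0]]))
```

A dependence such as z -> z + (1, 0) under pi(z) = z0 is uniform, while the transpose dependence z -> (z1, z0) forces a communication distance that varies with the iteration point.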
Advanced Systolic Design
1999
Abstract

Cited by 4 (0 self)
Systolic arrays are locally connected parallel architectures whose structure is well-suited to the implementation of many algorithms in scientific computation, signal and image processing, biological data analysis, etc. The nature of systolic algorithms makes it possible to synthesize architectures supporting them, using correctness-preserving transformations, in a theoretical framework that has a deep relationship with loop parallelization techniques. This opens the way to new, very high-level architecture synthesis techniques, which will be a major step in mastering the use of IC technologies in the future.

This chapter has two parts. First, we present the current state of the art in systolic synthesis techniques; the second part surveys various ways of implementing systolic algorithms and architectures using programmable architectures, FPGAs, or dedicated architectures.

5.1 Introduction
The term systolic arrays was coined by Kung and Leiserson in 1978 to describe application ...
Bounded Broadcast in Systolic Arrays
2006
Abstract

Cited by 1 (0 self)
Much work has been done on the problem of synthesizing a processor array from a system of recurrence equations. Some researchers limit communication to nearest neighbors in the array; others use broadcast. In many cases, neither of these approaches results in an optimal execution time. In this paper a technique called bounded broadcast is explored, whereby an element of a processor array can broadcast to a bounded number of other processors. This technique is applied to the problems of transitive closure and all-pairs shortest distance, resulting in time complexities that are smaller than those reported previously. In general, the technique can be used to design bounded-broadcast systolic arrays for algorithms whose implementation can benefit from broadcasting.

Keywords: all-pairs shortest distance, broadcast, data dependence, parallel computation, recurrence equation, systolic array, transitive closure, VLSI architecture.
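To make the broadcast structure concrete, here is a sketch of the transitive-closure recurrence the abstract refers to, written as plain Warshall iteration; in a systolic realization, step k broadcasts row k and column k to the array, and bounded broadcast would limit each such broadcast to a fixed-size group of processors. This sketch is ours, not the paper's array design:

```python
def transitive_closure(adj):
    """Warshall's transitive closure over a boolean adjacency matrix.

    In a systolic realization, step k broadcasts row k (along one array
    direction) and column k (along the other); bounded broadcast caps
    how many processors each broadcast may reach. This is a sketch of
    the underlying recurrence, not of the array itself.
    """
    n = len(adj)
    c = [row[:] for row in adj]          # work on a copy
    for k in range(n):
        row_k = c[k]                     # value broadcast along columns
        for i in range(n):
            if c[i][k]:                  # value broadcast along rows
                for j in range(n):
                    c[i][j] = c[i][j] or row_k[j]
    return c

# Chain 0 -> 1 -> 2 (with self-loops): the closure adds the edge 0 -> 2.
g = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
print(transitive_closure(g))
```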
Libraries of schedulefree operators in Alpha
in: Proceedings of the International Conference on Application Specific Array Processors, IEEE Computer, 1997
Abstract

Cited by 1 (0 self)
This paper presents a method, based on the formalism of affine recurrence equations, for the synthesis of digital circuits exploiting parallelism at the bit level. In the initial specification of a numerical algorithm, the arithmetic operators are replaced with their yet unscheduled (schedule-free) binary implementation as recurrence equations. This allows a bit-level dependency analysis yielding a bit-parallel array. The method is demonstrated on the example of the matrix-vector product, and discussed.
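As an illustration of replacing a word-level operator with its binary implementation as recurrence equations, addition unfolds into the classic sum/carry recurrences. This sketch (ours, not the paper's Alpha code) simulates the recurrences sequentially, whereas the method would leave them schedule-free for bit-level dependency analysis:

```python
def add_bits(a_bits, b_bits):
    """Ripple-carry addition expressed as the recurrences
        s[i]   = a[i] xor b[i] xor c[i]
        c[i+1] = majority(a[i], b[i], c[i]),   c[0] = 0,
    i.e. the kind of schedule-free binary specification that can replace
    a word-level '+' before bit-level dependency analysis (illustrative).
    Bits are given least-significant first.
    """
    assert len(a_bits) == len(b_bits)
    c = 0                                    # c[0] = 0
    s = []
    for a, b in zip(a_bits, b_bits):         # bit i of each operand
        s.append(a ^ b ^ c)                  # sum recurrence
        c = (a & b) | (a & c) | (b & c)      # majority: carry recurrence
    return s + [c]                           # n sum bits plus final carry

# 5 + 3 = 8, LSB-first: [1,0,1] + [1,1,0] -> [0,0,0,1]
print(add_bits([1, 0, 1], [1, 1, 0]))
```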
Automatic Synthesis of Regular Architectures Optimized at the Bit Level
Abstract
This paper presents methods based on the formalism of affine recurrence equations for the synthesis of bit-level regular architectures from word-level (integer or real) algorithms. Thanks to bit-level dependency analysis, the arrays have optimal efficiency. We present two possible design flows leading to architectures based either on bit-parallel or bit-serial operators. The first one is fully automated.
Parametrically Tiled Distributed Memory Parallelization of Polyhedral Programs
2013
Abstract
We present a method for parallelizing polyhedral programs targeting distributed memory platforms. We use wavefronts of tiles as the parallelization strategy, and uniformize affine dependences both to localize communication to neighbor-to-neighbor send/recv and to enable parametric tiling. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that it scales to as many as 96 cores, well beyond the shared memory limit, with performance comparable to PLuTo, a state-of-the-art shared memory automatic parallelizer using the polyhedral model.
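The wavefront-of-tiles strategy can be sketched in a few lines: when each tile depends only on its left and top neighbors, all tiles on an anti-diagonal are independent and form one parallel step (a minimal sketch under that assumption; the function name is ours):

```python
def wavefront_schedule(nt_i, nt_j):
    """Group a 2D grid of tiles into wavefronts.

    Assuming tile (ti, tj) depends only on (ti-1, tj) and (ti, tj-1),
    all tiles with ti + tj == w are independent and can execute in
    parallel; wavefront w runs at step w. A minimal sketch of the
    wavefront-of-tiles parallelization strategy.
    """
    waves = []
    for w in range(nt_i + nt_j - 1):         # one wavefront per step
        waves.append([(ti, w - ti)
                      for ti in range(nt_i) if 0 <= w - ti < nt_j])
    return waves

# A 3x3 tile grid yields 5 wavefronts of sizes 1, 2, 3, 2, 1.
for step, wave in enumerate(wavefront_schedule(3, 3)):
    print(step, wave)
```

In the distributed-memory setting, the only communication needed between consecutive wavefronts is the tile-boundary data sent from each tile to its right and bottom neighbors, which is what uniformization to neighbor-to-neighbor send/recv makes possible.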