High-Level Synthesis of Nonprogrammable Hardware Accelerators
 JOURNAL OF VLSI SIGNAL PROCESSING
, 2000
Cited by 60 (6 self)
The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as coprocessors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register transfer level (RTL). The system generates a synchronous array of customized VLIW (very-long instruction word) processors, their controller, local memory, and interfaces. The system also modifies the user's application software to make use of the generated accelerator. The user indicates the throughput to be achieved by specifying the number of processors and their initiation interval. In experimental comparisons, PICO-N designs are slightly more costly than hand-designed accelerators with the same performance.
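As a rough illustration of how a processor count and an initiation interval together fix a throughput target, the sketch below assumes the usual software-pipelining reading: each processor starts one loop iteration every `initiation_interval` cycles. The function names are hypothetical and are not part of the system described in the abstract; pipeline fill and drain time are ignored.

```python
from fractions import Fraction


def iterations_per_cycle(num_processors: int, initiation_interval: int) -> Fraction:
    """Aggregate steady-state throughput, in loop iterations per clock cycle.

    Assumption: every processor starts one iteration each
    `initiation_interval` cycles, independently of the others.
    """
    if num_processors <= 0 or initiation_interval <= 0:
        raise ValueError("both parameters must be positive")
    return Fraction(num_processors, initiation_interval)


def estimated_cycles(total_iterations: int, num_processors: int,
                     initiation_interval: int) -> int:
    """Steady-state cycle estimate for a loop nest (ignores fill and drain)."""
    # Ceiling division: the busiest processor determines the finish time.
    per_processor = -(-total_iterations // num_processors)
    return per_processor * initiation_interval
```

Under these assumptions, doubling the number of processors or halving the initiation interval has the same effect on the estimate.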
Generalized multipartitioning for multidimensional arrays
 In Proceedings of the International Parallel and Distributed Processing Symposium, Fort Lauderdale, FL
, 2002
Cited by 16 (2 self)
Multipartitioning is a strategy for parallelizing computations that require solving 1D recurrences along each dimension of a multidimensional array. Previous techniques for multipartitioning yield efficient parallelizations over 3D domains only when the number of processors is a perfect square. This paper considers the general problem of computing multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that exploits generalized multipartitioning and show that the compiler's generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
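The perfect-square restriction mentioned in this abstract can be made concrete with a small sketch. It assumes the classical diagonal construction for a 3D domain: with p = q² processors, the array is cut into q × q × q tiles and tile (i, j, k) goes to processor ((i − k) mod q, (j − k) mod q). This is an illustration of the earlier technique only, not the generalized algorithm of the paper, and all names are hypothetical.

```python
from collections import Counter
from itertools import product


def diagonal_owner(i: int, j: int, k: int, q: int) -> int:
    """Processor id (0 .. q*q - 1) that owns tile (i, j, k)."""
    return ((i - k) % q) * q + ((j - k) % q)


def owners_in_slab(q: int, axis: int, index: int) -> Counter:
    """Count tiles per processor in the slab where coordinate `axis` == `index`."""
    counts = Counter()
    for tile in product(range(q), repeat=3):
        if tile[axis] == index:
            counts[diagonal_owner(*tile, q)] += 1
    return counts


def is_balanced(q: int) -> bool:
    """True if every slab along every axis holds exactly one tile per processor."""
    expected = Counter({rank: 1 for rank in range(q * q)})
    return all(
        owners_in_slab(q, axis, index) == expected
        for axis in range(3)
        for index in range(q)
    )
```

Each slab contains q² tiles, so the one-tile-per-processor mapping only exists when the processor count is q² for some integer q, which is the limitation the generalized method removes.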
Constructing and Exploiting Linear Schedules with Prescribed Parallelism
, 2002
Cited by 14 (2 self)
this paper appeared in the proceedings of the 14th International Parallel and Distributed Processing Symposium (IEEE Computer Society, 2000, pp. 815–821) under the title "A constructive solution to the juggling problem in processor array synthesis"
On Efficient Parallelization of Line-Sweep Computations
 In 9th Workshop on Compilers for Parallel Computers
, 2001
Cited by 6 (5 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that $p^{1/(d-1)}$ be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when $\sqrt{p}$ is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained to parallelize a line sweep computation on a range of different numbers of processors. ∗ This work performed while a visiting scholar at Rice University.
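The integrality condition on the earlier technique can be checked directly. The sketch below tests whether p is a perfect (d − 1)-th power, using exact integer arithmetic to avoid floating-point rounding; the function names are hypothetical and the code covers only the stated precondition, not the generalized algorithm or its cost model.

```python
from typing import Optional


def integer_root(p: int, n: int) -> Optional[int]:
    """Return r such that r**n == p, or None if no such integer exists."""
    if p < 1 or n < 1:
        raise ValueError("p and n must be positive")
    low, high = 1, p
    while low <= high:  # binary search over candidate roots
        mid = (low + high) // 2
        power = mid ** n
        if power == p:
            return mid
        if power < p:
            low = mid + 1
        else:
            high = mid - 1
    return None


def earlier_technique_applies(p: int, d: int) -> bool:
    """True if p**(1/(d-1)) is an integer, the precondition quoted above."""
    if d < 2:
        raise ValueError("d must be at least 2")
    return integer_root(p, d - 1) is not None
```

For d = 2 the condition holds for every p, while for d = 3 it reduces to p being a perfect square, matching the √p remark in the abstract.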
Generalized multipartitioning
 In Second Annual Los Alamos Computer Science Institute (LACSI) Symposium
, 2001
Cited by 2 (0 self)
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors. With multipartitioning, computations that require solving one-dimensional recurrences along each dimension of a multidimensional array can be parallelized effectively. Previous techniques for multipartitioning yield efficient parallelizations over three-dimensional domains only when the number of processors is a perfect square. This paper considers the general problem of computing optimal multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning for this general case, which enables multipartitioning to be used for performing efficient parallelizations of line-sweep computations under arbitrary conditions. Finally, we describe a prototype implementation of generalized multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line sweep computation for different numbers of processors.
High-Level Synthesis of Nonprogrammable Hardware Accelerators
 Journal of VLSI Signal Processing
, 2000
Cited by 1 (0 self)
The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as coprocessors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register transfer level (RTL). The system generates a synchronous array of customized VLIW (very-long instruction word) processors, their controller, local memory, and interfaces. The system also modifies the user's application software to make use of the generated accelerator. The user indicates the throughput to be achieved by specifying the number of processors and their initiation interval. In experimental comparisons, PICO-N designs are slightly more costly than hand-designed accelerators with the same performance.
Efficient parallelization of line-sweep computations
, 2001
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. Previously, computing a d-dimensional multipartitioning required that $p^{1/(d-1)}$ be integral, where p is the number of processors. Here, we describe an algorithm to compute a d-dimensional multipartitioning of an array of ρ dimensions for an arbitrary number of processors, for any d, 2 ≤ d ≤ ρ. When using a multipartitioning to parallelize a line sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a ρ-dimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be applied only when $\sqrt{p}$ is integral. We describe an implementation of multipartitioning in the Rice dHPF compiler and performance results obtained to parallelize a line sweep computation on a range of different numbers of processors. ∗ This work performed while a visiting scholar at Rice University.
Latin Hyper-Rectangles for Efficient Parallelization of Line-Sweep Computations
Multipartitioning is a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep; in other words, it describes a Latin hyper-rectangle, a natural extension of the notion of Latin squares. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication. All of the multipartitionings described in the literature to date assign only one tile per processor per hyperplane of a multipartitioning (Latin hypercube). While this class of multipartitionings is optimal for two dimensions, in three dimensions it requires the number of processors to be a perfect square. This paper considers the general problem of computing optimal multipartitionings for multidimensional data volumes on an arbitrary number of processors. We describe an algorithm to compute a d-dimensional multipartitioning of a multidimensional array for an arbitrary number of processors. When using a multipartitioning to parallelize a line sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a multidimensional array, we describe a cost model for selecting d, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except when p is a prime; previously, a 3-dimensional multipartitioning could be a...
Internal Accession Date Only
, 2000
Keywords: systolic array synthesis, affine scheduling. We describe a new, practical, constructive method for solving the well-known conflict-free scheduling problem for the locally sequential, globally parallel (LSGP) case of processor array synthesis. Previous solutions have an important practical disadvantage. Here we provide a closed-form solution that enables the enumeration of all conflict-free schedules. The second part of the paper discusses reduction of the cost of hardware whose function is to control the flow of data, enable or disable functional units, and generate memory addresses. We present a new technique for controlling the complexity of these housekeeping functions in a processor array.