Results 1  10
of
154
Synchronization and linearity: an algebra for discrete event systems
, 2001
"... The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific ..."
Abstract

Cited by 279 (10 self)
 Add to MetaCart
The first edition of this book was published in 1992 by Wiley (ISBN 0 471 93609 X). Since this book is now out of print, and to answer the request of several colleagues, the authors have decided to make it available freely on the Web, while retaining the copyright, for the benefit of the scientific community. Copyright Statement This electronic document is in PDF format. One needs Acrobat Reader (available freely for most platforms from the Adobe web site) to benefit from the full interactive machinery: using the package hyperref by Sebastian Rahtz, the table of contents and all LATEX crossreferences are automatically converted into clickable hyperlinks, bookmarks are generated automatically, etc.. So, do not hesitate to click on references to equation or section numbers, on items of thetableofcontents and of the index, etc.. One may freely use and print this document for one’s own purpose or even distribute it freely, but not commercially, provided it is distributed in its entirety and without modifications, including this preface and copyright statement. Any use of thecontents should be acknowledged according to the standard scientific practice. The
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
"... A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing ..."
Abstract

Cited by 77 (5 self)
 Add to MetaCart
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose storedprogram sequential computer which captured the fundamental principles of...
The Mapping of Linear Recurrence Equations on Regular Arrays
 Journal of VLSI Signal Processing
, 1989
"... The parallelization of many algorithms can be obtained using spacetime transformations which are applied on nested doloops or on recurrence equations. In this paper, we analyze systems of linear recurrence equations, a generalization of uniform recurrence equations. The first part of the paper des ..."
Abstract

Cited by 68 (7 self)
 Add to MetaCart
(Show Context)
The parallelization of many algorithms can be obtained using spacetime transformations which are applied on nested doloops or on recurrence equations. In this paper, we analyze systems of linear recurrence equations, a generalization of uniform recurrence equations. The first part of the paper describes a method for finding automatically whether such a system can be scheduled by an affine timing function, independent of the size parameter of the algorithm. In the second part, we describe a powerful method that makes it possible to transform linear recurrences into uniform recurrence equations. Both parts rely on results on integral convex polyhedra. Our results are illustrated on the Gauss elimination algorithm and on the GaussJordan diagonalization algorithm. 1 Introduction Designing efficient algorithms for parallel architectures is one of the main difficulties of the current research in computer science. As the architecture of supercomputers evolves towards massive parallelism...
Compaan: Deriving Process Networks from Matlab for Embedded Signal Processing Architectures
 IN PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON HARDWARE/SOFTWARE CODESIGN (CODES
, 2000
"... This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a processnetwork specification. The process ..."
Abstract

Cited by 59 (11 self)
 Add to MetaCart
(Show Context)
This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a processnetwork specification. The process
Improving Functional Density Through RunTime Circuit Reconfiguration
, 1997
"... orting a C compiler to the DISC processor. Justin Diether assisted in the design, handlayout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other gradua ..."
Abstract

Cited by 48 (2 self)
 Add to MetaCart
orting a C compiler to the DISC processor. Justin Diether assisted in the design, handlayout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other graduate students assisting me with this work include Russel Peterson, Mike Rencher, Richard Ross, and Peter Bellows. My advisor, Brad Hutchings, provided essential assistance and encouragement in all of the projects, ideas, and results presented within this work. My decision to complete this degree and write this dissertation was influenced largely by his advice and positive encouragement. Brent Nelson and other faculty members within the Electrical and Computer Engineering department at BYU have provided critical feedback on a wide variety of topics relating to this work. I would also like to acknowledge the insight and assistance of many collaborators researching closely related subjects. For
Scheduling And Behavioral Transformations For Parallel Systems
, 1993
"... In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most timecritical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of dataflow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intraiteration and interiteration precedence relat...
Mapping Uniform Loop Nests onto Distributed Memory Architectures
 Parallel Computing
, 1993
"... This paper deals with scheduling, mapping and partitioning techniques for uniform loop nests. It is shown how the different techniques of scheduling, of mapping and of partitioning are linked and how code generation can be derived according to these methods. Our approach is based upon extensions of ..."
Abstract

Cited by 38 (9 self)
 Add to MetaCart
(Show Context)
This paper deals with scheduling, mapping and partitioning techniques for uniform loop nests. It is shown how the different techniques of scheduling, of mapping and of partitioning are linked and how code generation can be derived according to these methods. Our approach is based upon extensions of systolic array design methodologies.
Linear Scheduling Is Nearly Optimal
, 1991
"... This paper deals with the problem of finding optimal schedulings for uniform dependence algorithms. Given a convex domain, let T f be the total time needed to execute all computations using the free (greedy) schedule and let T l be the total time needed to execute all computations using the optimal ..."
Abstract

Cited by 36 (11 self)
 Add to MetaCart
This paper deals with the problem of finding optimal schedulings for uniform dependence algorithms. Given a convex domain, let T f be the total time needed to execute all computations using the free (greedy) schedule and let T l be the total time needed to execute all computations using the optimal linear schedule. Our main result is to bound T l =T f and T l \Gamma T f for sufficiently "fat" domains. Keywords: Uniform dependence algorithms; Convex domain; Free schedule; Linear schedule; Optimal schedule; Path packing. 1. Introduction The pioneering work of Karp, Miller and Winograd 2 has considered a special class of algorithms characterized by uniform data dependencies and unittime computations. This special class of algorithms, termed uniform dependence algorithms by Shang and Fortes 6 has proven of paramount importance in various fields of applications, such as systolic array design and parallel compiler optimization. This paper deals with the problem of finding optimal s...
Achieving Full Parallelism using MultiDimensional Retiming
, 1996
"... Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in onedimensional proble ..."
Abstract

Cited by 29 (18 self)
 Add to MetaCart
Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in onedimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for onedimensional DFGs, retiming can not always achieve full parallelism. Other existing optimization techniques for nested loops also can not always achieve full parallelism. This paper shows an important and counterintuitive result, which proves that we can always obtain fullparallelism for MDFGs with more than one dimension. This result is obtained by transforming the MDFG in...
Regular Partitioning for synthesizing fixedsize systolic arrays
 Integration, The VLSI Journal
, 1991
"... Extending the projection method for the synthesis of systolic arrays, we present a procedure for the design of fixedsize systolic arrays using a technique called "locally sequential globally parallel" (LSGP) partitioning. Our main result, which gives a necessary and sufficient condition t ..."
Abstract

Cited by 28 (7 self)
 Add to MetaCart
(Show Context)
Extending the projection method for the synthesis of systolic arrays, we present a procedure for the design of fixedsize systolic arrays using a technique called "locally sequential globally parallel" (LSGP) partitioning. Our main result, which gives a necessary and sufficient condition to characterize the boxes in which cells can be merged without conflict, is the key to the procedure presented here. 1 Introduction Recent work has shown that the usual synthesis method for systolic arrays based upon a projection vector and a scheduling vector can be extended to generate systolic implementations on a fixed number of processors. The main idea of all these extensions is to merge many cells into a single processor, so as to compress the array. This step is called partitioning and can take two different forms: the LPGS (locally parallel globally sequential) form and the LSGP (locally sequential globally parallel) form. The first approach, studied by Moldovan [MF86], is to partition the ar...