Results 1-10 of 17
ZPL: An Array Sublanguage
PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 1993
Cited by 32 (10 self)
The notion of isolating the "common case" is a well known computer science principle. This paper describes ZPL, a language that treats data parallelism as a common case of MIMD parallelism. This separation of concerns has many benefits. It allows us to define a clean and concise language for describing data parallel computations, and this in turn leads to efficient parallel execution. Our particular language also provides mechanisms for handling boundary conditions. We introduce the concepts, constructs and semantics of our new language, and give a simple example that contrasts ZPL with other data parallel languages.
Parallelizing While Loops for Multiprocessor Systems
IN PROCEEDINGS OF THE 9TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM, 1995
Cited by 31 (13 self)
Current parallelizing compilers treat while loops and do loops with conditional exits as sequential constructs because their iteration space is unknown. Motivated by the fact that these types of loops arise frequently in practice, we have developed techniques that can be used to automatically transform them for parallel execution. We succeed in parallelizing loops involving linked-list traversals, something that has not been done before. This is an important problem since linked-list traversals arise frequently in loops with irregular access patterns, such as sparse matrix computations. The methods can even be applied to loops whose data dependence relations cannot be analyzed at compile time. We outline a cost/performance analysis that can be used to decide when the methods should be applied. Since, as we show, the expected speedups are significant, our conclusion is that they should almost always be applied, provided there is sufficient parallelism available in the original loop. We present experimental results on loops from the PERFECT Benchmarks and sparse matrix packages which substantiate our conclusion that these techniques can yield significant speedups.
Efficient parallel algorithms for chordal graphs
Cited by 26 (0 self)
We give the first efficient parallel algorithms for recognizing chordal graphs, finding a maximum clique and a maximum independent set in a chordal graph, finding an optimal coloring of a chordal graph, finding a breadth-first search tree and a depth-first search tree of a chordal graph, recognizing interval graphs, and testing interval graphs for isomorphism. The key to our results is an efficient parallel algorithm for finding a perfect elimination ordering.
RSA Hardware Implementation
1995
Cited by 24 (1 self)
Introduction to Arithmetic for Digital System Designers. New York, NY: Holt, Rinehart and Winston, 1982.
[14] Ç. K. Koç and C. Y. Hung. Multi-operand modulo addition using carry save adders. Electronics Letters, 26(6):361-363, 15th March 1990.
[15] Ç. K. Koç and C. Y. Hung. Bit-level systolic arrays for modular multiplication. Journal of VLSI Signal Processing, 3(3):215-223, 1991.
[16] M. Kochanski. Developing an RSA chip. In H. C. Williams, editor, Advances in Cryptology CRYPTO 85, Proceedings, Lecture Notes in Computer Science, No. 218, pages 350-357. New York, NY: Springer-Verlag, 1985.
[17] I. Koren. Computer Arithmetic Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[18] D. C. Kozen. The Design and Analysis of Algorithms. New York, NY: Springer-Verlag, 1992.
[19] R. Ladner and M. Fischer. Parallel prefix computation. Journal of the ACM, 27(4):831-838, October 1980.
[20] S.
Generic Downwards Accumulations
Science of Computer Programming, 2000
Cited by 19 (3 self)
A downwards accumulation is a higher-order operation that distributes information downwards through a data structure, from the root towards the leaves. The concept was originally introduced in an ad hoc way for just a couple of kinds of tree. We generalize the concept to an arbitrary regular datatype; the resulting definition is coinductive.

1 Introduction

The notion of scans or accumulations on lists is well known, and has proved very fruitful for expressing and calculating with programs involving lists [4]. Gibbons [7, 8] generalizes the notion of accumulation to various kinds of tree; that generalization too has proved fruitful, underlying the derivations of a number of tree algorithms, such as the parallel prefix algorithm for prefix sums [15, 8], Reingold and Tilford's algorithm for drawing trees tidily [21, 9], and algorithms for query evaluation in structured text [16, 23]. There are two varieties of accumulation on lists: leftwards and rightwards. Leftwards accumulation ...
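To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not the generic coinductive definition the abstract refers to: it handles a single tree shape and a single binary operator. The names `Tree` and `downwards_accumulate` are hypothetical and chosen for this example. Each node is relabelled with the fold of the labels on the path from the root down to it, and a list scan is shown alongside as the familiar list analogue.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import Callable, List


@dataclass
class Tree:
    """A rose tree: a label plus any number of children (hypothetical type)."""
    label: int
    children: List["Tree"] = field(default_factory=list)


def downwards_accumulate(tree: Tree, op: Callable[[int, int], int]) -> Tree:
    """Relabel every node with the op-fold of the labels from the root to it."""
    def go(node: Tree, above: int) -> Tree:
        here = op(above, node.label)  # combine ancestors' result with this label
        return Tree(here, [go(child, here) for child in node.children])

    # The root has no ancestors, so it keeps its own label.
    return Tree(tree.label, [go(child, tree.label) for child in tree.children])


# List analogue: a scan keeps every intermediate result of a fold.
prefix_sums = list(accumulate([1, 2, 3, 4]))

# Tree version: information flows from the root towards the leaves.
t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
path_sums = downwards_accumulate(t, lambda a, b: a + b)
```

With addition as the operator, every node of `path_sums` holds the sum of the labels on its root path, just as every position of `prefix_sums` holds the sum of the list prefix ending there.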
Runtime Parallelization: A Framework for Parallel Computation
1995
Cited by 16 (8 self)
The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in sequential programs written in conventional languages. Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, statically analyzable access patterns. However, if the memory access pattern of the program is input data dependent, then static data dependence analysis and consequently parallelization is impossible. Moreover, in this case the compiler cannot apply privatization and reduction parallelization, the transformations that have been proven to be the most effective in removing data dependences and increasing the amount of exploitable parallelism in the program. Typical examples of irregular, dynamic applications are complex simulations such as SPICE for circuit simulation, DYNA3D for structural mechanics modeling, DMOL for quantum mechanical simulation of molecules, and CHARMM for molecular dynamics simulation of organic systems. Therefore, since irregular programs represent a large and important fraction of applications, an automatable framework for runtime parallelization is needed to complement existing and future static compiler techniques. In this thesis,
Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD Computer with Virtual Processing
1995
Cited by 14 (3 self)
We describe our implementation of several PRAM graph algorithms on the massively parallel computer MasPar MP-1 with 16,384 processors. Our implementation incorporated virtual processing and we present extensive test data. In a previous project [13], we reported the implementation of a set of parallel graph algorithms with the constraint that the maximum input size was restricted to be no more than the physical number of processors on the MasPar. The MasPar language MPL that we used for our code does not support virtual processing. In this paper, we describe a method of simulating virtual processors on the MasPar. We recoded and fine-tuned our earlier parallel graph algorithms to incorporate the usage of virtual processors. Under the current implementation scheme, there is no limit on the number of virtual processors that one can use in the program as long as there is enough main memory to store all the data required during the computation. We also give two general optimization techniq...
An Efficient Parallel Algorithm That Finds Independent Sets Of Guaranteed Size
1990
Cited by 10 (0 self)
Every graph with $n$ vertices and $m$ edges has an independent set containing at least $n^2/(2m+n)$ vertices. We present a parallel algorithm that finds an independent set of this size and runs in $O(\log^3 n)$ time on a CRCW PRAM with $O((m+n)\,\alpha(m,n)/\log^2 n)$ processors, where $\alpha(m,n)$ is a functional inverse of Ackermann's function. The ideas used in the design of this algorithm are also used to design an algorithm that, with the same resources, finds a vertex coloring satisfying certain minimality conditions.

Key words. Turán's theorem, independent set, NC, graph, parallel computation, deterministic

AMS(MOS) subject classifications. 68Q22, 68R10, 68R05

1. Introduction. This paper presents a fast parallel algorithm that, given a graph G, finds an independent set of G whose size is bounded from below. The bound depends on the number n of vertices and number m of edges of G, and cannot be improved in these terms. Since constructing a maximum independent set is NP-hard, it cannot be so...
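The size guarantee can be checked on small graphs with a short sequential sketch in Python. This is not the paper's parallel algorithm: it is the classical minimum-degree greedy heuristic, which is known to reach the same $n^2/(2m+n)$ bound. The function names and the adjacency-dictionary representation are assumptions made for this example.

```python
from typing import Dict, Set


def greedy_independent_set(adj: Dict[int, Set[int]]) -> Set[int]:
    """Repeatedly pick a minimum-degree vertex, then delete it and its neighbours."""
    remaining = {v: set(ns) for v, ns in adj.items()}  # private copy of the graph
    chosen: Set[int] = set()
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.add(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for ns in remaining.values():
            ns -= removed  # drop edges to deleted vertices
    return chosen


def turan_bound(n: int, m: int) -> float:
    """Lower bound n^2 / (2m + n) on the size of some independent set."""
    return n * n / (2 * m + n)


# A 5-cycle: n = 5 vertices, m = 5 edges.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
chosen = greedy_independent_set(cycle)
```

For the 5-cycle the bound is 25/15, so any set that meets it must contain at least two vertices.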
Integrating synchronous and asynchronous paradigms: the Fork95 parallel programming language
Cited by 4 (4 self)
The SB-PRAM is a lockstep-synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with up to 4096 RISC-style processing elements and with a (from the programmer's view) physically shared memory of up to 2 GByte with uniform memory access time. Fork95 is a redesign of the PRAM language FORK, based on ANSI C, with additional constructs to create parallel processes, hierarchically dividing processor groups into subgroups, managing shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it provides comfortable facilities for locally asynchronous computation where desired by the programmer. We show that Fork95 offers full expressibility for the implementation of practically relevant parallel algorithms. We do this by examining all known parallel programming paradigms used for the parallel solution of real-world problems, such as strictly synchronous execution, asynchronous processes, pipelining and systolic algorithms, parallel divide and conquer, parallel prefix computation, data parallelism, etc., and show how these parallel programming paradigms are supported by the Fork95 language and run time system.
Parallel Canonical Recoding
Electronics Letters, 1996
Cited by 3 (0 self)
We introduce a parallel algorithm for generating the canonical signed-digit expansion of an n-bit number in O(log n) time using O(n) gates. The algorithm is similar to the computation of the carries in a carry look-ahead circuit. We also prove that if the binary number $x + \lfloor x/2 \rfloor$ is given, then the canonical signed-digit recoding of x can be computed in O(1) time using O(n) gates.

1 Introduction

Recoding techniques (Booth recoding, bit-pair recoding, etc.) for sparse signed-digit representations of binary numbers have been effectively used in multiplication [3, 4] and exponentiation algorithms [2]. For example, the original Booth recoding technique [3, 4] scans the bits of the multiplier one bit at a time, and adds or subtracts the multiplicand to or from the partial product, depending on the value of the current bit and the previous bit. The modified versions of the Booth algorithm scan the bits of the multiplier two bits or three bits at a time [4]. These techniques are equivalent ...
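The constant-time claim can be illustrated with a small Python sketch based on a well-known identity: digit i of the canonical recoding equals bit i of $x + \lfloor x/2 \rfloor$ minus bit i of $\lfloor x/2 \rfloor$. Whether this matches the circuit construction in the paper is an assumption. The function names are hypothetical, and the sequential routine is included only as a reference to compare against.

```python
from typing import List


def naf_sequential(x: int) -> List[int]:
    """Canonical signed-digit (non-adjacent form) digits of x >= 0, least significant first."""
    digits = []
    while x > 0:
        if x & 1:
            d = 2 - (x & 3)  # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


def naf_from_sum(x: int) -> List[int]:
    """Same digits, read off position by position from y = x + floor(x/2)."""
    h = x >> 1
    y = x + h
    width = y.bit_length()
    # Each digit depends on only two bits, so all positions are independent.
    return [((y >> i) & 1) - ((h >> i) & 1) for i in range(width)]


digits_of_7 = naf_from_sum(7)  # 7 = 8 - 1
```

Because each output digit is a function of two input bits, a hardware version needs no carry propagation once the sum is available, which is consistent with the O(1) depth stated in the abstract.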