Results 1 - 6 of 6
Parallelization in Calculational Forms
 In 25th ACM Symposium on Principles of Programming Languages
, 1998
"... The problems involved in developing efficient parallel programs have proved harder than those in developing efficient sequential ones, both for programmers and for compilers. Although program calculation has been found to be a promising way to solve these problems in the sequential world, we believe ..."
Abstract

Cited by 31 (23 self)
The problems involved in developing efficient parallel programs have proved harder than those in developing efficient sequential ones, both for programmers and for compilers. Although program calculation has been found to be a promising way to solve these problems in the sequential world, we believe that much more effort is needed to study its effective use in the parallel world. In this paper, we propose a calculational framework for the derivation of efficient parallel programs with two main innovations: (1) we propose a novel inductive synthesis lemma, on the basis of which an elementary but powerful parallelization theorem is developed; (2) we make the first attempt to construct a calculational algorithm for parallelization, deriving associative operators from data type definitions and making full use of existing fusion and tupling calculations. Being more constructive, our method is not only helpful in the design of efficient parallel programs in general but also promising in the construc...
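The divide-and-conquer shape of the programs such a framework derives can be illustrated with a minimal sketch: a list homomorphism applies a function to singletons and combines sub-results with an associative operator, so the two recursive calls are independent and can run in parallel. The names below are illustrative, not the paper's notation.

```python
def hom(f, op, xs):
    """List homomorphism: hom(f, op, [x]) = f(x),
    hom(f, op, a ++ b) = op(hom(a), hom(b)).
    Requires op to be associative; xs must be non-empty."""
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    # The two recursive calls touch disjoint halves, so a parallel
    # implementation may evaluate them on separate processors.
    return op(hom(f, op, xs[:mid]), hom(f, op, xs[mid:]))

# Usage: with the identity and +, this equals the sequential fold,
# precisely because + is associative.
print(hom(lambda x: x, lambda a, b: a + b, [1, 2, 3, 4, 5]))  # 15
```

The calculational step the abstract refers to is exactly the discovery of such an associative `op` from a sequential definition.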
(De)Composition Rules for Parallel Scan and Reduction
 In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97)
, 1998
"... We study the use of welldefined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract

Cited by 8 (1 self)
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of data parallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
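The flavor of the scan-reduction composition rule can be shown on a concrete instance: summing the prefix sums of a list (a scan followed by a reduction, two collective phases) can be re-expressed as a single reduction over triples with an associative combine. This is a sketch of the general idea under simple assumptions, not the paper's formal rule.

```python
from functools import reduce

def scan_then_reduce(xs):
    """Sequential specification: reduce(+) applied to scan(+, xs)."""
    acc, total = 0, 0
    for x in xs:
        acc += x      # running prefix sum (the scan)
        total += acc  # summing the scanned values (the reduction)
    return total

def fused(xs):
    """One reduction over triples (sum_of_prefix_sums, segment_sum, length).
    For segments A, B: sps(A ++ B) = sps(A) + sps(B) + |B| * sum(A),
    which makes `op` associative, hence a single parallel reduction."""
    def op(a, b):
        sa, ta, la = a
        sb, tb, lb = b
        return (sa + sb + lb * ta, ta + tb, la + lb)
    return reduce(op, [(x, x, 1) for x in xs])[0]

print(scan_then_reduce([1, 2, 3, 4]), fused([1, 2, 3, 4]))  # 20 20
```

In MPI terms this trades an `MPI_Scan` plus an `MPI_Reduce` for a single `MPI_Reduce` with a user-defined associative operation.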
An Analytical Method For Parallelization Of Recursive Functions
, 2001
"... Programming with parallel skeletons is an attractive framework because it encourages programmers to develop efficient and portable parallel programs. However, extracting parallelism from sequential specifications and constructing efficient parallel programs using the skeletons are still difficult ..."
Abstract

Cited by 7 (0 self)
Programming with parallel skeletons is an attractive framework because it encourages programmers to develop efficient and portable parallel programs. However, extracting parallelism from sequential specifications and constructing efficient parallel programs using the skeletons are still difficult tasks. In this paper, we propose an analytical approach to transforming recursive functions on general recursive data structures into compositions of parallel skeletons. Using static slicing, we have defined a classification of subexpressions based on their data parallelism. Then, skeleton-based parallel programs are generated from the classification. To extend the scope of parallelization, we have adopted more general parallel skeletons which do not require the associativity of argument functions. In this way, our analytical method can parallelize recursive functions with complex data flows.
Keywords: data parallelism, parallelization, functional languages, parallel skeletons, data flow analysis, static slice
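The target form of such a transformation can be sketched with a toy example (not taken from the paper): a sequential recursion over a list is rewritten into a composition of the standard map and reduce skeletons, each of which has a known parallel implementation.

```python
from functools import reduce

def sumsq_rec(xs):
    """Sequential recursive specification: sum of squares."""
    if not xs:
        return 0
    return xs[0] * xs[0] + sumsq_rec(xs[1:])

def sumsq_skel(xs):
    """Skeleton composition derived from the recursion:
    reduce(+) . map(square). The map is trivially data-parallel;
    the reduce is parallel because + is associative."""
    return reduce(lambda a, b: a + b, map(lambda x: x * x, xs), 0)

print(sumsq_rec([1, 2, 3]), sumsq_skel([1, 2, 3]))  # 14 14
```

The paper's contribution is deciding, by static slicing, which subexpressions of a general recursion can be assigned to which skeleton.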
Automatic Inversion Generates Divide-and-Conquer Parallel Programs
"... Divideandconquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide ..."
Abstract

Cited by 5 (3 self)
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide-and-conquer paradigm. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel system that can automatically derive cost-optimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce extraction of list homomorphisms to derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the line-of-sight problem. The experimental results show practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.
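The weak-right-inverse idea can be sketched for the maximum prefix sum problem, one of the examples listed in the abstract: tuple the function with the list sum, invert a result pair into a small witness list, and obtain the divide-and-conquer combining operator by re-applying the function to concatenated witnesses. The helper names below are illustrative, not the system's output.

```python
def mps_sum(xs):
    """Tupled function h: (maximum prefix sum over non-empty
    prefixes, total sum). xs must be non-empty."""
    m = s = xs[0]
    for x in xs[1:]:
        s += x
        m = max(m, s)
    return (m, s)

def weak_right_inverse(pair):
    """g with h(g(h(xs))) = h(xs): a two-element list realizing
    the given (mps, sum) pair."""
    m, s = pair
    return [m, s - m]

def combine(a, b):
    """Derived combining operator: h(g(a) ++ g(b)).
    By the third homomorphism theorem this is associative, giving a
    divide-and-conquer parallel program."""
    return mps_sum(weak_right_inverse(a) + weak_right_inverse(b))

xs = [3, -1, 4, -5]
print(combine(mps_sum(xs[:2]), mps_sum(xs[2:])) == mps_sum(xs))  # True
```

The point of the construction is that the programmer supplies only the two sequential (leftward/rightward) programs; `combine` falls out mechanically.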
The Third Homomorphism Theorem on Trees: Downward & Upward Lead to Divide-and-Conquer
"... Parallel programs on lists have been intensively studied. It is well known that associativity provides a good characterization for divideandconquer parallel programs. In particular, the third homomorphism theorem is not only useful for systematic development of parallel programs on lists, but it i ..."
Abstract

Cited by 1 (0 self)
Parallel programs on lists have been intensively studied. It is well known that associativity provides a good characterization for divide-and-conquer parallel programs. In particular, the third homomorphism theorem is not only useful for systematic development of parallel programs on lists, but it is also suitable for automatic parallelization. The theorem states that if two sequential programs iterate the same list leftward and rightward, respectively, and compute the same value, then there exists a divide-and-conquer parallel program that computes the same value as the sequential programs. While there have been many studies on lists, few have been done on characterizing and developing parallel programs on trees. Naive divide-and-conquer programs, which divide a tree at the root and compute independent subtrees in parallel, take time proportional to the height of the input tree and have poor scalability with respect to the number of processors when the input tree is ill-balanced. In this paper, we develop a method for systematically constructing scalable divide-and-conquer parallel programs on trees, in which two sequential programs lead to a scalable divide-and-conquer parallel program. We focus on paths instead of trees so as to utilize rich results on lists and demonstrate that associativity provides good characterization for scalable divide-and-conquer parallel programs on trees. Moreover, we generalize the third homomorphism theorem from lists to trees. We demonstrate the effectiveness of our method with various examples. Our results, being generalizations of known results for lists, are generic in the sense that they work well for all polynomial data structures.
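The scalability problem the abstract identifies is visible even in a naive tree reduction: the two recursive calls are independent and thus parallelizable, but the dependency chain has length equal to the tree height, so an ill-balanced tree degenerates toward a sequential computation. A minimal illustrative sketch:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tree:
    val: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def tree_sum(t):
    """Naive divide-and-conquer: split at the root. The recursive calls
    can run in parallel, but the critical path is the tree height --
    the scalability issue the paper's path-based method addresses."""
    if t is None:
        return 0
    return t.val + tree_sum(t.left) + tree_sum(t.right)

t = Tree(1, Tree(2, Tree(4)), Tree(3))
print(tree_sum(t))  # 10
```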
An Analytical Method For Parallelization Of Recursive Functions
, 2000
"... Programming with parallel skeletons is an attractive framework because it encourages programmers to develop efficient and portable parallel programs. However, extracting parallelism from sequential specifications and constructing efficient parallel programs using the skeletons are still difficult ..."
Abstract
Programming with parallel skeletons is an attractive framework because it encourages programmers to develop efficient and portable parallel programs. However, extracting parallelism from sequential specifications and constructing efficient parallel programs using the skeletons are still difficult tasks. In this paper, we propose an analytical approach to transforming recursive functions on general recursive data structures into compositions of parallel skeletons. Using static slicing, we have defined a classification of subexpressions based on their data parallelism. Then, skeleton-based parallel programs are generated from the classification. To extend the scope of parallelization, we have adopted more general parallel skeletons which do not require the associativity of argument functions. In this way, our analytical method can parallelize recursive functions with complex data flows.
Keywords: data parallelism, parallelization, functional languages, parallel skeletons, data flow analysis, static slice