Results 11–20 of 22
Integrating Synchronous and Asynchronous Paradigms: The Fork95 Parallel Programming Language
 Proc. MPPM95 Int. Conf. on Massively Parallel Programming Models
, 1995
An Effective Load Balancing Policy for Geometric Decaying Algorithms
Abstract

Cited by 3 (3 self)
Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant-time operations. This so-called work-time presentation is then followed by a processor-scheduling implementation on a more concrete computational model. Many parallel algorithms are geometric-decaying in the sense that the sequence of work loads is upper-bounded by a decreasing geometric series. A standard scheduling implementation of such algorithms consists of a repeated application of load balancing. We present a more effective, yet equally simple, policy for the utilization of load balancing in geometric-decaying algorithms. By making a more careful choice of when and how often load balancing should be employed, and by using a simple amortization argument, we show that the number of required applications of load balancing is nearly constant. The policy is not restricted to any particular model of parallel computation, and, up to a constant factor, it is the best possible.
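The policy the abstract describes can be sketched in a few lines (an illustrative reconstruction, not the paper's algorithm; `balancing_rounds`, `factor`, and `eps` are our own names and assumptions): rebalance only once the geometric work bound has dropped by a fixed factor since the last balance, rather than after every round, so the number of balancing steps is amortized.

```python
# Illustrative sketch, not from the paper: deciding when to invoke load
# balancing for a geometric-decaying algorithm whose round-i work is
# bounded by W0 * q**i with q < 1.

def balancing_rounds(W0, q, factor=2.0, eps=1.0):
    """Return the rounds at which load balancing would be invoked."""
    rounds = []
    last_balanced_work = W0    # work bound at the last rebalancing
    work = W0
    i = 0
    while work > eps:          # run until the remaining work is negligible
        if last_balanced_work / work >= factor:
            rounds.append(i)   # bound shrank by `factor`: rebalance now
            last_balanced_work = work
        work *= q              # geometric decay of the round's work bound
        i += 1
    return rounds
```

With `q = 1/2` and `factor = 2` this rebalances every round; flatter decay rates trigger balancing only every few rounds, which is the amortization the abstract refers to.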
PARALLEL PREFIX COMPUTATION WITH FEW PROCESSORS
, 1992
Abstract

Cited by 2 (0 self)
We present a parallel prefix algorithm which uses (2(p + 1)/(p(p + 1) + 2))n − 1 arithmetic and ((p(p − 1))/(p(p + 1) + 2))n + (1/2)p(p − 1) routing steps to compute the prefixes of n elements on a distributed-memory multiprocessor with p < n nodes. The algorithm is compared with the distributed-memory implementation of the parallel prefix algorithm proposed by Kruskal, Rudolph, and Snir. We show that there is a trade-off between the two algorithms in terms of the number of processors and the parameter τ = τ_T/τ_A, which is the ratio of the time required to transfer an operand to the time required to perform the operation of the prefix problem. The new algorithm is shown to be more efficient when n is large and p²(p − 1) < 4/τ.
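The general block-wise strategy for computing prefixes of n elements with p < n processors, which algorithms of this kind refine, can be sketched as follows (our simplified sequential simulation; `blocked_prefix` is a hypothetical name, not the paper's routine):

```python
# Sketch, ours: each of p processors takes a contiguous block, computes
# local prefixes, the block totals are prefix-combined, and each block is
# offset by the combined total of all preceding blocks. Simulated
# sequentially here for clarity.

from operator import add

def blocked_prefix(xs, p, op=add):
    n = len(xs)
    size = (n + p - 1) // p                      # block length per processor
    blocks = [xs[i:i + size] for i in range(0, n, size)]
    # Phase 1: local prefix within each block (fully parallel).
    local = []
    for b in blocks:
        acc, pref = b[0], [b[0]]
        for x in b[1:]:
            acc = op(acc, x)
            pref.append(acc)
        local.append(pref)
    # Phase 2: prefix over the p block totals (the communication phase).
    offsets, run = [], None
    for pref in local:
        offsets.append(run)
        run = pref[-1] if run is None else op(run, pref[-1])
    # Phase 3: apply each block's offset to its local prefixes (parallel).
    out = []
    for off, pref in zip(offsets, local):
        out.extend(pref if off is None else [op(off, x) for x in pref])
    return out
```

The trade-off the abstract analyzes lives in phase 2: its cost is dominated by routing, so the ratio of transfer time to operation time decides which variant wins.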
Fast Computation of Divided Differences and Parallel Hermite Interpolation
Abstract

Cited by 1 (0 self)
We present parallel algorithms for fast polynomial interpolation. These algorithms can be used for constructing and evaluating polynomials interpolating a function's values and its derivatives of arbitrary order (Hermite interpolation). For interpolation, the parallel arithmetic complexity is O(log² M + log N) for large M and N...
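The sequential computation being parallelized, Newton's divided differences followed by evaluation of the resulting Newton form, can be sketched as follows (a standard textbook version for distinct nodes, not the paper's parallel algorithm):

```python
# Sequential sketch: Newton divided differences for polynomial
# interpolation, plus Horner-style evaluation of the Newton form.

def divided_differences(xs, ys):
    """Coefficients f[x0], f[x0,x1], ..., f[x0..xn] of the Newton form."""
    n = len(xs)
    table = list(ys)
    coeffs = [table[0]]
    for k in range(1, n):
        # Overwrite the table in place with the k-th order differences.
        for i in range(n - 1, k - 1, -1):
            table[i] = (table[i] - table[i - 1]) / (xs[i] - xs[i - k])
        coeffs.append(table[k])
    return coeffs

def newton_eval(coeffs, xs, x):
    """Evaluate the Newton-form polynomial at x."""
    result = coeffs[-1]
    for c, xk in zip(reversed(coeffs[:-1]), reversed(xs[:len(coeffs) - 1])):
        result = result * (x - xk) + c
    return result
```

Hermite interpolation extends this table to repeated nodes, where the differences at coincident nodes are filled in from the supplied derivatives.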
On the Power of Some PRAM Models
 Journal of Parallel Algorithms and Applications. Vol
, 1997
Abstract

Cited by 1 (0 self)
The focus here is the power of some underexplored CRCW PRAMs, which are strictly more powerful than the exclusive-write PRAM but strictly less powerful than BSR. We show that some problems can be solved with better time and/or processor bounds on these models. For example, we show that n linearly-ranged integers can be sorted in O(log n / log log n) time with optimal linear work on the Sum CRCW PRAM. We also show that the maximum gap problem can be solved within the same resource bounds on the Maximum CRCW PRAM. Though some models can be shown to be more powerful than others, some of them appear to have incomparable powers.
Keywords: PRAM; BSR; time and processor bounds; simulation; sorting. Classification Categories: F.1.1, F.1.2, F.2.2
1 Preliminaries
The focus of this work is on some underexplored Concurrent Read Concurrent Write (CRCW) Parallel Random Access Machine (PRAM) models. These PRAM models differ only in the way of resolving write conflicts. Some of them can be shown to be s...
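For reference, "linearly-ranged" means the n keys fall in a range of size O(n); sequentially, such keys are already sortable with linear work by counting sort, as sketched below. This is a sequential illustration only; the paper's contribution is matching the linear work bound in O(log n / log log n) parallel time on the Sum CRCW PRAM.

```python
# Sequential illustration of sorting linearly-ranged integers: when all
# keys lie in [0, upper) with upper = O(n), counting sort does linear work.

def counting_sort(keys, upper):
    """Sort integers in [0, upper) in O(n + upper) work."""
    counts = [0] * upper
    for k in keys:
        counts[k] += 1          # one bucket per possible key value
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)     # emit each value as many times as it occurred
    return out
```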
Recursive Individually Distributed Object
Abstract
Abstract. Distributed Objects (DO) as defined by OMG's CORBA architecture provide a model for object-oriented parallel distributed computing. The parallelism in this model, however, is limited in that the distribution refers to the mapping of different objects to different hosts, and not to the distribution of any individual object. We propose in this paper an alternative model called Individually Distributed Object (IDO), which allows a single large object to be distributed over a network, thus providing a high-level interface for the exploitation of parallelism inside the computation of each object, which was left out of the distributed-objects model. Moreover, we propose a set of functionally orthogonal operations which allow the objects to be recursively divided, combined, and to communicate over a recursively divided address space. Programming by divide-and-conquer is therefore effectively supported under this framework. The Recursive Individually Distributed Object (RIDO) has been adopted as the primary parallel programming model in the Brokered Objects for Ragged-network Gigaflops (BORG) project at the Applied Physics Laboratory of Johns Hopkins University, and applied to large-scale real-world problems.
Optimal Parallel Prefix on Mesh Architectures
, 1993
Abstract
Algorithms for efficient implementation of the computation of prefix products on mesh-connected...
On the Complexity of Parallelizing Sequential Circuits using the Parallel Prefix Method
Abstract
The parallel prefix method uses a tree of identical processing nodes to calculate in parallel the state and output response of a finite state machine (FSM) to a finite-length input sequence. Traditionally, each node in the tree has been required to perform multiplication of binary matrices. In this paper, we show that under appropriate modifications of the input-output mappings at the leaf nodes of the tree, the operation of each node can be reduced to the operation of the unique finite semigroup that is associated with the given FSM. In this view, previous parallel prefix approaches for sequential circuits have treated the worst-case scenario, in which the order of the associated semigroup is exponential in the number of states of the given FSM.
Keywords: finite state machines, parallel processing, prefix technique, discrete-time iteration bound.
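The underlying idea, that the per-symbol transition maps of an FSM compose associatively, so the state after every prefix of the input can be obtained by a prefix scan over those maps, can be sketched as follows (our illustration, with the sequential scan standing in for the tree of processing nodes):

```python
# Sketch, ours: each input symbol a induces a transition map
# delta_a : states -> states. Composition of such maps is associative,
# so a (parallelizable) prefix scan over them yields all intermediate
# states. Maps are represented as tuples indexed by state.

def compose(f, g):
    """Apply f, then g (maps given as tuples over states 0..k-1)."""
    return tuple(g[s] for s in f)

def prefix_states(delta, inputs, start):
    """State reached after each prefix of `inputs`, starting from `start`."""
    maps = [delta[a] for a in inputs]
    # Sequential scan stands in for the parallel prefix over `compose`.
    prefixes, acc = [], None
    for m in maps:
        acc = m if acc is None else compose(acc, m)
        prefixes.append(acc)
    return [p[start] for p in prefixes]
```

For a parity automaton with states {0, 1}, where '1' flips the state and '0' keeps it, `prefix_states({'0': (0, 1), '1': (1, 0)}, "1101", 0)` gives the state after each input prefix. Representing the maps as elements of the machine's semigroup, rather than as binary matrices, is exactly the reduction the abstract describes.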