Results 1-10 of 12
A practical hierarchical model of parallel computation: binary tree and FFT graph algorithms
1991
Cited by 37 (5 self)
We introduce a model of parallel computation that retains the ideal properties of the PRAM by using it as a submodel, while simultaneously being more reflective of realistic parallel architectures by accounting for and providing abstract control over communication and synchronization costs. The Hierarchical PRAM (HPRAM) model controls conceptual complexity in the face of asynchrony in two ways. First, by providing the simplifying assumption of synchronization to the design of algorithms, but allowing the algorithms to work asynchronously with each other, and organizing this "control asynchrony" via an implicit hierarchy relation. Second, by allowing the restriction of "communication asynchrony" in order to obtain determinate algorithms (thus greatly simplifying proofs of correctness). It is shown that the model is reflective of a variety of existing and proposed parallel architectures, particularly ones that can support massive parallelism. Relationships to programming
On the Cost-Effectiveness of PRAMs
1991
Cited by 33 (12 self)
We introduce a formalism that allows computer architecture to be treated as a formal optimization problem. We apply this to the design of shared memory parallel machines. Present computers of this type support the programming model of a shared memory, but simultaneous access to the shared memory by several processors is in many situations processed sequentially. Theoretical computer science offers asymptotically good solutions to this problem. We modify these constructions from an engineering standpoint and improve the price/performance ratio by roughly a factor of 6. The resulting machine has a surprisingly good price/performance ratio, even compared with distributed memory machines. For almost all access patterns of all processors into the shared memory, access is as fast as the access of only a single processor. 1 Introduction Commercially available parallel machines can be classified as distributed memory machines or shared memory machines. Exchange of data between different proce...
Efficient Deterministic and Probabilistic Simulations of PRAMs on Linear Arrays with Reconfigurable Pipelined Bus Systems
Journal of Supercomputing, 2000
Cited by 16 (11 self)
In this paper, we present deterministic and probabilistic methods for simulating PRAM computations on linear arrays with reconfigurable pipelined bus systems (LARPBS). The following results are established in this paper. (1) Each step of a p-processor PRAM with m = O(p) shared memory cells can be simulated by a p-processor LARPBS in O(log p) time, where the constant in the big-O notation is small. (2) Each step of a p-processor PRAM with m = Ω(p) shared memory cells can be simulated by a p-processor LARPBS in O(log m) time. (3) Each step of a p-processor PRAM can be simulated by a p-processor LARPBS in O(log p) time with probability larger than 1 - 1/p^c for all c > 0. (4) As an interesting byproduct, we show that a p-processor LARPBS can sort p items in O(log p) time, with a small constant hidden in the big-O notation. Our results indicate that an LARPBS can simulate a PRAM very efficiently. Keywords: Concurrent read, concurrent write, deterministic simulation, linear array...
Simulation of PRAM Models on Meshes
Nordic Journal on Computing, 2(1):51, 1994
Cited by 14 (9 self)
We analyze the complexity of simulating a PRAM (parallel random access machine) on a mesh-structured distributed memory machine. By utilizing suitable algorithms for randomized hashing, routing in a mesh, and sorting in a mesh, we prove that simulation of a PRAM on a √N × √N (or ∛N × ∛N × ∛N) mesh is possible with O(√N) (respectively O(∛N)) delay with high probability and a relatively small constant. Furthermore, with more sophisticated simulations further speedups are achieved; experiments show delays as low as √N + o(√N) (respectively ∛N + o(∛N)) per N PRAM processors. These simulations compare quite favorably with PRAM simulations on butterfly and hypercube. 1 Introduction PRAM (Parallel Random Access Machine) is an abstract model of computation. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of a PRAM is often seen to consist of...
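The abstract above names randomized hashing as one ingredient of the mesh simulation. As a rough illustration only, the following sketch spreads shared-memory addresses over the nodes of a side × side mesh. The hash family ((a·x + b) mod P) mod n is a common textbook choice and is an assumption here, not necessarily the family used in the paper; the names `make_hash` and `module_loads` are hypothetical.

```python
import random
from collections import Counter

# Assumed prime modulus (2^31 - 1); addresses are assumed to be smaller than this.
PRIME = 2_147_483_647


def make_hash(side, seed=None):
    """Return a randomly chosen hash from PRAM addresses to (row, col) mesh nodes."""
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)  # random multiplier, never zero
    b = rng.randrange(0, PRIME)  # random offset
    n = side * side              # number of mesh nodes

    def h(addr):
        node = ((a * addr + b) % PRIME) % n
        return divmod(node, side)  # node index -> (row, col)

    return h


def module_loads(addresses, h):
    """Count how many of the requested addresses land on each mesh node."""
    return Counter(h(addr) for addr in addresses)


# One simulated PRAM step: 64 processors request 64 distinct addresses.
side = 8
h = make_hash(side, seed=1)
loads = module_loads(range(side * side), h)
print("busiest node serves", max(loads.values()), "requests")
```

The maximum load per node is what a hashing-based simulation tries to keep small, since a heavily loaded node serializes the requests routed to it.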
Constructive Deterministic PRAM Simulation on a Mesh-Connected Computer
In Proc. 6th ACM Symp. on Parallel Algorithms and Architectures, 1993
Cited by 12 (10 self)
The PRAM model of computation consists of a collection of sequential RAM machines accessing a shared memory in lock-step fashion. The PRAM is a very high-level abstraction of a parallel computer, and its direct realization in hardware is beyond reach of the current (or even foreseeable) technology. In this paper we present a deterministic simulation scheme to emulate PRAM computation on a mesh-connected computer, a feasible machine where each processor has its own memory module and is connected to at most four other processors via point-to-point links. In order to achieve a good worst-case performance, any deterministic simulation scheme has to replicate each variable in a number of copies. Such copies are stored in the local memory modules according to a Memory Organization Scheme (MOS), which is known to all the processors. A variable is then accessed by routing packets to its copies. All deterministic schemes in the literature make use of a MOS whose existence is proved via the prob...
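The replication idea in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's scheme: the seeded pseudo-random placement stands in for an explicit MOS, and the timestamped majority rule (access more than half of the copies, trust the newest timestamp) is a standard technique from the replication literature that the abstract does not itself describe. The class name `ReplicatedMemory` and its methods are hypothetical.

```python
import random


class ReplicatedMemory:
    """Toy shared memory in which each variable has several timestamped copies."""

    def __init__(self, num_modules, copies=3, seed=0):
        assert copies % 2 == 1 and copies <= num_modules
        self.modules = [dict() for _ in range(num_modules)]
        self.num_modules = num_modules
        self.copies = copies
        self.seed = seed
        self.clock = 0  # global step counter used as a timestamp

    def placement(self, var):
        """Modules holding the copies of var (stand-in for a real MOS)."""
        rng = random.Random(f"{self.seed}:{var}")
        return rng.sample(range(self.num_modules), self.copies)

    def _majority(self, var, reachable):
        """Pick a majority of the copies, restricted to reachable modules."""
        homes = self.placement(var)
        if reachable is not None:
            homes = [m for m in homes if m in reachable]
        need = self.copies // 2 + 1
        if len(homes) < need:
            raise RuntimeError("fewer than a majority of copies reachable")
        return homes[:need]

    def write(self, var, value, reachable=None):
        """Update a majority of the copies with a fresh timestamp."""
        self.clock += 1
        for m in self._majority(var, reachable):
            self.modules[m][var] = (self.clock, value)

    def read(self, var, reachable=None):
        """Read a majority of the copies and return the newest value seen."""
        stamped = [self.modules[m].get(var, (0, None))
                   for m in self._majority(var, reachable)]
        return max(stamped, key=lambda tv: tv[0])[1]


mem = ReplicatedMemory(num_modules=16)
mem.write("x", 41)
mem.write("x", 42)
print(mem.read("x"))
```

Because any two majorities of the same copy set share at least one module, a read always sees at least one copy carrying the most recent write, even when it touches a different subset of copies than the writer did.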
A practical constructive scheme for deterministic shared-memory access
In Proc. 5th ACM Symp. on Parallel Algorithms and Architectures, 1993
Cited by 8 (5 self)
We present three explicit schemes for distributing M variables among N memory modules, where M = Θ(N^1.5), M = Θ(N^2), and M = Θ(N^3), respectively. Each variable is replicated into a constant number of copies stored in distinct modules. We show that N processors, directly accessing the memories through a complete interconnection, can read/write any set of N variables in worst-case time O(N^(1/3)), O(N^(1/2)), and O(N^(2/3)), respectively for the three schemes. The access times for the last two schemes are optimal with respect to the particular redundancy values used by such schemes. The address computation can be carried out efficiently by each processor without recourse to a complete memory map and requiring only O(1) internal storage.
BSP Scheduling of Regular Patterns of Computation
1997
Cited by 3 (3 self)
One of the major challenges of the current research in the field of parallel computing is the development of a realistic underlying framework for the design and programming of general-purpose parallel computers. The bulk-synchronous parallel (BSP) model is largely viewed as the most suitable candidate for this role, as it offers support for both the design of scalable parallel architectures and the generation of portable parallel code. However, when considering the development of portable parallel software within the framework of the BSP model, one cannot disregard the existence of a broad basis of efficient sequential and PRAM solutions for a wide variety of problem classes. In fact, the recent emergence of reliable techniques for the identification of the potential parallelism of a sequential program has rendered the automatic parallelisation of existing sequential code more compelling than ever. At first sight, BSP simulation of PRAMs appears to be the ideal strategy for taking a...
Transgressing The Boundaries: Unified Scalable Parallel Programming
1996
Cited by 1 (1 self)
The diverse architectural features of parallel computers, and the lack of commonly accepted parallel-programming environments, have meant that software development for these systems has been significantly more difficult than the sequential case. Until better approaches are developed, the programming environment will remain a serious obstacle to mainstream scalable parallel computing. The work reported in this paper attempts to integrate architecture-independent scalable parallel programming in the Bulk Synchronous Parallel (BSP) model with shared-memory parallel programming using the theoretical PRAM model. We start with a discussion of problem parallelism, that is, the parallelism inherent to a problem instead of a specific algorithm, and the parallel-programming techniques that allow the capture of this notion. We then review the ubiquitous PRAM model in terms of the model's pragmatic limitations, where particular attention is paid to simulations on practical machines. The BSP model i...
Deterministic PRAM Simulation with Constant Redundancy (Preliminary Version)
In this paper, we show that distributing the memory of a parallel computer and, thereby, decreasing its granularity allows a reduction in the redundancy required to achieve polylog simulation time for each PRAM step. Previously, realistic models of parallel computation assigned one memory module to each processor and, as a result, insisted on relatively coarse-grain memory. We propose, on the other hand, a more flexible, but equally valid model of computation, the distributed-memory, bounded-degree network (DMBDN) model. This model allows the use of fine-grain memory while maintaining the realism of a bounded-degree interconnection network. We describe a PRAM simulation scheme, which is admitted under the DMBDN model, that exploits the increased memory bandwidth provided by a two-dimensional mesh of trees (2DMOT) network to achieve an overhead in memory redundancy lower than that required by other fast, deterministic PRAM simulations. Specifically, for a deterministic simulation of an n-processor PRAM on a bounded-degree network, we are able to reduce the number of copies of each variable from O(log n / log log n) to Θ(1) and still simulate each PRAM step in polylog time.
A Practical Constructive Scheme for Deterministic Shared-Memory Access
We present an explicit memory organization scheme for distributing M data items among N memory modules where M 6