Overview of Mesh Results
- Max-Planck-Institut für Informatik, Saarbrücken, 1995
Cited by 7 (0 self)
This paper provides an overview of lower and upper bounds for algorithms for mesh-connected processor networks. Most of our attention goes to routing and sorting problems, but other problems are mentioned as well. Results from 1977 to 1995 are covered. We provide numerous results, references and open problems. The text is completed with an index. This is a worked-out version of the author's contribution to a joint paper with Miltos D. Grammatikakis, D. Frank Hsu and Miro Kraetzl on multicomputer routing, submitted to the Journal of Parallel and Distributed Computing.
Solving Fundamental Problems on Sparse-Meshes
- IEEE Transactions on Parallel & Distributed Systems
, 1998
Cited by 5 (0 self)
A sparse-mesh, which has PUs on the diagonal of a two-dimensional grid only, is a cost-effective distributed memory machine. Variants of this machine have been considered before, but none of them is as simple and pure as a sparse-mesh. Various fundamental problems (routing, sorting, list ranking) are analyzed, proving that sparse-meshes have great potential. The results are extended to higher-dimensional sparse-meshes.
1 Introduction
On ordinary two-dimensional meshes we must accept that, due to their small bisection width, for most problems the maximum achievable speed-up with $n^2$ processing units (PUs) is only $\Theta(n)$. On the other hand, networks such as hypercubes impose increasing conditions on the interconnection modules with increasing network sizes. Cube-connected-cycles do not have this problem, but are harder to program due to their irregularity. Anyway, because of a basic theorem from VLSI lay-out [18], all planar architectures have an area that is quadratic in their...
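The bisection argument behind the speed-up bound quoted in this abstract can be sketched numerically. This is only an illustration under stated assumptions, not the paper's analysis: the function names are hypothetical, every PU is assumed to hold one item with a uniformly random destination, each link across the cut is assumed to carry one item per step, and the sequential cost is assumed to be one unit operation per item.

```python
def crossing_items(n: int) -> float:
    """Items that must cross the middle cut of an n x n mesh.

    Assumption: one item per PU and uniformly random destinations, so about
    half of the n*n/2 items in the left half are destined for the right half.
    """
    return n * n / 4


def min_routing_steps(n: int) -> float:
    """Lower bound on steps: crossing items divided by the links in the cut."""
    bisection_width = n  # one horizontal link per row crosses the middle cut
    return crossing_items(n) / bisection_width


def speedup_bound(n: int) -> float:
    """Upper bound on speed-up under the assumed sequential cost of n*n units."""
    sequential_ops = n * n  # assumption: one unit operation per item
    return sequential_ops / min_routing_steps(n)
```

Under these assumptions the bound grows linearly in n even though the number of PUs grows as n squared, which matches the order of growth stated in the abstract.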
Work-Optimal Simulation of PRAM Models on Meshes
- Nordic Journal on Computing, 2(1):51
, 1994
Cited by 5 (3 self)
In this paper we consider work-optimal simulations of PRAM models on coated meshes. Coated meshes consist of a mesh connected routing machinery with processors on the surface of the mesh. We prove that coated meshes with 2-dimensional or 3-dimensional routing machinery can work-optimally simulate EREW, CREW, and CRCW PRAM models. The general idea behind this simulation is to use Valiant's XPRAM approach, and ignore the work-complexity of simple nodes of the routing machinery.
1 Introduction
There are a wide variety of approaches to parallelism in general [40], and even to general purpose parallelism [39] -- reflecting the prevailing uncertainty of the correct approach. One model aiming at general purpose parallelism is the PRAM (Parallel Random Access Machine) model, which is a natural generalization of the classical RAM model. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of PRAM is often seen to con...
Experimental Results for Four Work-Optimal PRAM Simulation Algorithms on Coated Meshes
, 1994
Cited by 4 (4 self)
In this paper we consider the effect of overloading in four work-optimal PRAM simulation algorithms on coated meshes with P real processors. A coated mesh consists of a mesh connected routing machinery, and processor&memory pairs, which form a coat on the routing machinery. Previously, work-optimal PRAM simulations that ignore the effect of overloading have been presented for coated meshes, but their cost is relatively high (around 100). The algorithms we study here are based on greedy routing, sorting, improved virtual levelled network technique, and combining queues method. Our results show that overloading alone can be used to improve the simulation cost of all PRAM models on coated meshes to about 10 (or even fewer) routing steps per P simulated PRAM processors.
1 Introduction
In [13] three algorithms for simulating PRAM models on coated meshes were presented (see also [15, 16]). The EREW PRAM simulation algorithm was based on a modification of the basic greedy routing algorithm ...
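The cost measure used in this abstract, routing steps per P simulated PRAM processors, can be illustrated with a small sketch. The function names and the linear cost model in `amortized_cost` are assumptions made for illustration only; the paper's measured costs come from experiments, not from this formula.

```python
def simulation_cost(routing_steps: float, n_pram: int, p_real: int) -> float:
    """Routing steps per P simulated PRAM processors.

    routing_steps: steps needed to simulate one PRAM step of n_pram processors.
    The load n_pram / p_real is the number of PRAM processors per real processor.
    """
    load = n_pram / p_real
    return routing_steps / load


def amortized_cost(latency: float, steps_per_request: float,
                   n_pram: int, p_real: int) -> float:
    """Cost under an assumed model: a fixed latency plus a per-request term.

    Assumption (not from the paper): total steps = latency + steps_per_request * load.
    """
    load = n_pram / p_real
    total_steps = latency + steps_per_request * load
    return simulation_cost(total_steps, n_pram, p_real)
```

In this assumed model the fixed latency is shared by all PRAM processors overloaded onto one real processor, so the cost per P simulated processors falls as the load grows.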
Balanced PRAM Simulations via Moving Threads and Hashing
Cited by 4 (0 self)
We present a novel approach to parallel computing, where (virtual) PRAM processors are represented as light-weight threads, and each physical processor is capable of managing several threads. Instead of moving read and write requests, and replies between processor&memory pairs (and caches), we move the light-weight threads. Consequently, the processor load balancing problem reduces to the problem of producing evenly distributed memory references. In PRAM computations, this can be achieved by properly hashing the shared memory into the processor&memory pairs. We describe the idea of moving threads, and show that the moving threads framework provides a natural validation for Brent's theorem in work-optimal PRAM simulation situations on mesh of trees, coated mesh, and OCPC based distributed memory machines (DMMs). We prove that an EREW PRAM computation C requiring work W and time T can be implemented work-optimally on those p-processor DMMs with high probability, if W =\Omega (p \De...
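The idea of hashing the shared memory into processor&memory pairs can be sketched as follows. This is a minimal illustration, not the hash family used in the paper: the linear hash, the prime modulus, and all names are assumptions chosen for the example.

```python
import random
from collections import Counter

MERSENNE_PRIME = 2**31 - 1  # assumed modulus for the illustrative linear hash


def make_linear_hash(p, a=None, b=None, rng=None):
    """Return h(address) -> module index in range(p).

    a and b are drawn at random unless given explicitly; this family is an
    assumption made for illustration, not the one analyzed in the paper.
    """
    rng = rng or random.Random()
    a = rng.randrange(1, MERSENNE_PRIME) if a is None else a
    b = rng.randrange(0, MERSENNE_PRIME) if b is None else b

    def h(address):
        return ((a * address + b) % MERSENNE_PRIME) % p

    return h


def module_loads(addresses, mapping, p):
    """Count how many of the given addresses land in each of the p modules."""
    counts = Counter(mapping(x) for x in addresses)
    return [counts.get(m, 0) for m in range(p)]
```

A strided access pattern such as addresses 0, p, 2p, ... sends every reference to module 0 under the naive mapping `address % p`, while a hashed mapping typically spreads the same references over several modules.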
On Implementing EREW Work-Optimally on Mesh of Trees
, 1995
Cited by 3 (2 self)
We show how to implement an $\ell_1 \times n \log n$-processor EREW PRAM work-optimally on a 2-dimensional n-sided mesh of trees, consisting of n processors, n memory modules, and $O(n^2)$ nodes. Similarly, we prove that an $\ell_2 \times n^2 \log n$-processor EREW PRAM can be implemented work-optimally on a 3-dimensional n-sided mesh of trees. By the work-optimality of implementations we mean that the expected routing time of PRAM memory requests is O(1) per simulated PRAM processor with high probability. Experiments show that on relatively small $\ell_1$ and $\ell_2$ the cost per simulated PRAM processor is 1.5-2.5 in the 2-dimensional case, and 2-3 in the 3-dimensional case. If at each step at most 1/3 of the PRAM processors make a reference to the shared memory, then the simulation cost is approximately 1. We also compare our work-optimal simulations to those proposed for coated meshes. Key Words: EREW, mesh of trees, shared memory, simulation, work-optimal, randomized, coated mesh. Category: ...
Improved Virtual Leveled Routing Strategy for Meshes
, 1994
Cited by 2 (2 self)
We present an improved version of the virtual leveled network routing strategy for mesh connected computers. This improvement achieves a speedup of approximately 2, and requires practically no additional hardware. We confirm this by providing experimental results concerning time to accomplish CRCW PRAM simulation influenced routing situations.
1 Introduction
Combining messages during a routing process is the method to avoid performance degradation in the presence of hot-spots. Especially, combining on the route provides a practical method to accomplish the memory reference primitives of strong CRCW PRAM (as well as CREW) models. Several methods have been proposed to accomplish combining [1, 3, 6, 8, 10], which all can be applied to mesh connected computers. These methods can be applied to other interconnection structures as well, but here we deal only with the 2-dimensional and the 3-dimensional mesh structure, since it is simple, regular, modularly extendible, well scalable, and in g...
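The effect of combining on a hot-spot can be sketched in a few lines. This is a deliberately simplified illustration, not the virtual leveled network strategy: it merges concurrent read requests at a single point instead of on the route, and all names are hypothetical.

```python
from collections import defaultdict


def combine_reads(requests):
    """Merge read requests to the same address into one combined request.

    requests: iterable of (processor_id, address) pairs.
    Returns a dict mapping each address to the processors waiting for it.
    """
    combined = defaultdict(list)
    for processor_id, address in requests:
        combined[address].append(processor_id)
    return dict(combined)


def serve_combined(combined, memory):
    """Access memory once per distinct address and fan the value out."""
    replies = {}
    for address, waiting in combined.items():
        value = memory[address]  # single access, however many requesters
        for processor_id in waiting:
            replies[processor_id] = value
    return replies
```

With combining, a hot-spot address that is read by many processors in the same step costs one memory access rather than one per requester.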
PRAM Simulation Programs for Mesh Structures
, 1993
Cited by 2 (2 self)
A lot of experimental results were provided in the author's Ph.Lic. thesis [3]. In this report we give the source listings of the simulation programs (and of the tools to analyse the results) that were used to produce those results.
Contents
1 Introduction
1.1 Structure of programs
1.2 Organization of this document
2 EREW simulation programs for ordinary and toroidal meshes
2.1 Ordinary 2-dimensional mesh
2.2 Toroidal 2-dimensional mesh
2.3 Ordinary 3-dimensional mesh
2.4 Toroidal 3-dimensional mesh
2.5 Matrix multiplication experiments
2.5.1 Head of file mesh32g2.c
...
Performance of Work-Optimal PRAM Simulation Algorithms on Coated Meshes
, 1996
Cited by 1 (1 self)
We study the effect of varying the multithreading level of processors in work-optimal PRAM simulation algorithms on coated meshes. A coated mesh consists of a mesh connected routing machinery and P processor&memory pairs that form a coat on the routing machinery. The algorithms studied are based on greedy routing, sorting, improved virtual leveled network technique, combining queues method, and synchronization wave. Our results show that increasing the multithreading level considerably improves the simulation cost. The cost can be decreased below 5 routing steps per P simulated PRAM processors. In the case of one algorithm, even costs of 1.1 to 2 are achieved.
1 Introduction
Work-optimal simulation of PRAM models means that a constant fraction of the aggregate power of processors can be given to (arbitrary) PRAM computations. We study the efficiency of five PRAM simulation algorithms on a structure called coated mesh. This is interesting, since (a) the coated mesh (rigid definition is g...
Goodness of Time-Processor Optimal PRAM Simulations
We address the question 'how to measure goodness of time-processor optimal PRAM simulations'. Instead of measuring only the asymptotic complexity of simulation time, we attempt to take into account all aspects of simulations exactly. We present a goodness function framework and propose a generic function for measuring the goodness.
1 Introduction
A simulation of an N-processor PRAM on a P-processor distributed memory machine (DMM) is time-processor optimal, if simulation of a PRAM step succeeds in time $O(N/P)$ (with high probability). The simulation time is lower bounded by the diameter $\phi$ of the routing machinery and the expected memory congestion $\gamma$. If the DMM is symmetric and memory requests can be satisfied by only one memory module (one hash function), expected length $\delta$ of memory request route is $\delta = \Theta(\phi)$. If the total routing capacity is /P (/ packets per physical processor per step), two necessary conditions for time-processor optimality are that the load $\ell = N/P$ (...

