Overview of Mesh Results
Max-Planck-Institut für Informatik, Saarbrücken, 1995
Cited by 8 (0 self)
This paper provides an overview of lower and upper bounds for algorithms for mesh-connected processor networks. Most of our attention goes to routing and sorting problems, but other problems are mentioned as well. Results from 1977 to 1995 are covered. We provide numerous results, references and open problems. The text is completed with an index. This is a worked-out version of the author's contribution to a joint paper with Miltos D. Grammatikakis, D. Frank Hsu and Miro Kraetzl on multicomputer routing, submitted to the Journal of Parallel and Distributed Computing.
Solving Fundamental Problems on Sparse-Meshes
IEEE Transactions on Parallel & Distributed Systems, 1998
Cited by 5 (0 self)
A sparse-mesh, which has PUs only on the diagonal of a two-dimensional grid, is a cost-effective distributed-memory machine. Variants of this machine have been considered before, but none of them is as simple and pure as a sparse-mesh. Various fundamental problems (routing, sorting, list ranking) are analyzed, proving that sparse-meshes have great potential. The results are extended to higher-dimensional sparse-meshes.
1 Introduction
On ordinary two-dimensional meshes we must accept that, due to their small bisection width, for most problems the maximum achievable speedup with $n^2$ processing units (PUs) is only $\Theta(n)$. On the other hand, networks such as hypercubes impose increasing conditions on the interconnection modules with increasing network sizes. Cube-connected-cycles do not have this problem, but are harder to program due to their irregularity. Anyway, because of a basic theorem from VLSI layout [18], all planar architectures have an area that is quadratic in their...
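The bisection-width remark in the introduction above can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes an n x n mesh with even n, one packet per link per direction per step, and a hypothetical workload in which every packet in the left half must cross to the right half. The function names are illustrative only.

```python
def bisection_lower_bound(n: int) -> int:
    """Minimum number of steps to move all packets from the left half
    of an n x n mesh to the right half (assumed workload).

    Only n links cross the vertical middle cut, and each link is
    assumed to carry one packet per step in a given direction.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    packets_crossing = (n * n) // 2   # one packet per PU in the left half
    links_across_cut = n              # one horizontal link per row
    return packets_crossing // links_across_cut  # equals n / 2


def speedup_ceiling(n: int) -> float:
    """Upper limit on speedup for the assumed workload.

    A single PU is assumed to need n * n steps to handle all items,
    so the ratio grows linearly in n although n * n PUs are used.
    """
    sequential_steps = n * n
    return sequential_steps / bisection_lower_bound(n)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, bisection_lower_bound(n), speedup_ceiling(n))
```

Under these assumptions the ceiling is 2n, which is linear in n and so consistent with the $\Theta(n)$ figure quoted in the abstract.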
Work-Optimal Simulation of PRAM Models on Meshes
Nordic Journal on Computing, 2(1):51, 1994
Cited by 5 (3 self)
In this paper we consider work-optimal simulations of PRAM models on coated meshes. Coated meshes consist of a mesh-connected routing machinery with processors on the surface of the mesh. We prove that coated meshes with 2-dimensional or 3-dimensional routing machinery can work-optimally simulate EREW, CREW, and CRCW PRAM models. The general idea behind this simulation is to use Valiant's XPRAM approach, and ignore the work-complexity of simple nodes of the routing machinery.
1 Introduction
There are a wide variety of approaches to parallelism in general [40], and even to general purpose parallelism [39], reflecting the prevailing uncertainty of the correct approach. One model aiming at general purpose parallelism is the PRAM (Parallel Random Access Machine) model, which is a natural generalization of the classical RAM model. It consists of N processors, each of which may have some local memory and registers, and a global shared memory of size m. A step of PRAM is often seen to con...
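The three PRAM variants named above differ in which simultaneous accesses to one shared-memory cell they permit. The sketch below is not from the paper. It classifies a single step by the weakest variant that allows it, assuming the usual convention that the read phase and the write phase of a step are checked separately. The function name `weakest_model` is hypothetical.

```python
from collections import Counter


def weakest_model(reads, writes):
    """Return the weakest PRAM variant that permits one step.

    reads  -- cell addresses read in this step, one per reading processor
    writes -- cell addresses written in this step, one per writing processor
    """
    concurrent_read = any(count > 1 for count in Counter(reads).values())
    concurrent_write = any(count > 1 for count in Counter(writes).values())
    if concurrent_write:
        return "CRCW"   # some cell is written by several processors
    if concurrent_read:
        return "CREW"   # some cell is read by several processors
    return "EREW"       # every cell is touched by at most one processor per phase


if __name__ == "__main__":
    print(weakest_model([0, 1, 2], [3, 4]))
    print(weakest_model([0, 0], [1]))
    print(weakest_model([0], [1, 1]))
```

How a CRCW machine resolves conflicting writes (for example by priority or by combining) is left out, since the abstract does not specify it.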
Experimental Results for Four Work-Optimal PRAM Simulation Algorithms on Coated Meshes
1994
Cited by 4 (4 self)
In this paper we consider the effect of overloading in four work-optimal PRAM simulation algorithms on coated meshes with P real processors. A coated mesh consists of a mesh-connected routing machinery, and processor&memory pairs, which form a coat on the routing machinery. Previously, work-optimal PRAM simulations that ignore the effect of overloading have been presented for coated meshes, but their cost is relatively high (around 100). The algorithms we study here are based on greedy routing, sorting, the improved virtual leveled network technique, and the combining queues method. Our results show that overloading alone can be used to improve the simulation cost of all PRAM models on coated meshes to circa 10 (and even less) routing steps per P simulated PRAM processors.
1 Introduction
In [13] three algorithms for simulating PRAM models on coated meshes were presented (see also [15, 16]). The EREW PRAM simulation algorithm was based on a modification of the basic greedy routing algorithm ...
Balanced PRAM Simulations via Moving Threads and Hashing
Cited by 4 (0 self)
We present a novel approach to parallel computing, where (virtual) PRAM processors are represented as lightweight threads, and each physical processor is capable of managing several threads. Instead of moving read and write requests, and replies between processor&memory pairs (and caches), we move the lightweight threads. Consequently, the processor load balancing problem reduces to the problem of producing evenly distributed memory references. In PRAM computations, this can be achieved by properly hashing the shared memory into the processor&memory pairs. We describe the idea of moving threads, and show that the moving threads framework provides a natural validation for Brent's theorem in work-optimal PRAM simulation situations on mesh of trees, coated mesh, and OCPC-based distributed memory machines (DMMs). We prove that an EREW PRAM computation C requiring work W and time T can be implemented work-optimally on those p-processor DMMs with high probability, if W = \Omega(p \De...
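The idea of hashing the shared memory into the processor&memory pairs can be sketched as follows. This is an illustration only, not the authors' scheme: the keyed BLAKE2b digest is a stand-in for whatever hash family the paper uses, and `module_of`, `max_load`, the module count and the strided access pattern are all assumptions chosen for the example.

```python
import hashlib
from collections import Counter


def module_of(address: int, num_modules: int, seed: int = 0) -> int:
    """Map a shared-memory address to a module index by hashing.

    The keyed BLAKE2b digest is an assumed stand-in hash function.
    """
    digest = hashlib.blake2b(
        address.to_bytes(8, "little"),
        digest_size=8,
        key=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % num_modules


def max_load(addresses, placement) -> int:
    """Largest number of the given addresses that land on one module."""
    loads = Counter(placement(a) for a in addresses)
    return max(loads.values())


NUM_MODULES = 16
# A strided access pattern: every address is a multiple of NUM_MODULES.
strided = [i * NUM_MODULES for i in range(1024)]

# Placing address a on module (a mod NUM_MODULES) sends all references to module 0.
naive_max = max_load(strided, lambda a: a % NUM_MODULES)
# Hashed placement spreads the same references over the modules.
hashed_max = max_load(strided, lambda a: module_of(a, NUM_MODULES))

if __name__ == "__main__":
    print("naive max load:", naive_max)
    print("hashed max load:", hashed_max)
```

With the naive placement all 1024 references hit one module, whereas the hashed placement is expected to stay close to the average of 64 per module. This even spread of references is what the abstract relies on for load balancing.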
On Implementing EREW Work-Optimally on Mesh of Trees
1995
Cited by 3 (2 self)
We show how to implement an $\ell_1 \times n \log n$-processor EREW PRAM work-optimally on a 2-dimensional $n$-sided mesh of trees, consisting of $n$ processors, $n$ memory modules, and $O(n^2)$ nodes. Similarly, we prove that an $\ell_2 \times n^2 \log n$-processor EREW PRAM can be implemented work-optimally on a 3-dimensional $n$-sided mesh of trees. By the work-optimality of implementations we mean that the expected routing time of PRAM memory requests is $O(1)$ per simulated PRAM processor with high probability. Experiments show that for relatively small $\ell_1$ and $\ell_2$ the cost per simulated PRAM processor is 1.5 to 2.5 in the 2-dimensional case, and 2 to 3 in the 3-dimensional case. If at each step at most 1/3 of the PRAM processors make a reference to the shared memory, then the simulation cost is approximately 1. We also compare our work-optimal simulations to those proposed for coated meshes.
Key Words: EREW, mesh of trees, shared memory, simulation, work-optimal, randomized, coated mesh.
Category: ...
Improved Virtual Leveled Routing Strategy for Meshes
1994
Cited by 2 (2 self)
We present an improved version of the virtual leveled network routing strategy for mesh-connected computers. This improvement achieves a speedup of approximately 2, and requires practically no additional hardware. We confirm this by providing experimental results on the time needed to complete routing situations that arise in CRCW PRAM simulation.
1 Introduction
Combining messages during a routing process is the method to avoid performance degradation in the presence of hot spots. In particular, combining on the route provides a practical method to accomplish the memory reference primitives of strong CRCW PRAM (as well as CREW) models. Several methods have been proposed to accomplish combining [1, 3, 6, 8, 10], all of which can be applied to mesh-connected computers. These methods can be applied to other interconnection structures as well, but here we deal only with the 2-dimensional and the 3-dimensional mesh structure, since it is simple, regular, modularly extendible, well scalable, and in g...
PRAM Simulation Programs for Mesh Structures
1993
Cited by 2 (2 self)
A lot of experimental results were provided in the author's Ph.Lic. thesis [3]. In this report we give the source listing of the simulation programs (and tools to analyse the results) that were used to produce those results.
Contents
1 Introduction
1.1 Structure of programs
1.2 Organization of this document
2 EREW simulation programs for ordinary and toroidal meshes
2.1 Ordinary 2-dimensional mesh
2.2 Toroidal 2-dimensional mesh
2.3 Ordinary 3-dimensional mesh
2.4 Toroidal 3-dimensional mesh
2.5 Matrix multiplication experiments
2.5.1 Head of file mesh32g2.c ...
Performance of Work-Optimal PRAM Simulation Algorithms on Coated Meshes
1996
Cited by 1 (1 self)
We study the effect of varying the multithreading level of processors in work-optimal PRAM simulation algorithms on coated meshes. A coated mesh consists of a mesh-connected routing machinery and P processor&memory pairs that form a coat on the routing machinery. The algorithms studied are based on greedy routing, sorting, the improved virtual leveled network technique, the combining queues method, and synchronization wave. Our results show that increasing the multithreading level considerably improves the simulation cost. The cost can be decreased below 5 routing steps per P simulated PRAM processors. In the case of one algorithm, even costs of 1.1 ... 2 are achieved.
1 Introduction
Work-optimal simulation of PRAM models means that a constant fraction of the aggregate power of processors can be given to (arbitrary) PRAM computations. We study the efficiency of five PRAM simulation algorithms on a structure called coated mesh. This is interesting, since (a) the coated mesh (rigid definition is g...
Goodness of Time-Processor Optimal PRAM Simulations
We address the question 'how to measure goodness of time-processor optimal PRAM simulations'. Instead of measuring only the asymptotic complexity of simulation time, we attempt to take into account all aspects of simulations exactly. We present a goodness function framework and propose a generic function for measuring the goodness.
1 Introduction
A simulation of an $N$-processor PRAM on a $P$-processor distributed memory machine (DMM) is time-processor optimal if simulation of a PRAM step succeeds in time $O(N/P)$ (with high probability). The simulation time is lower bounded by the diameter $\phi$ of the routing machinery and the expected memory congestion $\gamma$. If the DMM is symmetric and memory requests can be satisfied by only one memory module (one hash function), the expected length $\delta$ of a memory request route is $\delta = \Theta(\phi)$. If the total routing capacity is $\psi P$ ($\psi$ packets per physical processor per step), two necessary conditions for time-processor optimality are that the load $\ell = N/P$ (...