Special Purpose Parallel Computing
Lectures on Parallel Computation, 1993
Abstract

Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Derandomizing Algorithms for Routing and Sorting on Meshes
Proc. 5th Symp. on Discrete Algorithms, 1994
Abstract

Cited by 30 (15 self)
We describe a new technique that can be used to derandomize a number of randomized algorithms for routing and sorting on meshes. We demonstrate the power of this technique by deriving improved deterministic algorithms for a variety of routing and sorting problems. Our main results are an optimal algorithm for k-k routing on multidimensional meshes, a permutation routing algorithm with running time $2n + o(n)$ and queue size 5, and an optimal algorithm for 1-1 sorting.
1 Introduction
One of the main problems in the simulation of idealistic parallel computers by realistic ones is the problem of routing messages through the sparse network of links that connects a set of processing units (PUs). In this paper, we consider the case of the $n \times n$ mesh, in which $n^2$ PUs are connected by a regular two-dimensional grid of bidirectional communication links. There may also be additional wraparound connections between the two PUs at opposite ends of each row and each column of t...
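To make the network concrete, the following Python sketch lists the links of a single PU on the $n \times n$ mesh, optionally including the wraparound connections (which turn the mesh into a torus). The helper names `mesh_neighbors` and `link_count` are hypothetical and are not taken from the paper.

```python
def mesh_neighbors(i, j, n, wraparound=False):
    """Neighbors of PU (i, j) on an n x n mesh (hypothetical helper).

    With wraparound=True the two PUs at opposite ends of every row and
    column are also linked, which turns the mesh into a torus.
    """
    found = set()
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if wraparound:
            found.add((ni % n, nj % n))
        elif 0 <= ni < n and 0 <= nj < n:
            found.add((ni, nj))
    found.discard((i, j))  # for n == 1 with wraparound, a PU would otherwise link to itself
    return sorted(found)


def link_count(n, wraparound=False):
    """Number of bidirectional links in the whole mesh."""
    degree_sum = sum(len(mesh_neighbors(i, j, n, wraparound))
                     for i in range(n) for j in range(n))
    return degree_sum // 2


print(mesh_neighbors(0, 0, 4))  # [(0, 1), (1, 0)]
```

Without wraparound an $n \times n$ mesh has $2n(n-1)$ links; with wraparound (for $n \ge 3$) every PU has degree 4, giving $2n^2$ links.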
Constant queue routing on a mesh
Journal of Parallel and Distributed Computing, 1992
Abstract

Cited by 29 (5 self)
Packet routing is an important problem in parallel computation since a single step of interprocessor communication can be thought of as a packet routing task. In this paper we present an optimal algorithm for packet routing on a mesh-connected computer. Two important criteria for judging a routing algorithm are (1) its run time, i.e., the number of parallel steps it takes for the last packet to reach its destination, and (2) its queue size, i.e., the maximum number of packets that any node will have to store at any time during routing. We present a 2n − 2 step routing algorithm for an n × n MIMD mesh that requires a queue size of only 112. The previous best known result is a routing algorithm with the same time bound but with a queue size of 1008. The time bound of 2n − 2 is optimal. A queue size of 1008 is rather large for practical use; we believe that the queue size of our algorithm is practical. The improvement in the queue size is possible due to, among other things, a new sorting algorithm for the MIMD mesh.
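The two criteria (run time and queue size) can be measured directly in a small simulation. The Python sketch below routes packets with a simple baseline, greedy row-first routing with farthest-first contention resolution; it is not the authors' 112-queue algorithm. The function name, the model in which each directed link forwards at most one packet per step, and the rule that delivered packets leave the queue immediately are assumptions made for illustration.

```python
from collections import defaultdict


def greedy_route(n, perm):
    """Greedy row-first routing on an n x n mesh (hypothetical helper).

    perm maps each source PU (row, col) to its destination (row, col).
    Returns (steps until the last packet arrives, maximum queue size).
    """
    assert all(0 <= x < n for pair in perm.items() for pt in pair for x in pt)
    # Each packet is [current position, destination]; delivered packets are dropped.
    packets = [[src, dst] for src, dst in perm.items() if src != dst]
    steps = 0
    max_queue = 1 if perm else 0  # initially every PU holds its own packet
    while packets:
        # Group packets by the (current node, outgoing link) they want to use.
        contenders = defaultdict(list)
        for p in packets:
            (r, c), (dr, dc) = p
            if c != dc:  # first correct the column by moving along the row
                move, remaining = (0, 1 if dc > c else -1), abs(dc - c)
            else:        # then correct the row by moving along the column
                move, remaining = (1 if dr > r else -1, 0), abs(dr - r)
            contenders[((r, c), move)].append((remaining, p))
        # Each directed link forwards one packet per step: the one with the
        # farthest to go in that direction wins (ties: first in the list).
        for (pos, move), group in contenders.items():
            _, winner = max(group, key=lambda t: t[0])
            winner[0] = (pos[0] + move[0], pos[1] + move[1])
        steps += 1
        packets = [p for p in packets if p[0] != p[1]]
        load = defaultdict(int)
        for p in packets:
            load[p[0]] += 1
        max_queue = max([max_queue, *load.values()])
    return steps, max_queue


print(greedy_route(3, {(0, 0): (1, 1), (0, 2): (2, 1)}))  # (3, 2)
```

In the example both packets meet at PU (0, 1), which briefly has to store two packets. In the worst case the queues of greedy routing can grow with n, which is the kind of behavior that constant-queue algorithms are designed to rule out.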
Packet Routing in Fixed-Connection Networks: A Survey
1998
Abstract

Cited by 29 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation routing, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
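The routing problems listed above differ mainly in how many packets each PU may send and receive. As an illustration using the usual definition (in a k-relation, every PU is the source of at most k packets and the destination of at most k packets), the hypothetical helper below computes the smallest such k for a set of (source, destination) requests; a partial permutation is the case k = 1.

```python
from collections import Counter


def relation_degree(requests):
    """Smallest k such that the (source, destination) pairs form a k-relation,
    i.e. no PU sends more than k packets and no PU receives more than k."""
    if not requests:
        return 0
    sent = Counter(src for src, _ in requests)
    received = Counter(dst for _, dst in requests)
    return max(max(sent.values()), max(received.values()))


def is_partial_permutation(requests):
    """A partial permutation is exactly a 1-relation (or the empty set)."""
    return relation_degree(requests) <= 1


print(relation_degree([(0, 1), (0, 2), (1, 2)]))  # 2
```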
Columnsort Lives! An Efficient Out-of-Core Sorting Program
In Proceedings of the Thirteenth Annual ACM Symposium on Parallel Algorithms and Architectures, 2001
Abstract

Cited by 20 (7 self)
We present the design and implementation of a parallel out-of-core sorting algorithm, which is based on Leighton's columnsort algorithm. We show how to relax some of the steps of the original columnsort algorithm to permit a faster out-of-core implementation. Our algorithm requires only 4 passes over the data, and a 3-pass implementation is possible. Although there is a limit on the number of records that can be sorted (as a function of the memory used per processor), this upper limit need not be a severe restriction, and it increases superlinearly with the per-processor memory. To the best of our knowledge, our implementation is the first out-of-core multiprocessor sorting algorithm whose output is in the order assumed by the Parallel Disk Model. We define several measures of sorting efficiency and demonstrate that our implementation's sorting efficiency is competitive with that of NOWSort, a sorting algorithm developed to sort large amounts of data quickly on a cluster of workstations.
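Since the program builds on Leighton's columnsort, a compact in-memory Python sketch of the original eight steps may help. Every step either sorts each column independently or applies a fixed permutation to the whole matrix, which is what makes the method attractive out of core: a column can be read, sorted in memory, and written back. This is the textbook algorithm, not the authors' relaxed 4-pass or 3-pass variant; the function name, the ±infinity padding in the shift step, and the conservative height restriction r ≥ 2s² are assumptions of this sketch.

```python
import math


def columnsort(values, r, s):
    """In-memory sketch of columnsort on an r x s matrix (hypothetical helper).

    Assumes len(values) == r * s, r divisible by s, r even, and the
    conservative height restriction r >= 2 * s**2.
    """
    assert len(values) == r * s and r % s == 0 and r % 2 == 0
    assert r >= 2 * s * s, "matrix is too short for columnsort"

    def sort_columns(cols):
        return [sorted(col) for col in cols]

    def column_major(cols):
        return [x for col in cols for x in col]

    cols = [list(values[j * r:(j + 1) * r]) for j in range(s)]
    cols = sort_columns(cols)                          # step 1: sort columns
    flat = column_major(cols)                          # step 2: read column-major,
    cols = [[flat[i * s + j] for i in range(r)]        #         write row-major
            for j in range(s)]
    cols = sort_columns(cols)                          # step 3: sort columns
    flat = [cols[j][i] for i in range(r) for j in range(s)]  # step 4: undo step 2
    cols = [flat[j * r:(j + 1) * r] for j in range(s)]
    cols = sort_columns(cols)                          # step 5: sort columns
    half = r // 2                                      # step 6: shift down by r/2,
    flat = [-math.inf] * half + column_major(cols) + [math.inf] * half  # with padding
    cols = [flat[j * r:(j + 1) * r] for j in range(s + 1)]  # now s + 1 columns
    cols = sort_columns(cols)                          # step 7: sort columns
    return column_major(cols)[half:half + r * s]       # step 8: unshift, drop padding


print(columnsort(list(range(16, 0, -1)), r=8, s=2))
```

The result is sorted in column-major order. With r ≥ 2s², an r × s matrix holds at most r·√(r/2) records, which grows superlinearly in the column height r; in this simplified setting that mirrors the superlinear limit on the number of records mentioned in the abstract.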
Deterministic Permutation Routing on Meshes
Proc. 5th Symp. on Parallel and Distributed Proc., IEEE, 1993
Abstract

Cited by 20 (13 self)
We present a new deterministic algorithm for routing permutations on a two-dimensional MIMD mesh. The algorithm runs in the optimal time $2n - 2$ on an $n \times n$ mesh, and the maximal number of packets stored in a processing unit is 81. A modification of the algorithm, running in time $2n + O(1)$, has a maximal queue length of only 31. The algorithm is simple; no conflict-resolution strategy is required. Keywords: mesh-connected computer, permutation routing, optimal algorithm, conflict-freeness.
1 Introduction
The exchange of information between processing units (PUs) is one of the most important problems on parallel computers in which the PUs communicate through an interconnection network. The basic communication step is that of transferring packets. These are portions of information generated and received by ...
This research was partially supported by EC Cooperative Action IC1000 (project ALTEC: Algorithms for Future Technologies).
† Instytut Informatyki, Uniwersytet Warszaws...
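The optimality of $2n - 2$ follows from the distance bound: a packet crosses at most one link per step, so no algorithm can finish before the packet with the longest Manhattan path arrives, and a packet travelling between opposite corners of an $n \times n$ mesh needs $2n - 2$ steps. The hypothetical helper below computes this bound for a given permutation.

```python
def distance_lower_bound(perm):
    """Largest Manhattan distance between a packet's source and destination
    (hypothetical helper). A packet moves at most one link per step, so no
    routing algorithm can finish in fewer steps."""
    return max((abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in perm.items()), default=0)


n = 5
# Reversal sends each PU to the diametrically opposite position.
reversal = {(r, c): (n - 1 - r, n - 1 - c) for r in range(n) for c in range(n)}
print(distance_lower_bound(reversal))  # 8, i.e. 2n - 2
```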
Branch-and-Bound and Backtrack Search on Mesh-Connected Arrays of Processors
Mathematical Systems Theory, 1992
Abstract

Cited by 20 (0 self)
In this paper we investigate the parallel complexity of backtrack and branch-and-bound search on the mesh-connected array. We present an $\Omega(\sqrt{dN}/\sqrt{\log N})$ lower bound for the time needed by a randomized algorithm to perform backtrack and branch-and-bound search of a tree of depth $d$ on the $\sqrt{N} \times \sqrt{N}$ mesh, even when the depth of the tree is known in advance. The lower bound also holds for algorithms that are allowed to move tree nodes and create multiple copies of the same tree node. For the upper bounds we give deterministic algorithms that are within a factor of $O(\log^{3/2} N)$ of our lower bound. Our algorithms make no assumption about the shape of the tree to be searched, do not know the depth of the tree in advance, and neither move tree nodes nor create multiple copies of the same node. The best previously known algorithm for backtrack search on the mesh was randomized and required $\Theta(d\sqrt{N}/\log N)$ time. Our algorithm for branch-and-bound is the fir...
Supporting the hypercube programming model on mesh architectures (A fast sorter for iWarp tori)
1992
Scalability of Parallel Sorting on Mesh Multicomputers
1991
Abstract

Cited by 18 (12 self)
This paper presents two new parallel algorithms, QSP1 and QSP2, based on sequential quicksort for sorting data on a mesh multicomputer, and analyzes their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers. The isoefficiency of QSP1 is also fairly close to optimal. Lang et al. and Schnorr et al. have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-per-processor case. Both QSP1 and QSP2 perform worse than these algorithms in the one-element-per-processor case, but they scale better than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). As a result, our new parallel formulations are better than these scaled-down variants in terms of speedup with respect to the best sequential algorithms. We also present a dif...
Desnakification of Mesh Sorting Algorithms
Proc. 2nd European Symp. on Algorithms, LNCS 855, 1994
Abstract

Cited by 15 (11 self)
In all recent near-optimal sorting algorithms for meshes, the packets are sorted with respect to some snake-like indexing. In this paper we present deterministic algorithms for sorting with respect to the more natural row-major indexing. For 1-1 sorting on an $n \times n$ mesh, we give an algorithm that runs in $2 \cdot n + o(n)$ steps, matching the distance bound, with maximal queue size five. It is considerably simpler than earlier algorithms. Another algorithm performs k-k sorting in $k \cdot n/2 + o(k \cdot n)$ steps, matching the bisection bound. Furthermore, we present uniaxial algorithms for row-major sorting. Uniaxial algorithms have clear practical and theoretical advantages over biaxial algorithms. We show that 1-1 sorting can be performed in $2\frac{1}{2} \cdot n + o(n)$ steps. Alternatively, this problem is solved with maximal queue size five in $4\frac{1}{3} \cdot n$ steps, without any additional terms. For practically important values of $n$, this algorithm is much faster than any alg...
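The difference between the two indexings can be made concrete with a pair of index functions. The sketch below uses 0-based rows and columns, and the helper names `row_major_index`, `snake_index`, and `layout` are hypothetical; it simply prints the rank each scheme assigns to every PU of a 3 × 3 mesh.

```python
def row_major_index(i, j, n):
    """Rank of PU (i, j) when the n x n mesh is numbered row by row."""
    return i * n + j


def snake_index(i, j, n):
    """Rank of PU (i, j) in snake-like order: even rows run left to right,
    odd rows right to left (rows numbered from 0)."""
    return i * n + (j if i % 2 == 0 else n - 1 - j)


def layout(n, index):
    """Grid showing the rank that `index` assigns to every PU."""
    return [[index(i, j, n) for j in range(n)] for i in range(n)]


for row in layout(3, snake_index):
    print(row)
# [0, 1, 2]
# [5, 4, 3]
# [6, 7, 8]
```

In snake-like order, consecutive ranks always belong to neighboring PUs, whereas in row-major order rank n − 1 (the end of row 0) and rank n (the start of row 1) lie n links apart, which is one reason row-major sorting is harder to do optimally.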