Results 1-10 of 66
Scans as Primitive Parallel Operations
 IEEE Transactions on Computers
, 1987
Cited by 157 (12 self)
In most parallel random-access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations as unit-time primitives in the PRAM models. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm and a mergi...
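As a rough illustration of how a scan can drive one of the listed algorithms, the sketch below builds a radix sort from an exclusive prefix sum. It is a sequential Python simulation, not the paper's code: the names `exclusive_scan`, `split`, and `radix_sort` are assumptions made here, and each loop stands in for what would be a single parallel step on a machine with scan primitives.

```python
def exclusive_scan(values):
    """Exclusive +-scan: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def split(keys, flags):
    """Stable partition: keys with flag 0 first, then keys with flag 1.

    Destinations come from two scans, so no comparisons between keys
    are needed. (Hypothetical helper, written for this sketch.)
    """
    zeros = [1 - f for f in flags]
    down = exclusive_scan(zeros)   # slot of each 0-flagged key
    up = exclusive_scan(flags)     # rank of each 1-flagged key
    n_zeros = sum(zeros)
    out = [None] * len(keys)
    for i, key in enumerate(keys):
        dest = down[i] if flags[i] == 0 else n_zeros + up[i]
        out[dest] = key
    return out


def radix_sort(keys, bits):
    """LSD radix sort on non-negative ints: one stable split per bit."""
    for b in range(bits):
        keys = split(keys, [(k >> b) & 1 for k in keys])
    return keys


if __name__ == "__main__":
    print(radix_sort([5, 7, 3, 1, 4, 2, 7, 2], bits=3))
```

Because each `split` is stable, sorting by bit 0, then bit 1, and so on leaves the keys fully ordered after `bits` passes.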
Randomized routing and sorting on fixed-connection networks
 Journal of Algorithms
, 1994
Cited by 88 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
Packet Routing In Fixed-Connection Networks: A Survey
, 1998
Cited by 29 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
The Case for Chaotic Adaptive Routing
 IEEE Transactions on Computers
, 1994
Cited by 29 (0 self)
Chaotic routers are randomizing, nonminimal adaptive packet routers designed for use in the communication networks of parallel computers. Chaotic routing is reviewed along with other contemporary network routing approaches, including the state-of-the-art oblivious routers. Each routing approach is evaluated for its effectiveness as a multicomputer message router. The results indicate that the Chaos router is the most effective of known routing methods.
1 Introduction
In spite of the fact that network routing has been an active research area in recent years, leading to many diverse proposals, practical experience with routers is extremely limited. The routers used in most implemented parallel computers are all from a single class, known as oblivious routers. Most of the nonoblivious routers have appeared only in single-instance machines such as the HEP, CM-2, and CM-5 computers, making it difficult to separate fundamental properties of the routers from artifacts of the specific insta...
Parametric Binary Dissection
 Institute for Computer
, 1993
Cited by 24 (3 self)
Binary dissection is widely used to partition nonuniform domains over parallel computers. This algorithm does not consider the perimeter, surface area, or aspect ratio of the regions being generated and can yield decompositions that have a poor communication-to-computation ratio. Parametric Binary Dissection (PBD) is a new algorithm in which each cut is chosen to minimize load + λ × (shape). In a 2- (or 3-) dimensional problem, load is the amount of computation to be performed in a subregion and shape could refer to the perimeter (respectively surface) of that subregion. Shape is a measure of communication overhead and the parameter λ permits us to trade off load imbalance against communication overhead. When λ is zero, the algorithm reduces to plain binary dissection. This algorithm can be used to partition graphs embedded in 2-d or 3-d. Here load is the number of nodes in a subregion, shape the number of edges that leave that subregion, and λ the ratio of time to communicate o...
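The cut rule described in this abstract can be sketched for a single vertical cut of a 2-d grid of per-cell loads. Several things here are assumptions rather than details taken from the paper: the trade-off parameter is called `lam`, shape is taken to be the rectangle perimeter, and a cut is scored by the larger of the two subregion costs (load plus `lam` times shape). The function name `best_vertical_cut` is hypothetical.

```python
def best_vertical_cut(grid, lam):
    """Pick the column boundary that minimizes the worse subregion cost.

    grid : list of rows, each a list of non-negative cell loads
    lam  : weight on shape (perimeter); 0 gives plain load balancing
    Returns (cut, cost), where columns [0, cut) go left and the rest right.
    Assumed objective: max over both sides of load + lam * perimeter.
    """
    height, width = len(grid), len(grid[0])
    col_load = [sum(row[c] for row in grid) for c in range(width)]
    total = sum(col_load)

    best = None
    left = 0
    for cut in range(1, width):
        left += col_load[cut - 1]
        right = total - left
        cost_left = left + lam * 2 * (cut + height)
        cost_right = right + lam * 2 * ((width - cut) + height)
        cost = max(cost_left, cost_right)
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best


if __name__ == "__main__":
    grid = [[6, 1, 1, 1, 1, 1, 1]]
    print(best_vertical_cut(grid, lam=0))  # load balance only
    print(best_vertical_cut(grid, lam=2))  # perimeter pulls the cut inward
```

On this one-row grid, a zero weight places the cut right after the heavy first cell, while a positive weight moves it toward the middle so that neither piece becomes long and thin.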
Broadcast communications and distributed algorithms
 IEEE Trans. on Computers
, 1986
Cited by 24 (1 self)
The paper addresses ways in which one can use "broadcast communication" in distributed algorithms and the relevant issues of design and complexity. We present an algorithm for merging k sorted lists of n/k elements using k processors and prove its worst-case complexity to be 2n, regardless of the number of processors, while neglecting the cost arising from possible conflicts on the broadcast channel. We also show that this algorithm is optimal under single-channel broadcast communication. In a variation of the algorithm, we show that by using an extra local memory of O(k) the number of broadcasts is reduced to n. When the algorithm is used for sorting n elements with k processors, where each processor first sorts its own list and the lists are then merged, it has a complexity of O((n/k) log(n/k) + n), and is thus asymptotically optimal for large n. We also discuss the cost incurred by the channel access scheme and prove that resolving conflicts whenever k processors are involved introduces a cost factor of at least log k. Index Terms: Access scheme, broadcast, complexity analysis, distributed algorithm, merging, parallel algorithms, sorting.
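To make the broadcast-counting idea concrete, here is a small sequential simulation of merging k sorted lists over a shared channel. It is not the paper's algorithm: it assumes every processor can keep a table of the k current list heads (modeled by one shared heap), ignores channel conflicts, and counts one broadcast each time an element becomes the head of its list. The name `broadcast_merge` is hypothetical.

```python
import heapq


def broadcast_merge(lists):
    """Merge sorted lists while counting simulated broadcasts.

    Returns (merged, broadcasts). The heap plays the role of the table
    of current heads that every processor is assumed to maintain.
    """
    heads = []                 # entries are (value, processor id)
    pos = [0] * len(lists)     # next unread index in each list
    broadcasts = 0

    # Each processor announces its first element.
    for p, lst in enumerate(lists):
        if lst:
            heapq.heappush(heads, (lst[0], p))
            broadcasts += 1

    merged = []
    while heads:
        value, p = heapq.heappop(heads)   # globally smallest head
        merged.append(value)
        pos[p] += 1
        if pos[p] < len(lists[p]):
            # The owner announces its next element.
            heapq.heappush(heads, (lists[p][pos[p]], p))
            broadcasts += 1
    return merged, broadcasts


if __name__ == "__main__":
    print(broadcast_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))
```

In this sketch every element is announced exactly once, so the broadcast count equals the total number of elements.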
Deterministic Permutation Routing on Meshes
 Proc. 5th Symp. on Parallel and Distributed Proc., IEEE
, 1993
Cited by 20 (13 self)
We present a new deterministic algorithm for routing permutations on a two-dimensional MIMD mesh. The algorithm runs in the optimal time 2n − 2 on an n × n mesh, and the maximal number of packets stored in a processing unit is 81. A modification of the algorithm, running in time 2n + O(1), has the maximal queue length of only 31. The algorithm is simple; no conflict-resolution strategy is required. Keywords: mesh-connected computer, permutation routing, optimal algorithm, conflict freeness.
1 Introduction
The exchange of information between processing units, PUs, is one of the most important problems on parallel computers in which the PUs communicate through an interconnection network. The basic communication step is that of transferring packets. These are portions of information generated and received by
This research was partially supported by EC Cooperative Action IC1000 (project ALTEC: Algorithms for Future Technologies). † Instytut Informatyki, Uniwersytet Warszaws...
Branch-and-Bound and Backtrack Search on Mesh-Connected Arrays of Processors
 Mathematical Systems Theory
, 1992
Cited by 20 (0 self)
In this paper we investigate the parallel complexity of backtrack and branch-and-bound search on the mesh-connected array. We present an Ω(√(dN) / √(log N)) lower bound for the time needed by a randomized algorithm to perform backtrack and branch-and-bound search of a tree of depth d on the √N × √N mesh, even when the depth of the tree is known in advance. The lower bound holds also for algorithms that are allowed to move tree-nodes and create multiple copies of the same tree-node. For the upper bounds we give deterministic algorithms that are within a factor of O(log^(3/2) N) from our lower bound. Our algorithms do not make any assumption on the shape of the tree to be searched, do not know the depth of the tree in advance and do not move tree-nodes nor create multiple copies of the same node. The best previously known algorithm for backtrack search on the mesh was randomized and required Θ(d√N / log N) time. Our algorithm for branch-and-bound is the fir...
Universal Wormhole Routing
 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1993
Cited by 20 (3 self)
In this paper, we examine the wormhole routing problem in terms of the "congestion" c and "dilation" d for a set of packet paths. We show, with mild restrictions, that there is a simple randomized algorithm for routing any set of P packets in O(cdj + cLj log P) time with high probability, where L is the number of flits in a packet, and j = min{d, L}; only a constant number of flits are stored in each queue at any time. Using this result, we show that a fat-tree network of area Θ(A) can simulate wormhole routing on any network of comparable area with O(log^3 A) slowdown, when all worms have the same length. Variable-length worms are also considered. We run some simulations on the fat-tree which show that not only does wormhole routing tend to perform better than the more heavily studied store-and-forward routing in this context, but that performance superior to our provable bound is attainable in practice.
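The two quantities that parameterize this bound can be computed directly from a set of paths. The sketch below uses the usual definitions: congestion is the largest number of paths sharing one edge, and dilation is the length in edges of the longest path. The representation (each path a list of node labels, edges treated as directed pairs) and the name `congestion_and_dilation` are assumptions made for illustration.

```python
from collections import Counter


def congestion_and_dilation(paths):
    """Return (c, d) for a collection of packet paths.

    paths : iterable of node sequences, e.g. ["a", "b", "c"]
    c     : max number of paths that traverse any single directed edge
    d     : max number of edges on any one path
    """
    edge_use = Counter()
    dilation = 0
    for path in paths:
        edges = list(zip(path, path[1:]))
        dilation = max(dilation, len(edges))
        # A path counts once per edge even if it revisits that edge.
        edge_use.update(set(edges))
    congestion = max(edge_use.values(), default=0)
    return congestion, dilation


if __name__ == "__main__":
    paths = [
        ["a", "b", "c", "d"],
        ["e", "b", "c"],
        ["f", "b", "c", "g"],
    ]
    print(congestion_and_dilation(paths))
```

In the example all three paths cross the edge from b to c, and the longest path has three edges.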