Direct Bulk-Synchronous Parallel Algorithms
Journal of Parallel and Distributed Computing, 1992
Cited by 157 (26 self)
We describe a methodology for constructing parallel algorithms that are transportable among parallel computers having different numbers of processors, different bandwidths of interprocessor communication and different periodicity of global synchronisation. We do this for the bulk-synchronous parallel (BSP) model, which abstracts the characteristics of a parallel machine into three numerical parameters p, g, and L, corresponding to processors, bandwidth, and periodicity respectively. The model differentiates memory that is local to a processor from that which is not, but, for the sake of universality, does not differentiate network proximity. The advantages of this model in supporting shared memory or PRAM style programming have been treated elsewhere. Here we emphasise the viability of an alternative direct style of programming where, for the sake of efficiency, the programmer retains control of memory allocation. We show that optimality to within a multiplicative factor close to one ca...
Randomized routing and sorting on fixed-connection networks
Journal of Algorithms, 1994
Cited by 84 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
The Power of Two Random Choices: A Survey of Techniques and Results
In Handbook of Randomized Computing, 2000
Cited by 79 (2 self)
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
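The balls-and-bins experiment described in this abstract can be tried out with a short simulation. The following is only an illustrative sketch, not code from the survey: the function name `max_load`, the fixed seed, and the choice of n = 20000 are assumptions made for the example, and ties between equally loaded bins are broken by sampling order.

```python
import random


def max_load(n, d, rng):
    """Throw n balls into n bins, one at a time.

    Each ball samples d bins independently and uniformly at random
    (with replacement) and goes into the least loaded of them.
    Returns the largest number of balls in any bin.
    """
    bins = [0] * n
    for _ in range(n):
        choices = [rng.randrange(n) for _ in range(d)]
        # min() returns the first least-loaded candidate, so ties are
        # broken by sampling order (an arbitrary choice for this sketch).
        target = min(choices, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)


rng = random.Random(0)   # fixed seed so runs are repeatable
n = 20000                # hypothetical problem size
one_choice = max_load(n, 1, rng)   # classical single-choice process
two_choice = max_load(n, 2, rng)   # "power of two choices" process
print("max load with d=1:", one_choice)
print("max load with d=2:", two_choice)
```

With d = 1 the function reproduces the single-choice process, and with d = 2 the two-choice process; comparing the two printed values for growing n gives a feel for the gap between the log n / log log n and log log n / log d regimes.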
Models of Machines and Computation for Mapping in Multicomputers
1993
Cited by 76 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Randomized Routing on Fat-Trees
Advances in Computing Research, 1996
Cited by 47 (10 self)
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor λ on a fat-tree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(λ + lg n lg lg n) with probability 1 − O(1/n). The best previous bound was O(λ lg n) for the off-line problem in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fat-trees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost. 1 Introduction Fat-trees constitute...
Fast Algorithms for Bit-Serial Routing on a Hypercube
1991
Cited by 36 (9 self)
In this paper, we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N / log log N) bit steps with high probability for almost all permutations. 1 Introduction Substantial effort has been devoted to the study of store-and-forward packet routing algorithms for hypercubic networks. The fastest algorithms are randomized, and c...
A Theory of Wormhole Routing in Parallel Computers
1993
Cited by 35 (2 self)
Virtually all theoretical work on message routing in parallel computers has dwelt on packet routing: messages are conveyed as packets, an entire packet can reside at a node of the network, and a packet is sent from the queue of one node to the queue of another node until it reaches its destination. A trend in multicomputer architecture, however, is to use wormhole routing. In wormhole routing a message is transmitted as a contiguous stream of bits, physically occupying a sequence of nodes/edges in the network. Thus, a message resembles a worm burrowing through the network. In this paper we give theoretical analyses of simple wormhole routing algorithms, showing them to be nearly optimal for butterfly and mesh connected networks. Our analysis requires initial random delays in injecting messages to the network. We report simulation results suggesting that the idea of random initial delays may have an impact beyond theoretical analysis. IBM Almaden Research Center, San Jose, CA., IBM A...
A Packet Routing Protocol for Arbitrary Networks
In Proceedings of the 12th Symposium on Theoretical Aspects of Computer Science, 1995
Cited by 32 (16 self)
In this paper, we introduce an on-line protocol which routes any set of packets along shortest paths through an arbitrary N-node network in O(congestion + diameter + log N) rounds, with high probability. This time bound is optimal up to the additive log N, and it was previously only reached for bounded-degree levelled networks. Further, we prove bounds on the congestion of random routing problems for Cayley networks and general node symmetric networks based on the construction of shortest paths systems. In particular, we give construction schemes for shortest paths systems and show that if every processor sends p packets to random destinations along the paths described in the paths system, then the congestion is bounded by O(p · diameter + log N), with high probability. Finally, we prove an (apparently suboptimal) congestion bound for random routing problems on randomly chosen regular networks. 1 Introduction Communication among the processors of a parallel computer usually ...
Constant queue routing on a mesh
Journal of Parallel and Distributed Computing, 1992
Cited by 29 (5 self)
Packet routing is an important problem in parallel computation since a single step of inter-processor communication can be thought of as a packet routing task. In this paper we present an optimal algorithm for packet routing on a mesh-connected computer. Two important criteria for judging a routing algorithm will be 1) its run time, i.e., the number of parallel steps it takes for the last packet to reach its destination, and 2) its queue size, i.e., the maximum number of packets that any node will have to store at any time during routing. We present a 2n − 2 step routing algorithm for an n × n MIMD mesh that requires a queue size of only 112. The previous best known result is a routing algorithm with the same time bound but with a queue size of 1008. The time bound of 2n − 2 is optimal. A queue size of 1008 is rather large for practical use. We believe that the queue size of our algorithm is practical. The improvement in the queue size is possible due to (among other things) a new sorting algorithm for the MIMD mesh.
Bounding Delays in Packet-Routing Networks
In Proceedings of the Twenty-Seventh Annual ACM Symposium on the Theory of Computing, 1995
Cited by 28 (1 self)
Consider the problem of computing the average packet delay in a general dynamic packet-routing network with Poisson input stream, during steady-state. Any packet-routing network can be formulated as a queueing network, where each server has a constant service time. If each server had exponentially-distributed service time, queueing theory techniques could be used to determine the expected packet delay. However, it is not known how to compute the average packet delay for all but the simplest networks with constant time servers. It has been conjectured that to get an upper bound on expected packet delay in the constant service network, one can simply replace each constant time server with an exponential server of equal mean service time. This paper shows that for a large class of networks, this conjecture is true, but that surprisingly there exists a network for which it is false. This large class of networks is all queueing networks with Markovian routing. Queueing networks with Markovi...

