Results 1-10 of 56
Randomized routing and sorting on fixed-connection networks
 JOURNAL OF ALGORITHMS
, 1994
Cited by 86 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
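To make the quantities in the O(c + L + log N) bound concrete, the minimal Python sketch below computes the congestion c of a set of packet paths (the largest number of paths that cross any single edge) and the length of the longest path, which is at most the depth L in a leveled network. The function names and the example paths are hypothetical illustrations, not taken from the paper, and the sketch does not implement the paper's scheduling algorithm.

```python
from collections import Counter


def congestion(paths):
    """Return the largest number of paths that share one directed edge."""
    edge_use = Counter()
    for path in paths:
        # Consecutive node pairs are the edges the packet traverses.
        for edge in zip(path, path[1:]):
            edge_use[edge] += 1
    return max(edge_use.values(), default=0)


def longest_path(paths):
    """Return the number of edges on the longest path."""
    return max((len(p) - 1 for p in paths), default=0)


# Hypothetical example: three packets on a small three-level network.
paths = [
    ["a0", "b0", "c0"],
    ["a1", "b0", "c0"],
    ["a2", "b1", "c0"],
]
c = congestion(paths)
d = longest_path(paths)
```

Here two of the three paths share the edge from b0 to c0, so that edge determines the congestion.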
The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms
 IEEE MICRO
, 1992
Cited by 83 (11 self)
The Message-Driven Processor (MDP) is an integrated multicomputer node that provides efficient mechanisms for parallel computing. It incorporates a 36-bit integer processor, a memory management unit, a router for a 3-D mesh network, a network interface, a 4K-word x 36-bit SRAM, and an ECC DRAM controller in a single 1.1M-transistor VLSI chip. Rather than being specialized for a single model of computation, the MDP incorporates efficient primitive mechanisms for communication, synchronization and naming. These mechanisms efficiently support most proposed parallel programming models. Each processing node of the MIT J-Machine consists of an MDP with 1 MByte of DRAM. MDPs have been operational since June 1991 and J-Machines built from them have been online since July 1991.
Fast Algorithms for Bit-Serial Routing on a Hypercube
, 1991
Cited by 38 (10 self)
In this paper, we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of online circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N / log log N) bit steps with high probability for almost all permutations. 1 Introduction Substantial effort has been devoted to the study of store-and-forward packet routing algorithms for hypercubic networks. The fastest algorithms are randomized, and c...
A theory of wormhole routing in parallel computers
 IEEE Transactions on Computers
, 1996
Packet Routing In Fixed-Connection Networks: A Survey
, 1998
Cited by 35 (3 self)
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Wormhole Routing Techniques for Directly Connected Multicomputer Systems
 ACM Computing Surveys
, 1998
Cited by 33 (0 self)
Wormhole routing has emerged as the most widely used switching technique in massively parallel computers. We present here a detailed survey of various techniques for enhancing the performance and reliability of the wormhole routing schemes in directly connected networks. We start with an overview of the direct network topologies and a comparison of various switching techniques. Next, the characteristics of wormhole routing mechanism are described in detail along with the theory behind deadlock-free routing. The performance of routing algorithms depends on the selection of path between the source and the destination, the network traffic, and the router design. The routing algorithms are implemented in the router chips. We outline the router characteristics and describe the functionality of various elements of the router. Depending on the usage of paths between the source and the destination, the routing algorithms are classified as deterministic, fully adaptive, and partially adaptive. ...
Throughput-Centric Routing Algorithm Design
 Past, Present, and Future, Proc. 20th Anniversary Conf. Advanced Research in Very Large Systems Intelligence
, 2003
Cited by 23 (3 self)
The increasing application space of interconnection networks now encompasses several applications, such as packet routing and I/O interconnect, where the throughput of a routing algorithm, not just its locality, becomes an important performance metric. We show that the problem of designing oblivious routing algorithms that have high worst-case or average-case throughput can be cast as a linear program. Globally optimal solutions to these optimization problems can be efficiently found, yielding provably good oblivious routing algorithms.
Performance Modeling of Distributed Memory Architectures
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1991
Cited by 20 (7 self)
We provide performance models for several primitive operations on data structures distributed over memory units interconnected by a Boolean cube network. In particular, we model single source, and multiple source concurrent broadcasting or reduction, concurrent gather and scatter operations, shifts along several axes of multidimensional arrays, and emulation of butterfly networks. We also show how the processor configuration, data aggregation, and the encoding of the address space affect the performance for two important basic computations: the multiplication of arbitrarily shaped matrices, and the Fast Fourier Transform. We also give an example of the performance behavior for local matrix operations for a processor with a single path to local memory, and a set of registers. The analytic models are verified by measurements on the Connection Machine model CM-2.
Better Tradeoffs for Parallel List Ranking
, 1997
Cited by 17 (4 self)
An earlier parallel list ranking algorithm performs well for problem sizes N that are extremely large in comparison to the number of PUs P. However, no existing algorithm gives good performance for reasonable loads. We present a novel family of algorithms that achieve a better tradeoff between the number of startups and the routing volume. We have implemented them on an Intel Paragon, and they turn out to considerably outperform all earlier algorithms: with P = 2 the sequential algorithm is already beaten for N = 25,000; for P = 100 and N = 10^7, the speedup is 21, and for N = 10^8 it even reaches 30. A modification of one of our algorithms solves a theoretical question: we show that on one-dimensional processor arrays, list ranking can be solved with a number of steps equal to the diameter of the network. 1 Introduction A linked list, hereafter just list, is a basic data structure: it consists of nodes which are linked together, such that every node has precisely one predec...
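For readers unfamiliar with the problem, list ranking asks for each node's distance from the end of a linked list. The minimal Python sketch below uses the classic pointer-jumping technique, simulated sequentially in synchronous rounds. It is not one of the algorithms from the paper, and the function name and the successor-array representation are assumptions made for illustration.

```python
def list_rank(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the successor of node i, or None for the last node.
    Returns rank[i], the number of links from node i to the end.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if s is None else 1 for s in succ]
    changed = True
    while changed:
        changed = False
        # Each round reads the old arrays and writes new ones, which
        # mimics all processors updating their node at the same time.
        new_rank, new_nxt = rank[:], nxt[:]
        for i in range(n):
            j = nxt[i]
            if j is not None:
                new_rank[i] = rank[i] + rank[j]  # add the distance covered by j
                new_nxt[i] = nxt[j]              # jump over j
                changed = True
        rank, nxt = new_rank, new_nxt
    return rank


# Hypothetical example: the list 0 -> 2 -> 1, stored as a successor array.
ranks = list_rank([2, None, 1])
```

Each round doubles the distance a pointer spans, so the number of rounds grows only logarithmically with the list length, at the cost of more total work than a single sequential traversal.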
Optimal Algorithms For Dissemination Of Information In Generalized Communication Modes
 Proc. PARLE'92, Lecture Notes in Computer Science 605
, 1992
Cited by 16 (5 self)
Some generalized communication modes enabling the dissemination of information among processors of interconnection networks via vertex-disjoint or edge-disjoint paths in one communication step will be investigated. A thorough study of these communication modes will be presented by giving optimal algorithms for broadcasting, accumulation and gossiping in most of the well-known parallel architectures. For those networks in which a Hamiltonian path exists (Hypercubes, Cube Connected Cycles, Butterflies, Shuffle Exchange, etc.) optimal algorithms can be obtained quite easily, but for complete binary trees, complete k-ary trees (k ≥ 3) and arbitrary degree bounded graphs, the optimal algorithms as well as the matching lower bound proofs are more involved. An interesting consequence of the presented algorithms is the fact that in almost all these interconnection networks the gossip problem cannot be solved in less time than the sum of time complexities of the accumulation problem and the bro...