Adaptive Packet Routing for Bursty Adversarial Traffic
, 1998
One of the central tasks of networking is packet routing when edge bandwidth is limited. Tremendous progress has been achieved by separating the issue of routing into two conceptual subproblems: path selection and congestion resolution along the selected paths. However, this conceptual separation has a serious drawback: each packet's path is fixed at the source and cannot be modified adaptively en route. The problem is especially severe when packet injections are modeled by an adversary, whose goal is to cause "traffic jams".
A constant-factor approximation algorithm for packet routing, and balancing local vs. global criteria
 In Proceedings of the ACM Symposium on the Theory of Computing (STOC
, 1997
Abstract. We present the first constant-factor approximation algorithm for a fundamental problem: the store-and-forward packet routing problem on arbitrary networks. Furthermore, the queue sizes required at the edges are bounded by an absolute constant. Thus, this algorithm balances a global criterion (routing time) with a local criterion (maximum queue size) and shows how to get simultaneously good bounds for both. For this particular problem, approximating the routing time well, even without considering the queue sizes, was open. We then consider a class of such local vs. global problems in the context of covering integer programs and show how to improve the local criterion by a logarithmic factor by losing a constant factor in the global criterion.
Distributed Packet Switching in Arbitrary Networks
 In Proceedings of the 28th Annual ACM Symposium on Theory of Computing
, 1996
In a seminal paper, Leighton, Maggs, and Rao consider the packet scheduling problem when a single packet has to traverse each path. They show that there exists a schedule where each packet reaches its destination in O(C + D) steps, where C is the congestion and D is the dilation. The proof relies on the Lovász Local Lemma, and hence is not algorithmic. In a follow-up paper, Leighton and Maggs use an algorithmic version of the Local Lemma due to Beck to give centralized algorithms for the problem. Leighton, Maggs, and Rao also give a distributed randomized algorithm where all packets reach their destinations with high probability in O(C + D log n) steps. In this paper we develop techniques to guarantee the high probability of delivering packets without resorting to the Lovász Local Lemma. We improve the distributed algorithm for problems with relatively high dilation to O(C) + (log* n)^{O(log* n)} D + poly(log n). We extend the techniques to handle the case of infinite streams of ...
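The bounds in the abstract above are stated in terms of the congestion C and the dilation D. As a minimal sketch, assuming the standard definitions (C is the largest number of paths that share a single directed edge, D is the number of edges on the longest path), the two parameters can be computed for a fixed set of paths as follows. The function name and the example paths are hypothetical and are not taken from the paper.

```python
from collections import Counter


def congestion_and_dilation(paths):
    """Return (C, D) for a list of paths, each given as a list of nodes.

    Assumed definitions (standard, not quoted from the abstract):
      C = maximum number of paths that use any one directed edge
      D = maximum number of edges on any one path
    """
    edge_load = Counter()
    dilation = 0
    for path in paths:
        edges = list(zip(path, path[1:]))  # consecutive node pairs
        dilation = max(dilation, len(edges))
        edge_load.update(edges)
    congestion = max(edge_load.values(), default=0)
    return congestion, dilation


# Hypothetical example: three packets whose paths all cross edge (b, c).
paths = [
    ["a", "b", "c", "d"],
    ["e", "b", "c"],
    ["f", "b", "c", "g"],
]
C, D = congestion_and_dilation(paths)
print(C, D)  # 3 3
```

Since at most one packet crosses an edge per step, any schedule needs at least max(C, D) steps, which is why an O(C + D) schedule is optimal up to a constant factor.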
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
 In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1997
Abstract. For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant’s bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
Efficient Low-Contention Parallel Algorithms
 the 1994 ACM Symp. on Parallel Algorithms and Architectures
, 1994
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative expe...
Packet Routing In Fixed-Connection Networks: A Survey
, 1998
We survey routing problems on fixed-connection networks. We consider many aspects of the routing problem and provide known theoretical results for various communication models. We focus on (partial) permutation, k-relation routing, routing to random destinations, dynamic routing, isotonic routing, fault-tolerant routing, and related sorting results. We also provide a list of unsolved problems and numerous references.
Bounding Delays in Packet-Routing Networks
 In Proceedings of the Twenty-Seventh Annual ACM Symposium on the Theory of Computing
, 1995
Consider the problem of computing the average packet delay in a general dynamic packet-routing network with a Poisson input stream, during steady state. Any packet-routing network can be formulated as a queueing network, where each server has a constant service time. If each server had an exponentially distributed service time, queueing theory techniques could be used to determine the expected packet delay. However, it is not known how to compute the average packet delay for all but the simplest networks with constant-time servers. It has been conjectured that to get an upper bound on expected packet delay in the constant service network, one can simply replace each constant-time server with an exponential server of equal mean service time. This paper shows that for a large class of networks, this conjecture is true, but that surprisingly there exists a network for which it is false. This large class of networks is all queueing networks with Markovian routing. Queueing networks with Markovi...
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
 Proc. 5th ACM-SIAM Symp. on Discrete Algorithms
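The conjecture described above can be illustrated on the simplest possible case, a single queue. The sketch below is not the paper's argument, which concerns whole networks. It only compares the textbook mean time in system of an M/D/1 queue (constant service time, Pollaczek-Khinchine formula) with that of an M/M/1 queue (exponential service time of equal mean). The function names and the chosen rates are assumptions made for illustration.

```python
def mm1_mean_delay(lam, mu):
    """Mean time in system for an M/M/1 queue: 1 / (mu - lam)."""
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu for a stable queue")
    return 1.0 / (mu - lam)


def md1_mean_delay(lam, mu):
    """Mean time in system for an M/D/1 queue (constant service time 1/mu).

    Pollaczek-Khinchine: service time plus mean wait rho / (2 * mu * (1 - rho)).
    """
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu for a stable queue")
    rho = lam / mu
    return 1.0 / mu + rho / (2.0 * mu * (1.0 - rho))


# Hypothetical arrival rates with service rate mu = 1.
for lam in (0.1, 0.5, 0.9):
    print(lam, md1_mean_delay(lam, 1.0), mm1_mean_delay(lam, 1.0))
```

For a single server the exponential queue always has the larger mean delay, for example 2.0 versus 1.5 at arrival rate 0.5 and service rate 1. The abstract's point is that this intuition extends to a large class of networks but not to all of them.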
, 1997
Abstract. This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of log n between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant’s BSP model, and hence on hypercube-type non-combining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
Dynamic Routing on Networks with Fixed-Size Buffers
 In Proc. of the 14th Ann. ACM-SIAM Symposium on Discrete Algorithms
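The contrast among the three models described above can be made concrete with a simplified cost function. This is a sketch under assumptions, not the paper's formal definition: it charges a QRQW step the maximum number of processors touching any one location, charges a CRCW step unit cost regardless of contention, and rejects any EREW step with contention above one. The function names and the string model labels are hypothetical.

```python
from collections import Counter


def max_contention(accesses):
    """Largest number of processors touching the same location in one step.

    `accesses` lists the memory location touched by each processor.
    """
    return max(Counter(accesses).values(), default=0)


def step_cost(accesses, model):
    """Simplified time charged for one step under the given PRAM model."""
    k = max_contention(accesses)
    if model == "qrqw":
        return max(1, k)  # cost grows with the longest queue at a location
    if model == "crcw":
        return 1          # contention is not penalized
    if model == "erew":
        if k > 1:
            raise ValueError("EREW forbids concurrent access to a location")
        return 1
    raise ValueError(f"unknown model: {model}")


# Hypothetical step: three processors hit location 5, one hits location 2.
step = [5, 5, 5, 2]
print(step_cost(step, "qrqw"))  # 3
print(step_cost(step, "crcw"))  # 1
```

Under this simplification, the same step costs 3 on the QRQW PRAM, 1 on the CRCW PRAM, and is not allowed at all on the EREW PRAM.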
, 2003
William Aiello, Rafail Ostrovsky, Eyal Kushilevitz, Adi Rosén. Abstract. The combination of the buffer size of routers deployed in the Internet and the Internet traffic itself leads routinely to routers dropping packets. Motivated by this, we initiate the rigorous study of dynamic store-and-forward routing on arbitrary networks in a model in which dropped packets must explicitly be taken into account. To avoid the uncertainties of traffic modeling, we consider arbitrary traffic on the network. We analyze and compare the effectiveness of several greedy, online, local-control protocols using a competitive analysis of the throughput. One goal of our approach is for the competitive results to continue to hold as a network grows without requiring the memory in the nodes to increase with the size of the network. Thus, in our model, we have link buffers of fixed size, B, which is independent of the size of the network, and B becomes a parameter of the model.
Scheduling Time-Constrained Communication in Linear Networks
 In Proc. 10th Ann. ACM Symp. on Parallel Algorithms and Architectures
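To make the role of the buffer size B concrete, the sketch below simulates a single link with a buffer of B packets. It is a toy illustration under assumptions, not one of the protocols analyzed in the paper: packets that arrive at a full buffer are dropped, the link forwards one packet per step, and packets still buffered when arrivals end are assumed to be delivered eventually. The function name and the arrival pattern are hypothetical.

```python
def simulate_link(arrivals, buffer_size):
    """Return (delivered, dropped) for one link with a fixed-size buffer.

    arrivals[t] is the number of packets arriving at step t. Each step,
    arrivals are admitted while the buffer has room (the rest are dropped),
    then the link forwards one buffered packet.
    """
    queued = delivered = dropped = 0
    for n in arrivals:
        admitted = min(n, buffer_size - queued)
        queued += admitted
        dropped += n - admitted
        if queued > 0:
            queued -= 1
            delivered += 1
    delivered += queued  # assumption: leftover packets drain after arrivals stop
    return delivered, dropped


# Hypothetical bursty traffic: two bursts of three packets.
bursts = [3, 0, 0, 3]
print(simulate_link(bursts, 2))  # (4, 2)
print(simulate_link(bursts, 3))  # (6, 0)
```

With B = 2 each burst loses one packet, while B = 3 absorbs both bursts. Throughput, the number of delivered packets, is the quantity that the competitive analysis in the abstract compares against an optimal schedule.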
, 1998
We study the problem of centrally scheduling multiple messages in a linear network, when each message has both a release time and a deadline. We show that the problem of transmitting optimally many messages is NP-hard, both when messages may be buffered in transit and when they may not be; for either case, we present efficient algorithms that produce approximately optimal schedules. In particular, our bufferless scheduling algorithm achieves throughput that is within a factor of two of optimal. We show that buffering can improve throughput in general by a logarithmic factor (but no more), but that in several significant special cases, such as when all messages can be released immediately, buffering can help by only a small constant factor. Finally, we show how to convert our centralized, offline bufferless schedules to equally productive fully...