Minimizing Congestion in General Networks
 In Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS)
, 2002
Cited by 108 (13 self)
A principal task in parallel and distributed systems is to reduce the communication load in the interconnection network, as this is usually the major bottleneck for the performance of distributed applications. In this paper we introduce a framework for solving online problems that aim to minimize the congestion (i.e. the maximum load of a network link) in general topology networks. We apply this
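The definition of congestion given in this abstract (the maximum load of a network link) can be made concrete with a minimal sketch. It is not the paper's framework: it assumes that each routed request is given as a list of node names, that links are undirected, and that every request adds one unit of load. The function name `congestion` is chosen here for illustration.

```python
from collections import Counter


def congestion(paths):
    """Return the maximum number of paths crossing any single undirected link.

    Assumption: each path is a sequence of node identifiers and contributes
    one unit of load to every link it traverses.
    """
    load = Counter()
    for path in paths:
        # Consecutive node pairs are the links used by this path.
        for u, v in zip(path, path[1:]):
            load[frozenset((u, v))] += 1
    return max(load.values(), default=0)


if __name__ == "__main__":
    routes = [["a", "b", "c"], ["d", "b", "c"], ["a", "b"]]
    print(congestion(routes))
```

An online routing algorithm in this setting would pick each path without knowing future requests and try to keep this value small.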
Chernoff-Hoeffding Bounds for Applications with Limited Independence
 SIAM J. Discrete Math
, 1993
Cited by 102 (10 self)
Chernoff-Hoeffding bounds are fundamental tools used in bounding the tail probabilities of the sums of bounded and independent random variables. We present a simple technique which gives slightly better bounds than these, and which more importantly requires only limited independence among the random variables, thereby importing a variety of standard results to the case of limited independence for free. Additional methods are also presented, and the aggregate results are sharp and provide a better understanding of the proof techniques behind these bounds. They also yield improved bounds for various tail probability distributions and enable improved approximation algorithms for job-shop scheduling. The "limited independence" result implies that a reduced amount of randomness and weaker sources of randomness are sufficient for randomized algorithms whose analyses use the Chernoff-Hoeffding bounds, e.g., the analysis of randomized algorithms for random sampling and oblivious packet routi...
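For orientation, the classical bounds for fully independent variables that this abstract takes as its starting point can be stated as below. This is the standard textbook form, not the paper's limited-independence result; the symbols $X_i$, $n$, $\mu$, $\delta$ and $t$ are notation chosen here.

```latex
Let $X_1,\dots,X_n$ be independent random variables with $X_i \in [0,1]$,
let $X = \sum_{i=1}^{n} X_i$, and let $\mu = \mathbb{E}[X]$. Then
\[
  \Pr[X \ge (1+\delta)\mu]
  \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}
  \qquad (\delta > 0),
\]
\[
  \Pr[X - \mu \ge t] \le \exp\!\left(-\frac{2t^{2}}{n}\right)
  \qquad (t > 0).
\]
```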
The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
Cited by 98 (2 self)
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
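The balls-and-bins process described in this abstract is easy to simulate. The sketch below is an illustration under stated assumptions, not code from the survey: the function name `max_load`, the seed, and the choice of n are arbitrary, the d candidate bins are sampled with replacement, and ties are broken in favor of the first sampled bin.

```python
import random


def max_load(n, d, rng):
    """Throw n balls into n bins and return the largest bin load.

    Each ball samples d bins uniformly at random (with replacement) and
    goes into the least loaded of them; d = 1 is the plain random process.
    """
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)


if __name__ == "__main__":
    n = 100_000
    for d in (1, 2, 3):
        # A fresh, fixed seed per run keeps the comparison repeatable.
        print("d =", d, "max load =", max_load(n, d, random.Random(0)))
```

Running it for d = 1 and d = 2 should show the gap the abstract describes: the single-choice maximum grows roughly like log n / log log n, while two choices keep it near log log n.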
RouteBricks: Exploiting Parallelism to Scale Software Routers
 In Proceedings of the 22nd ACM Symposium on Operating Systems Principles
, 2009
Cited by 95 (12 self)
We revisit the problem of scaling software routers, motivated by recent advances in server technology that enable high-speed parallel processing, a feature router workloads appear ideally suited to exploit. We propose a software router architecture that parallelizes router functionality both across multiple servers and across multiple cores within a single server. By carefully exploiting parallelism at every opportunity, we demonstrate a 35Gbps parallel router prototype; this router capacity can be linearly scaled through the use of additional servers. Our prototype router is fully programmable using the familiar Click/Linux environment and is built entirely from off-the-shelf, general-purpose server hardware.
Hedera: Dynamic flow scheduling for data center networks
 In Proc. of Networked Systems Design and Implementation (NSDI) Symposium
, 2010
Cited by 94 (3 self)
Today’s data centers offer tremendous aggregate bandwidth to clusters of tens of thousands of machines. However, because of limited port densities in even the highest-end switches, data center topologies typically consist of multi-rooted trees with many equal-cost paths between any given pair of hosts. Existing IP multipathing protocols usually rely on per-flow static hashing and can cause substantial bandwidth losses due to long-term collisions. In this paper, we present Hedera, a scalable, dynamic flow scheduling system that adaptively schedules a multistage switching fabric to efficiently utilize aggregate network resources. We describe our implementation using commodity switches and unmodified hosts, and show that for a simulated 8,192-host data center, Hedera delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.
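The collision problem that this abstract attributes to per-flow static hashing can be illustrated with a small sketch. It models only the baseline behavior, not Hedera's scheduler. Everything specific is an assumption made for illustration: the flow tuples, the use of CRC32 as the hash, and the helper names `assign_paths` and `path_loads`.

```python
import zlib
from collections import Counter


def assign_paths(flows, num_paths):
    """Statically map each flow to one of num_paths equal-cost paths.

    Assumption: the path index is a CRC32 hash of the flow identifier,
    so a flow keeps its path for its whole lifetime regardless of load.
    """
    return {flow: zlib.crc32(repr(flow).encode("utf-8")) % num_paths
            for flow in flows}


def path_loads(assignment, num_paths):
    """Count how many flows were hashed onto each path index."""
    counts = Counter(assignment.values())
    return [counts.get(p, 0) for p in range(num_paths)]


if __name__ == "__main__":
    # Hypothetical flows: (src_ip, dst_ip, src_port, dst_port, protocol).
    flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 40000 + i, 80, "tcp")
             for i in range(8)]
    print(path_loads(assign_paths(flows, 4), 4))
```

Because the mapping ignores current load, two long-lived flows that hash to the same path keep sharing it even while other paths sit idle, which is the kind of long-term collision a dynamic scheduler is meant to avoid.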
Randomized routing and sorting on fixed-connection networks
 Journal of Algorithms
, 1994
Cited by 88 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
Models of Machines and Computation for Mapping in Multicomputers
, 1993
Cited by 79 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
Optimal Oblivious Routing in Polynomial Time
, 2003
Cited by 62 (9 self)
A recent seminal result of Räcke is that for any network there is an oblivious routing algorithm with a polylog competitive ratio with respect to congestion. Unfortunately, Räcke's construction is not polynomial time. We give a polynomial-time construction that guarantees Räcke's bounds, and more generally gives the true optimal ratio for any network.
Scheduling Nonuniform Traffic In A Packet Switching System With Small Propagation Delay
 IEEE/ACM Transactions on Networking
, 1994
Cited by 60 (1 self)
A new model of nonuniform traffic is introduced for a single-hop packet switching system. This traffic model allows arbitrary traffic streams subject only to a constraint on the number of data packets which can arrive at any individual source in the system or for any individual destination in the system over time periods of specified length. The nonuniform traffic model is flexible enough to cover integrated data networks carrying diverse classes of data. The system model is rather general and includes passive optical star wavelength division networks. Transmission algorithms are introduced for a single-hop packet switching system with such nonuniform traffic and with propagation delay that is negligible relative to the packet length. The algorithms are based on collision-free scheduling of packets using graph matching algorithms, since the global state of the system is known to all stations at any time. A companion paper introduces transmission algorithms for the same network and traf...
Scaling Internet Routers Using Optics
 ACM SIGCOMM
, 2003
Cited by 58 (15 self)
Routers built around a single-stage crossbar and a centralized scheduler do not scale, and (in practice) do not provide the throughput guarantees that network operators need to make efficient use of their expensive long-haul links. In this paper we consider how optics can be used to scale capacity and reduce power in a router. We start with the promising load-balanced switch architecture proposed by C.-S. Chang. This approach eliminates the scheduler, is scalable, and guarantees 100% throughput for a broad class of traffic. But several problems need to be solved to make this architecture practical: (1) Packets can be mis-sequenced, (2) Pathological periodic traffic patterns can make throughput arbitrarily small, (3) The architecture requires a rapidly configuring switch fabric, and (4) It does not work when linecards are missing or have failed. In this paper we solve each problem in turn, and describe new architectures that include our solutions. We motivate our work by designing a 100Tb/s packet-switched router arranged as 640 linecards, each operating at 160Gb/s. We describe two different implementations based on technology available within the next three years.