Load Balanced Birkhoff-von Neumann Switches, Part II: Multi-stage Buffering
, 2001
Cited by 89 (12 self)
Abstract
The main objective of this sequel is to solve the out-of-sequence problem that occurs in the load-balanced Birkhoff-von Neumann switch with one-stage buffering. We do this by adding a load-balancing buffer in front of the first stage and a resequencing-and-output buffer after the second stage. Moreover, packets are distributed at the first stage according to their flows, rather than according to their arrival times as in Part I. In this paper, we consider multicasting flows with two types of scheduling policies: the First Come First Served (FCFS) policy and the Earliest Deadline First (EDF) policy. The FCFS policy requires a jitter control mechanism in front of the second stage to ensure proper ordering of the traffic entering the second stage. The EDF scheme needs no jitter control: it uses the departure times from the corresponding FCFS output-buffered switch as deadlines and schedules packets in deadline order. For both policies, we show that the end-to-end delay through our multistage switch is bounded above by the sum of the delay in the corresponding FCFS output-buffered switch and a constant that depends only on the size of the switch and the number of multicasting flows it supports.
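A minimal Python sketch of the EDF idea, under simplifying assumptions of our own: time is slotted, each output of the reference FCFS output-buffered switch sends at most one packet per slot, and a packet may leave in the slot it arrives. `fcfs_oq_departures` and `EDFBuffer` are hypothetical names; multicasting, jitter control and the two switching stages themselves are not modeled.

```python
import heapq
import itertools


def fcfs_oq_departures(arrivals):
    """Departure slot of each packet in an idealized FCFS output-buffered switch.

    arrivals: list of (arrival_slot, output_port) in arrival order.
    Assumption: each output sends at most one packet per slot, and a packet
    may depart in the same slot in which it arrives.
    """
    next_free = {}  # output port -> first slot in which that output is idle
    departures = []
    for slot, out in arrivals:
        d = max(slot, next_free.get(out, 0))
        next_free[out] = d + 1
        departures.append(d)
    return departures


class EDFBuffer:
    """Output buffer that always releases the packet with the earliest deadline."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # equal deadlines leave in insertion order

    def push(self, deadline, packet_id):
        heapq.heappush(self._heap, (deadline, next(self._tie), packet_id))

    def pop(self):
        deadline, _, packet_id = heapq.heappop(self._heap)
        return deadline, packet_id

    def __len__(self):
        return len(self._heap)
```

For example, `fcfs_oq_departures([(0, 1), (0, 1), (1, 0), (1, 1)])` gives the deadlines `[0, 1, 1, 2]`; packets pushed into an `EDFBuffer` in any order then come out sorted by those deadlines.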
Scaling Internet Routers Using Optics
- ACM SIGCOMM
, 2003
Cited by 49 (15 self)
Abstract
Routers built around a single-stage crossbar and a centralized scheduler do not scale, and (in practice) do not provide the throughput guarantees that network operators need to make efficient use of their expensive long-haul links. In this paper we consider how optics can be used to scale capacity and reduce power in a router. We start with the promising load-balanced switch architecture proposed by C.-S. Chang. This approach eliminates the scheduler, is scalable, and guarantees 100% throughput for a broad class of traffic. But several problems need to be solved to make this architecture practical: (1) packets can be mis-sequenced, (2) pathological periodic traffic patterns can make throughput arbitrarily small, (3) the architecture requires a rapidly configuring switch fabric, and (4) it does not work when linecards are missing or have failed. In this paper we solve each problem in turn and describe new architectures that include our solutions. We motivate our work by designing a 100 Tb/s packet-switched router arranged as 640 linecards, each operating at 160 Gb/s. We describe two different implementations based on technology expected to be available within the next three years.
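Problem (1), mis-sequencing, can be reproduced with a toy simulation of the basic two-stage architecture. This sketch assumes the periodic connection pattern commonly used to describe it (input i reaches intermediate port (i + t) mod N in slot t, and intermediate port j reaches output (j + t) mod N), FIFO virtual output queues at the intermediate ports, no input buffering, and pre-queued background packets (`"bg"`) to create unequal queue lengths. `simulate_load_balanced` is a hypothetical helper, not code from the paper.

```python
from collections import deque


def simulate_load_balanced(n, arrivals, backlog=None, slots=16):
    """Toy slotted model of a two-stage load-balanced switch.

    arrivals: dict slot -> list of (input, output, label), at most one per input.
    backlog: dict (intermediate, output) -> number of background packets
             already queued at that intermediate port.
    Returns (slot, output, label) for every non-background departure, in order.
    """
    # voq[j][k]: FIFO at intermediate port j holding packets for output k
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    for (j, k), count in (backlog or {}).items():
        voq[j][k].extend(["bg"] * count)

    departures = []
    for t in range(slots):
        # Second stage first: intermediate j is connected to output (j + t) mod n.
        for j in range(n):
            k = (j + t) % n
            if voq[j][k]:
                departures.append((t, k, voq[j][k].popleft()))
        # First stage: input i is connected to intermediate (i + t) mod n,
        # so a packet reaching an intermediate port in slot t leaves at t + 1 or later.
        for i, k, label in arrivals.get(t, []):
            voq[(i + t) % n][k].append(label)
    return [d for d in departures if d[2] != "bg"]
```

With two ports, two background packets queued at intermediate port 0 for output 0, and packets `a1` (slot 0) and `a2` (slot 1) sent from input 0 to output 0, `simulate_load_balanced(2, {0: [(0, 0, "a1")], 1: [(0, 0, "a2")]}, backlog={(0, 0): 2})` returns `[(3, 0, "a2"), (4, 0, "a1")]`: `a2` overtakes `a1` because it went through the emptier intermediate port.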
Routers with a Single Stage of Buffering
, 2002
Cited by 26 (2 self)
Abstract
Most high-performance routers today use combined input and output queueing (CIOQ). The CIOQ router is also frequently used as an abstract model for routers: at one extreme is input queueing, at the other extreme is output queueing, and in between there is a continuum of performance as the speedup is increased from 1 to N (where N is the number of linecards). The model includes architectures in which a switch fabric is sandwiched between two stages of buffering. There is a rich and growing theory for CIOQ routers, including algorithms, throughput results and conditions under which delays can be guaranteed. But there is a broad class of architectures that are not captured by the CIOQ model, including routers with centralized shared memory and load-balanced routers. In this paper we propose an abstract model called Single-Buffered (SB) routers that includes these architectures. We describe a method called Constraint Sets to analyze a number of SB router architectures. The model helped identify previously unstudied architectures, in particular the Distributed Shared Memory router. Although this router has been commercially deployed, its performance is not widely known. We find conditions under which it can emulate an ideal shared memory router, and believe it to be a promising architecture. Questions remain about its complexity, but we find that its memory bandwidth, and potentially its power consumption, are lower than those of a CIOQ router.
Providing Guaranteed Rate Services in the Load Balanced Birkhoff-von Neumann Switches
, 2003
Cited by 14 (3 self)
Abstract
In this paper, we propose two schemes for the load-balanced Birkhoff-von Neumann switches to provide guaranteed rate services. As in [7], the first scheme is based on an Earliest Deadline First (EDF) scheduling policy. In such a scheme, we assign every packet of a guaranteed rate flow a targeted departure time, namely its departure time from the corresponding work-conserving link with capacity equal to the guaranteed rate. By adding a jitter control mechanism in front of the buffer at the second stage and running the EDF policy at the output buffer, we show that the end-to-end delay for every packet of a guaranteed rate flow is bounded by the sum of its targeted departure time and a constant that depends only on the number of flows and the size of the switch.
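The targeted departure time can be computed with the usual recursion for a single flow served alone by a work-conserving link of rate r: packet k starts service at max(a_k, d_{k-1}) and takes L_k / r to transmit. The sketch below assumes that recursion (our reading of the abstract, not code from the paper) and uses exact fractions so deadlines compare without rounding error; `targeted_departures` is a hypothetical name.

```python
from fractions import Fraction


def targeted_departures(arrivals, lengths, rate):
    """Departure times of one flow's packets from a dedicated work-conserving
    link of capacity `rate`.

    Assumed model: d_k = max(a_k, d_{k-1}) + L_k / rate, with the first
    packet starting service at its own arrival time.
    """
    rate = Fraction(rate)
    departures = []
    prev = None
    for a, length in zip(arrivals, lengths):
        # The link is busy until prev, so service cannot start before then.
        start = Fraction(a) if prev is None else max(Fraction(a), prev)
        prev = start + Fraction(length) / rate
        departures.append(prev)
    return departures
```

For arrivals at times 0, 0 and 5, unit-length packets and r = 1/2, the targeted departure times are 2, 4 and 7; in an EDF scheme these values would serve as the packets' deadlines.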
Using Switched Delay Lines for Exact Emulation of FIFO Multiplexers with Variable Length Bursts
- IEEE Journal on Selected Areas in Communications
, 2003
Cited by 11 (9 self)
Abstract
How to achieve exact emulation of First In First Out (FIFO) multiplexers for fixed-size cells (or packets) using optical crossbar Switches and fiber Delay Lines (SDL) has been studied extensively in the literature. In this paper, we take a step further and propose a new architecture that achieves exact emulation of FIFO multiplexers for variable-length bursts. Our architecture consists of two blocks: a cell scheduling block and a FIFO multiplexer for fixed-size cells. Both blocks are made of SDL units. The objective of the cell scheduling block is to schedule the cells of a burst to the right input at the right time so that cells in the same burst depart contiguously from the multiplexer for fixed-size cells. We show that cell scheduling can be done efficiently by keeping track of a single state variable, called the total virtual waiting time in this paper. Moreover, the delay through the cell scheduling block is bounded above by a constant that depends only on the number of inputs and the maximum number of cells in a burst. This delay bound limits the number of fiber delay lines needed in the cell scheduling block.
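The abstract does not spell out how the total virtual waiting time is updated, so the following is only a generic virtual-FIFO bookkeeping sketch in the same spirit, not the paper's SDL construction. It assumes slotted time, an output that sends one cell per slot, and that a burst's length is known when its first cell arrives; `schedule_bursts` is a hypothetical helper. Reserving b consecutive departure slots per burst is what makes the cells of a burst leave contiguously, and the single counter `w` is the only state carried between slots.

```python
def schedule_bursts(slots):
    """Assign contiguous departure slots to bursts using one state variable.

    slots: one list per time slot, holding the lengths (in cells) of the
           bursts whose first cell arrives in that slot.
    Returns (arrival_slot, index_within_slot, first_departure_slot) per burst;
    burst k then occupies slots first .. first + length - 1 at the output.
    """
    w = 0  # virtual waiting time: cells already committed to the output line
    plan = []
    for t, bursts in enumerate(slots):
        for idx, length in enumerate(bursts):
            plan.append((t, idx, t + w))  # each cell of this burst is delayed by w
            w += length
        w = max(w - 1, 0)  # the output line transmits one cell per slot
    return plan
```

For bursts of lengths 3 and 2 arriving in slot 0 and a burst of length 1 in slot 1, `schedule_bursts([[3, 2], [1]])` returns `[(0, 0, 0), (0, 1, 3), (1, 0, 5)]`, so the bursts occupy slots 0 to 2, 3 to 4, and 5.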
Mailbox Switch: A Scalable Two-stage Switch Architecture for Conflict Resolution of Ordered Packets
- Proceedings of IEEE INFOCOM
, 2004
Cited by 8 (5 self)
Abstract
Traditionally, conflict resolution in an input-buffered switch is solved by finding a matching between inputs and outputs in every time slot. To do this, a switch not only needs to gather information about the virtual output queues at the inputs, but also needs to use that information to compute a matching. Both the communication overhead and the computation overhead make this approach difficult to scale. Recent works on the two-stage switch architecture in [6], [7], [12], [8] showed that conflict resolution can be easily solved over time and space without communication and computation overhead. However, the main problem of such a two-stage switch architecture is that packets might be out of sequence. The main objective of this paper is to propose a scalable solution, called the mailbox switch, that solves the out-of-sequence problem in the two-stage switch architecture. The key idea of the mailbox switch is to use a set of symmetric connection patterns to create a feedback path for packet departure times. With the information of packet departure times, the mailbox switch can schedule packets so that they depart in the order of their arrivals. Despite the simplicity of the mailbox switch, we show via both theoretical models and simulations that its throughput can be as high as 75%. With limited resequencing delay, a modified version of the mailbox switch achieves 95% throughput. We also propose a recursive way to construct the switch fabrics for the set of symmetric connection patterns. If the number of inputs, N, is a power of 2, we show that the switch fabric for the mailbox switch can be built with switches.
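One simple family of symmetric connection patterns, assumed here for illustration (the abstract does not give the paper's exact patterns or its recursive fabric construction), connects input i to output (t - i) mod N in slot t. Because this map is its own inverse, whenever i is connected to j, j is also connected to i in the same slot, which is the kind of two-way link a feedback path needs; over N consecutive slots every input meets every output exactly once. The helper names are hypothetical.

```python
def symmetric_pattern(n, t):
    """Connection pattern for slot t: input i -> output (t - i) mod n."""
    return [(t - i) % n for i in range(n)]


def is_symmetric(perm):
    """True if i -> j always implies j -> i (the permutation is an involution)."""
    return all(perm[perm[i]] == i for i in range(len(perm)))


def covers_all_pairs(n):
    """True if n consecutive slots connect every (input, output) pair exactly once."""
    pairs = [(i, j) for t in range(n) for i, j in enumerate(symmetric_pattern(n, t))]
    return len(pairs) == n * n and len(set(pairs)) == n * n
```

For N = 4 and slot 1 the pattern is `[1, 0, 3, 2]`: inputs 0 and 1 are paired with each other, as are inputs 2 and 3.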
Padded frames: a novel algorithm for stable scheduling in load-balanced switches
, 2008
Cited by 8 (0 self)
Abstract
The Load-balanced Birkhoff-von Neumann Switching architecture consists of two stages: a load balancer and a deterministic input-queued crossbar switch. The advantages of this architecture are its simplicity and scalability, while its main drawback is the possible out-of-sequence reception of packets belonging to the same flow. Several solutions have been proposed to overcome this problem; among the most promising are the Uniform Frame Spreading (UFS) and the Full Ordered Frames First (FOFF) algorithms. In this paper, we present a new algorithm called Padded Frames (PF), which eliminates the packet reordering problem, achieves 100% throughput, and improves the delay performance of previously known algorithms.
Optimal Load-Balancing
- Proceedings of IEEE INFOCOM
, 2005
Cited by 4 (2 self)
Abstract
This paper is about load-balancing packets across multiple paths inside a switch, or across a network. It is motivated by the recent interest in load-balanced switches. Load-balanced switches provide an appealing alternative to crossbars with centralized schedulers. A load-balanced switch has no scheduler, is particularly amenable to optics, and, most relevant here, guarantees 100% throughput. A uniform mesh is used to load-balance packets uniformly across all 2-hop paths in the switch. In this paper we explore whether this particular method of load-balancing is optimal in the sense that it achieves the highest throughput for a given capacity of interconnect. Our method also allows the load-balanced switch to be compared with ring, torus and hypercube interconnects. We prove that for a given interconnect capacity, the load-balancing mesh has the maximum throughput. Perhaps surprisingly, we find that the best mesh is slightly non-uniform, or biased, and has a throughput of N/(2N-1), where N is the number of nodes.
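A one-line rearrangement of the quoted throughput shows how far the biased mesh sits above one half, and how that margin shrinks as the number of nodes grows:

```latex
\frac{N}{2N-1} \;=\; \frac{1}{2} + \frac{1}{2(2N-1)},
\qquad\text{e.g.}\quad
N=2:\ \tfrac{2}{3}, \qquad N=16:\ \tfrac{16}{31}.
```

The excess term 1/(2(2N-1)) is roughly 1/(4N), so for large N the throughput approaches 1/2 from above.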
Design a simple and high performance switch using a two stage switch architecture
- IEEE Globecom’05
Cited by 3 (1 self)
Abstract
Recently, there has been tremendous interest in research on two-stage switches. Unlike input-buffered switches, two-stage switches do not need to find matchings between inputs and outputs. As such, they are much easier to scale and much simpler to implement. However, two-stage switches usually suffer from the out-of-sequence problem. Although several methods have been proposed in the literature to solve this problem, they require either complex scheduling or additional hardware, which defeats the purpose of design simplicity. To design a simple and high-performance switch using the two-stage architecture, we address three buffer design problems in this paper: re-sequencing buffers, central buffers and input buffers. We show that the size of the re-sequencing buffer needs to be proportional to the size of the central buffer to ensure that no packets are lost due to re-sequencing. Via simulations, we find that a moderately sized central buffer yields good throughput when traffic is not bursty. However, when the traffic is bursty, one needs to address the head-of-line (HOL) blocking problem at the inputs. We also find that using the round-robin service policy for multiple virtual output queues (VOQs) at the inputs may exhibit a catastrophic phenomenon, called a non-ergodic mode. When a switch is trapped in a non-ergodic mode, its throughput is sharply reduced. To solve this problem at the input buffers, we show that one may introduce “randomness” into the switch so that it can jump out of a non-ergodic mode.
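A re-sequencing buffer of the kind discussed above can be sketched as a per-flow map from sequence number to packet that releases packets only in order. The sketch assumes each flow's packets carry consecutive sequence numbers starting at 0 (an assumption; the abstract does not describe the packet format), and `Resequencer` is a hypothetical name. Its `max_occupancy` counter records how many packets had to be held at once; that number grows with how far packets can overtake one another in the central buffers, which is consistent with the finding that the re-sequencing buffer should be sized in proportion to the central buffer.

```python
class Resequencer:
    """Per-flow re-sequencing buffer (sketch; assumes sequence numbers 0, 1, 2, ...)."""

    def __init__(self):
        self.expected = 0         # next sequence number allowed to leave
        self.pending = {}         # seq -> payload, held until it is in order
        self.max_occupancy = 0    # peak number of packets held at once

    def receive(self, seq, payload):
        """Accept one packet; return the payloads that can now leave, in order."""
        self.pending[seq] = payload
        self.max_occupancy = max(self.max_occupancy, len(self.pending))
        released = []
        # Drain the run of consecutive sequence numbers starting at `expected`.
        while self.expected in self.pending:
            released.append(self.pending.pop(self.expected))
            self.expected += 1
        return released
```

Receiving sequence numbers 1, 2 and then 0 releases nothing, nothing, and then all three packets in order, with a peak occupancy of 3.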

