Results 1 - 10 of 50
Strong performance guarantees for asynchronous crossbar schedulers
- IEEE INFOCOM, 2006
Abstract - Cited by 29 (3 self)
Crossbar-based switches are commonly used to implement routers with throughputs up to about 1 Tb/s. The advent of crossbar scheduling algorithms that provide strong performance guarantees now makes it possible to engineer systems that perform well, even under extreme traffic conditions. Until recently, such performance guarantees have only been developed for crossbars that switch cells rather than variable-length packets. Cell-based crossbars incur a worst-case bandwidth penalty of up to a factor of two, since they must fragment variable-length packets into fixed-length cells. In addition, schedulers for cell-based crossbars may fail to deliver the expected performance guarantees when used in routers that forward packets. We show how to obtain performance guarantees for asynchronous crossbars that are directly comparable to those previously developed for synchronous, cell-based crossbars. In particular, we define derivatives of the Group by Virtual Output Queue (GVOQ) scheduler of Chuang et al. and the Least Occupied Output First scheduler of Krishna et al. and show that both can provide strong performance guarantees in systems with speedup 2. Specifically, we show that these schedulers are work-conserving and that they can emulate an output-queued switch using any queueing discipline in the class of restricted Push-In, First-Out queueing disciplines. We also show that there are schedulers for segment-based crossbars (introduced recently by Katevenis and Passas) that can deliver strong performance guarantees with small buffer requirements and no bandwidth fragmentation.
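The worst-case factor-of-two bandwidth penalty mentioned in this abstract follows directly from fragmentation arithmetic: a packet one byte longer than a cell must occupy two full cells. A minimal sketch of that calculation (function names are illustrative, not from the paper):

```python
import math

def cells_needed(pkt_bytes: int, cell_bytes: int) -> int:
    """Number of fixed-size cells needed to carry a variable-length packet."""
    return math.ceil(pkt_bytes / cell_bytes)

def bandwidth_expansion(pkt_bytes: int, cell_bytes: int) -> float:
    """Ratio of cell bytes sent on the fabric to useful packet bytes."""
    return cells_needed(pkt_bytes, cell_bytes) * cell_bytes / pkt_bytes

# A packet one byte longer than a 64-byte cell occupies two cells,
# so nearly half the fabric bandwidth carries padding.
print(bandwidth_expansion(65, 64))  # 128/65, about 1.97
print(bandwidth_expansion(64, 64))  # exactly 1.0
```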
Throughput region of finite-buffered networks
- IEEE Trans. Parallel Distrib. Syst., 2007
Abstract - Cited by 20 (1 self)
Abstract—Most current communication networks, including the Internet, are packet-switched networks. One of the main reasons behind the success of packet-switched networks is the possibility of performance gain due to multiplexing of network bandwidth. The multiplexing gain crucially depends on the size of the buffers available at the nodes of the network to store packets at the congested links. However, most of the previous work assumes the availability of infinite buffers. In this paper, we study the effect of finite buffer size on the performance of networks of interacting queues. In particular, we study the throughput of flow-controlled lossless networks with finite buffers. The main result of this paper is the characterization of a dynamic scheduling policy that achieves the maximal throughput with a minimal finite buffer at the internal nodes of the network under a memoryless (e.g., Bernoulli i.i.d.) exogenous arrival process. However, this ideal policy is rather complex and, hence, difficult to implement. This leads us to the design of a simpler and possibly implementable policy. We obtain a natural trade-off between throughput and buffer size for such an implementable policy. Finally, we apply our results to packet switches with a buffered crossbar architecture. Index Terms—Queuing theory, flow-controlled networks, scheduling, packet switching, buffered crossbars.
Packet-mode emulation of output-queued switches
- ACM SPAA, 2006
Abstract - Cited by 14 (4 self)
Most common network protocols (e.g., the Internet Protocol) work with variable-size packets, whereas contemporary switches still operate with fixed-size cells, which are easier to transmit and buffer. This necessitates packet segmentation and reassembly modules, resulting in significant computation and communication overhead that might be too costly as switches become faster and bigger. It is therefore imperative to investigate an alternative mode of scheduling, in which packets are scheduled contiguously over the switch fabric. This paper studies packet-mode scheduling for the combined input-output queued (CIOQ) switch architecture and investigates its cost. We devise frame-based schedulers that allow a packet-mode CIOQ switch with small speedup to mimic an ideal output-queued switch with bounded relative queuing delay. The schedulers are pipelined and are based on matrix decomposition. Our schedulers demonstrate a trade-off between the switch speedup and the relative queuing delay incurred while mimicking an output-queued switch. When the switch is allowed to incur high relative queuing delay, a speedup arbitrarily close to 2 suffices to mimic an ideal output-queued switch. This implies that a packet-mode scheduler does not require a fundamentally higher speedup than a cell-based scheduler. The relative queuing delay can be significantly reduced with just a doubling of the speedup. We further show that it is impossible to achieve zero relative queuing delay (that is, a perfect emulation), regardless of the switch speedup. Finally, we show that a speedup arbitrarily close to 1 suffices to mimic an output-queued switch with a bounded buffer size.
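The frame-based, matrix-decomposition approach this abstract describes is in the spirit of Birkhoff-von Neumann scheduling: an integer rate matrix whose rows and columns all sum to the frame length can be split into weighted permutation matrices, each a crossbar configuration held for some number of slots. A rough sketch of such a decomposition, using standard Kuhn-style bipartite matching (this is a generic illustration, not the paper's actual scheduler):

```python
def kuhn_matching(adj, n):
    """Maximum bipartite matching via Kuhn's augmenting paths.
    adj[r][c] is True if row r may be matched to column c."""
    match = [-1] * n  # match[c] = row matched to column c, or -1

    def augment(r, seen):
        for c in range(n):
            if adj[r][c] and not seen[c]:
                seen[c] = True
                if match[c] == -1 or augment(match[c], seen):
                    match[c] = r
                    return True
        return False

    for r in range(n):
        augment(r, [False] * n)
    return match

def bvn_decompose(rates):
    """Decompose an n x n integer matrix whose rows and columns all sum
    to the frame length into weighted permutations (crossbar schedules)."""
    n = len(rates)
    R = [row[:] for row in rates]
    schedule = []
    while any(any(row) for row in R):
        adj = [[R[r][c] > 0 for c in range(n)] for r in range(n)]
        match = kuhn_matching(adj, n)
        # a perfect matching on positive entries exists for such matrices
        assert all(m != -1 for m in match)
        perm = [0] * n  # perm[r] = output that input r connects to
        for c, r in enumerate(match):
            perm[r] = c
        w = min(R[r][perm[r]] for r in range(n))  # slots to hold this config
        for r in range(n):
            R[r][perm[r]] -= w
        schedule.append((w, perm))
    return schedule
```

Summing the weighted permutations recovers the original rate matrix, so every input-output pair receives exactly its reserved slots within each frame.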
On guaranteed smooth switching for buffered crossbar switches
- IEEE/ACM Trans. Networking, 2008
Abstract - Cited by 12 (1 self)
Abstract—Scalability considerations drive the evolution of switch design from output queuing to input queuing and further to combined input and crosspoint queuing (CICQ). However, CICQ switches with credit-based flow control face new challenges of scalability and predictability. In this paper, we propose a novel approach of rate-based smoothed switching, and design a CICQ switch called the smoothed buffered crossbar or sBUX. First, the concept of smoothness is developed from two complementary perspectives of covering and spacing, which, commonly known as fairness and jitter, are unified in the same model. Second, a smoothed multiplexer sMUX is designed that allocates bandwidth among competing flows sharing a link and guarantees almost ideal smoothness for each flow. Third, the buffered crossbar sBUX is designed that uses the scheduler sMUX at each input and output, and a two-cell buffer at each crosspoint. It is proved that sBUX guarantees 100% throughput for real-time services and almost ideal smoothness for each flow. Fourth, an on-line bandwidth regulator is designed that periodically estimates bandwidth demand and generates admissible allocations, which enables sBUX to support best-effort services. Simulation shows almost 100% throughput and multi-microsecond average delay. In particular, neither credit-based flow control nor speedup is used, and arbitrary fabric-internal latency is allowed between line cards and the switch core, simplifying the switch implementation. Index Terms—Buffered crossbar, scheduling, smoothness, switches.
The throughput of a buffered crossbar switch
- IEEE Communications Letters, 2004
Abstract - Cited by 11 (0 self)
Abstract—The throughput of an input-queued crossbar switch with a single FIFO queue at each input is limited to 2−√2 ≈ 58.6% for uniformly distributed, Bernoulli i.i.d. arrivals of fixed-length packets. In this letter we prove that if the crossbar switch can buffer one packet at each crosspoint, then the throughput increases to 100% asymptotically as N → ∞, where N is the number of switch ports. Index Terms—Input-queued switch, buffered crossbar switch, throughput.
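The 2−√2 limit quoted in this abstract is the classic head-of-line-blocking result for single-FIFO input queues, and it is easy to reproduce empirically. The sketch below (a hypothetical simulation, not from the letter) saturates every input FIFO with uniform i.i.d. destinations and lets each output grant one contending head-of-line cell per slot; measured throughput settles near 58.6% as N grows:

```python
import random

def saturation_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Simulate a saturated input-queued switch with one FIFO per input.
    Each slot, every output grants one of the inputs whose head-of-line
    cell targets it; the winner's FIFO reveals a fresh uniform destination."""
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL destinations
    served = 0
    for _ in range(n_slots):
        contenders = {}
        for i, d in enumerate(hol):
            contenders.setdefault(d, []).append(i)
        for d, inputs in contenders.items():
            winner = rng.choice(inputs)  # random grant among HOL contenders
            served += 1
            hol[winner] = rng.randrange(n_ports)  # next cell's destination
        # losing inputs keep their HOL cell: this is head-of-line blocking
    return served / (n_ports * n_slots)
```

For N = 2 the known saturation throughput is 0.75; as N grows it falls toward 2−√2 ≈ 0.586, which is what the simulation shows.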
Providing 100% Throughput in a Buffered Crossbar Switch
Abstract - Cited by 8 (4 self)
Abstract—Buffered crossbar switches have received great attention recently because they have become technologically feasible, have simpler scheduling algorithms, and achieve better performance than bufferless crossbar switches. Buffered crossbar switches have a buffer placed at each crosspoint. A cell is first delivered to a crosspoint buffer and then transferred to the output port. With a speedup of two, a buffered crossbar switch has previously been proved to provide 100% throughput. We propose what we believe is the first feasible scheduling scheme that can achieve 100% throughput with no speedup and finite crosspoint buffers. The proposed scheme is called SQUISH: a Stable Queue Input-output Scheduler with Hamiltonian walk. With SQUISH, each input/output first makes decisions based on the information from the virtual output queues and crosspoint buffers. The result is then compared with a Hamiltonian walk schedule to avoid possible "bad" states. We prove that SQUISH can achieve 100% throughput with a speedup of one. Our simulation results also show good delay performance for SQUISH.
A Reconfigurable Hardware Based Embedded Scheduler for Buffered Crossbar Switches
- ACM/SIGDA Fourteenth International Symposium on Field Programmable Gate Arrays (FPGA 2006), 2006
Abstract - Cited by 8 (3 self)
In this paper, we propose a new internally buffered crossbar (IBC) switching architecture in which the input and output distributed schedulers are embedded inside the crossbar fabric chip. As opposed to previous designs, where these schedulers are spread across input and output line cards, our design gives the schedulers cheap and fast access to the internal buffers, optimizes the flow control mechanism, and makes the IBC more scalable. We employed the Xilinx Virtex-4 FX platform to show the feasibility of our proposal and implemented a reconfigurable-hardware-based IBC switch with the maximum port count that we could fit on a single chip. The experiments suggest that a 24 × 24 IBC switch running at a 10 Gbps port speed with a clock cycle time of 6.4 ns can be implemented.
Max-min fair bandwidth allocation algorithms for packet switches
- IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Abstract - Cited by 8 (4 self)
With the rapid development of broadband applications, the capability of networks to provide quality of service (QoS) has become an important issue. Fair scheduling algorithms are a common approach for switches and routers to support QoS. All fair scheduling algorithms run on top of a bandwidth allocation scheme. The scheme should be feasible in order to be applied in practice, and should be efficient, fully utilizing the available bandwidth and allocating it in a fair manner. However, since a single input port or output port of a switch has only the bandwidth information of its local flows (i.e., the flows traversing itself), it is difficult to obtain a globally feasible and efficient bandwidth allocation scheme. In this paper, we show how to fairly allocate bandwidth in packet switches based on the max-min fairness principle. We first formulate the problem and give the definitions of feasibility and max-min fairness for bandwidth allocation in packet switches. As the first step to solve the problem, we consider the simpler unicast scenarios and present the max-min fair bandwidth allocation algorithm for unicast traffic. We then extend the analysis to the more general multicast scenarios and present the max-min fair bandwidth allocation algorithm for multicast traffic. We prove that both algorithms achieve max-min fairness and analyze their complexity. The proposed algorithms are universally applicable to any type of switch and scheduling algorithm.
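On a single shared link, the max-min fairness principle invoked in this abstract reduces to progressive filling: repeatedly split the residual capacity equally among flows whose demand is not yet met. A minimal single-link sketch (the switch-wide algorithms in the paper additionally handle coupled input/output port constraints, which this does not):

```python
def max_min_fair(capacity: float, demands: list) -> list:
    """Progressive filling on one link: split residual capacity equally
    among flows whose demand is not yet satisfied."""
    alloc = [0.0] * len(demands)
    active = [i for i, d in enumerate(demands) if d > 0]
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)  # equal share this round
        still_unsatisfied = []
        for i in active:
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            remaining -= give
            if alloc[i] < demands[i] - 1e-12:
                still_unsatisfied.append(i)
        active = still_unsatisfied
    return alloc
```

For example, `max_min_fair(12, [2, 4, 10])` yields `[2.0, 4.0, 6.0]`: the two small demands are fully satisfied, and the leftover capacity goes to the bottlenecked flow.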
Design and Analysis of Optical Flow-Switched Networks
, 2009
Abstract - Cited by 7 (0 self)
In our previous work [Chan et al., “Optical flow switching,” in BROADNETS 2006, pp. 1–8; Weichenberg
Providing flow based performance guarantees for buffered crossbar switches
- IEEE IPDPS, 2008
Abstract - Cited by 6 (4 self)
Buffered crossbar switches are a special type of combined input-output queued switches with each crosspoint of the crossbar having small on-chip buffers. The introduction of crosspoint buffers greatly simplifies the scheduling process of buffered crossbar switches, and furthermore enables buffered crossbar switches with speedup of two to easily provide port-based performance guarantees. However, recent research results have indicated that, in order to provide flow-based performance guarantees, buffered crossbar switches have to either increase the speedup of the crossbar to three or greatly increase the total number of crosspoint buffers, both adding significant hardware complexity. In this paper, we present scheduling algorithms for buffered crossbar switches to achieve flow-based performance guarantees with speedup of two and with only one or two buffers at each crosspoint. When there is no crosspoint blocking in a specific time slot, only the simple and distributed input scheduling and output scheduling are necessary. Otherwise, a special urgent matching is introduced to guarantee the on-time delivery of crosspoint-blocked cells. With the proposed algorithms, buffered crossbar switches can provide flow-based performance guarantees by emulating push-in-first-out output-queued switches, and we use the counting method to formally prove the perfect emulation. For the special urgent matching, we present sequential and parallel matching algorithms. Both algorithms converge within N iterations in the worst case, and the latter requires fewer iterations in the average case. Finally, we discuss an alternative backup-buffer implementation scheme to the bypass path, and compare our algorithms with existing algorithms in the literature.