Rate Quantization and Service Quality over Single Crossbar Switches
In Proceedings of IEEE INFOCOM, Hong Kong, 2004
Abstract

Cited by 6 (1 self)
We study the provision of deterministic rate guarantees over single crossbar switches. Birkhoff decomposition yields a general approach for this problem, but the required complexity can be very high and the quality of service can be unsatisfactory for practical traffic sources.
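Birkhoff decomposition writes a doubly stochastic rate matrix (every row and column sums to 1) as a weighted sum of permutation matrices, each of which is one crossbar configuration held for a fraction of time equal to its weight. The sketch below is only the textbook Birkhoff-von Neumann procedure, not this paper's rate-quantization scheme, and the function names are hypothetical. It repeatedly finds a perfect matching on the positive entries with Kuhn's augmenting-path method, subtracts the smallest entry along that matching, and stops when the matrix is zero. Each step zeroes at least one entry, so there can be up to N² configurations, which hints at why the required complexity can be high.

```python
from fractions import Fraction

def _perfect_matching(support):
    """Kuhn's augmenting-path algorithm; support[i] lists columns j with M[i][j] > 0.

    Returns perm with perm[i] = column matched to row i, or None if no perfect matching.
    """
    n = len(support)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in support[i]:
            if not seen[j]:
                seen[j] = True
                # Take a free column, or re-route the row that holds it.
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm

def birkhoff_decompose(rates):
    """Split a matrix with equal row/column sums into a list of (weight, permutation)."""
    m = [[Fraction(x) for x in row] for row in rates]  # exact arithmetic
    n = len(m)
    terms = []
    while any(x > 0 for row in m for x in row):
        support = [[j for j in range(n) if m[i][j] > 0] for i in range(n)]
        perm = _perfect_matching(support)
        if perm is None:
            raise ValueError("rows and columns do not all have equal sums")
        w = min(m[i][perm[i]] for i in range(n))  # largest weight this permutation can take
        for i in range(n):
            m[i][perm[i]] -= w
        terms.append((w, perm))
    return terms
```

For example, the 3×3 rate matrix with 1/2 at (0,0), (0,1), (1,1), (1,2), (2,0), (2,2) splits into two permutations of weight 1/2 each; a matrix whose row sums differ raises `ValueError`.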
The inherent queuing delay of parallel packet switches
IEEE Transactions on Parallel and Distributed Systems, 2004
Rate Guarantees and Overload Protection in Input-Queued Switches
In Proceedings of IEEE INFOCOM, 2004
Abstract

Cited by 3 (0 self)
Despite increasing bandwidth demand and the significant research and commercial activity in large-scale Terabit routers for multi-gigabit/s links, many current switch designs do not provide adequate support for rate guarantees. In particular, designs based on the popular combined-input/output-queueing (CIOQ) paradigm have unpredictable performance despite implementing sophisticated scheduling schemes on egress links, because the crossbar arbitration between ingress and egress links is done without regard to desired rate guarantees or prevailing traffic conditions. This paper describes the design of an input-queued switch system and its associated arbitration and rate allocation algorithms that achieve both absolute rate guarantees and proportional bandwidth sharing even under overloaded or adversarial traffic. Our algorithms are simple and scalable and require a switch speedup of two to provide rate guarantees; we give the theoretical justification and report simulation results that support our claims. A semiconductor chipset based on variants of these algorithms, for routers with an aggregate capacity of 160 Gbps and links of up to 10 Gbps, is now commercially available, and a second-generation chipset supporting 640 Gbps will be available soon.
Design and Implementation of a Per-Flow Queue Manager for an ATM Switch Using FPGA Technology
Institute of Computer, 2002
Abstract

Cited by 1 (0 self)
Advanced switches and routers rely mostly on dynamic RAM (DRAM) technology to provide the large, low-cost buffer space needed because of the burstiness of Internet traffic. Quality of service is also desirable; therefore, per-flow queueing of traffic is often implemented.
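Per-flow queueing keeps a separate FIFO for each flow, which helps isolate flows from one another. The following is a minimal software model, not the paper's FPGA/DRAM design; the class name and the round-robin service order between flows are assumptions chosen for illustration.

```python
from collections import deque

class PerFlowQueueManager:
    """Toy per-flow queue manager: one FIFO per flow, served round-robin (assumed policy)."""

    def __init__(self):
        self.queues = {}       # flow_id -> deque of cells for that flow
        self.ready = deque()   # backlogged flow ids, in round-robin order

    def enqueue(self, flow_id, cell):
        q = self.queues.setdefault(flow_id, deque())
        if not q:
            # Flow was idle: it joins the tail of the round-robin list.
            self.ready.append(flow_id)
        q.append(cell)

    def dequeue(self):
        """Return (flow_id, cell) from the next backlogged flow, or None if all queues are empty."""
        if not self.ready:
            return None
        flow_id = self.ready.popleft()
        q = self.queues[flow_id]
        cell = q.popleft()
        if q:
            # Still backlogged: rotate to the tail so other flows get a turn.
            self.ready.append(flow_id)
        return flow_id, cell
```

If flow `"A"` enqueues three cells and flow `"B"` one, the service order is A, B, A, A: the burst on A does not delay B's single cell behind all of A's backlog.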
Randomization does not reduce the average delay in parallel packet switches
In the 17th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2005
Abstract

Cited by 1 (1 self)
Switching cells in parallel is a common approach to building switches with very high external line rates and a large number of ports. A prime example is the parallel packet switch (PPS), in which a demultiplexing algorithm sends cells, arriving at rate R on N input ports, through one of K intermediate slower switches operating at rate r < R. To exploit the parallelism of the PPS, a key issue is balancing the load among the planes; since randomization is a well-known, successful paradigm for load balancing, it is tempting to design randomized demultiplexing algorithms that balance the load on average. This paper presents lower bounds on the average queuing delay introduced by the PPS relative to an optimal work-conserving FCFS switch, for randomized demultiplexing algorithms that do not have full and immediate information about the switch status. These lower bounds are shown to be asymptotically optimal through a methodology for analyzing the maximal relative queuing delay by measuring the imbalance between the middle-stage switches; clearly, this also bounds the average relative queuing delay from above. The methodology is used to devise new algorithms that rely on slightly outdated global information about the switch status. It is also used to provide, for the first time, a complete proof of the maximum relative queuing delay of the fractional traffic dispatch algorithm [20, 25]. These optimal algorithms are deterministic, proving that randomization does not reduce the relative queuing delay of the PPS.
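Because each plane drains at the slower rate r, cells for the same output that pile up on one plane wait longer, so plane imbalance is a rough proxy for the extra delay. The toy model below only counts where one input's cells for one output land, comparing a round-robin demultiplexer with uniform random choice; it is not the paper's lower-bound methodology, and all function names are hypothetical.

```python
import random

def plane_loads(num_cells, k, choose_plane):
    """Count how many cells bound for one output are sent to each of the k planes."""
    loads = [0] * k
    for t in range(num_cells):
        loads[choose_plane(t)] += 1
    return loads

def imbalance(loads):
    # Spread between the most- and least-loaded middle-stage plane.
    return max(loads) - min(loads)

def round_robin(k):
    # Deterministic: cell t goes to plane t mod k.
    return lambda t: t % k

def uniform_random(k, seed=0):
    # Randomized: each cell picks a plane uniformly, independent of switch state.
    rng = random.Random(seed)
    return lambda t: rng.randrange(k)
```

With K = 4 and 1000 cells, round-robin puts exactly 250 cells on each plane, and its imbalance never exceeds 1; the random demultiplexer balances only on average, so a given run typically leaves some planes noticeably more loaded than others.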
Randomization does not Reduce the Average Delay in Parallel Packet Switches
Abstract
Switching cells in parallel is a common approach to building switches with very high external line rates and a large number of ports. A prime example is the parallel packet switch (PPS), in which a demultiplexing algorithm sends cells, arriving at rate R on N input ports, through one of K intermediate slower switches operating at rate r < R. This paper presents lower bounds on the average queuing delay introduced by the PPS relative to an optimal work-conserving FCFS switch, for demultiplexing algorithms that do not have full and immediate information about the switch status. The bounds hold even if the algorithm is randomized. These lower bounds are shown to be asymptotically optimal through a new methodology for analyzing the maximal relative queuing delay, which clearly also bounds the average relative queuing delay from above. The methodology is used to devise a new algorithm that relies on slightly outdated global information about the switch status. It is also used to provide, for the first time, a complete proof of the maximum relative queuing delay of the fractional traffic dispatch algorithm [19, 22].
Efficient, fully local algorithms for CIOQ switches
Abstract
A number of algorithms have been proposed in the literature for scheduling CIOQ switches. The algorithms that have been proven to provide strict performance guarantees on delay (via the emulation of an output-queued switch) have been too complicated to implement because they require the exchange of a large amount of information between inputs and outputs. With implementation as our primary focus, we consider scheduling algorithms that are “fully local.” This means inputs and outputs must be able to make decisions regarding matchings using only local information (except requests, grants and accepts). This constraint, which is essentially necessary for high-speed implementations, appears too restrictive for designing algorithms that enable the emulation of an output-queued switch. Rather surprisingly, we find a very simple and fully local algorithm, FLGS (for fully local Gale-Shapley), which, at a speedup of 2, emulates an output-queued switch implementing a number of different output link scheduling algorithms such as weighted round robin and strict priority. We explore the performance of the algorithm at speedups between 1 and 2 using simulations and find that it partitions the bandwidth nearly as well as an output-queued switch at speedups of 1.2 or higher.
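FLGS is named after the Gale-Shapley stable-matching algorithm; by analogy (an assumption here), requests act as proposals and grants/accepts as tentative acceptances. The sketch below is only the textbook input-proposing Gale-Shapley procedure, not FLGS itself: it runs centrally, whereas FLGS lets each port decide from local information, and the abstract does not say how FLGS builds its preference lists, so they are simply passed in. The function name is hypothetical.

```python
def gale_shapley(input_prefs, output_prefs):
    """Input-proposing Gale-Shapley stable matching.

    input_prefs[i]  -- outputs listed in input i's order of preference
    output_prefs[j] -- inputs listed in output j's order of preference
    Assumes complete preference lists. Returns a dict output -> input.
    """
    # rank[j][i]: position of input i in output j's list (lower is better).
    rank = {j: {i: r for r, i in enumerate(p)} for j, p in output_prefs.items()}
    next_choice = {i: 0 for i in input_prefs}   # index of the next output each input requests
    free = list(input_prefs)                    # inputs not currently matched
    matched = {}                                # output -> input it has accepted
    while free:
        i = free.pop()
        if next_choice[i] >= len(input_prefs[i]):
            continue  # input i was rejected by every output on its list
        j = input_prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = matched.get(j)
        if current is None:
            matched[j] = i                # output j was unmatched: accept
        elif rank[j][i] < rank[j][current]:
            matched[j] = i                # output j prefers the new request
            free.append(current)          # previously accepted input is freed
        else:
            free.append(i)                # request rejected; i tries its next choice
    return matched
```

With both inputs preferring output `"a"`, and `"a"` preferring input 1, the result is `{"a": 1, "b": 0}`: input 0 is rejected by `"a"` and falls back to `"b"`.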