Results 1 - 10 of 45
Exact Emulation of an Output Queueing Switch by a Combined Input Output Queueing Switch
- In Sixth IEEE/IFIP International Workshop on Quality of Service, 1998
Cited by 45 (1 self)
Combined input output queueing switches (CIOQ) have better scaling properties than output queueing (OQ) switches. However, a CIOQ switch may have lower switch throughput, and more importantly, it is difficult to control delay in a CIOQ switch due to the existence of multiple queueing points. In this paper, we study the following problem, originally formulated and studied by Prabhakar and McKeown [16]: Can a CIOQ switch be designed to behave identically to an OQ switch? In [16], an algorithm was proposed so that a CIOQ switch with an internal speedup of four can behave identically to an OQ switch with FIFO as the output queueing discipline. In this paper, we propose a new switch scheduling algorithm called Joined Preferred Matching (JPM) that improves Prabhakar and McKeown's results in two aspects. First, with JPM, the internal speedup needed for a CIOQ switch to achieve exact emulation of an OQ switch is only 2 instead of 4. Second, the result applies to OQ switches that employ a gener...
Practical Algorithms for Performance Guarantees in Buffered Crossbars
- 2005
Cited by 24 (1 self)
Network operators would like high-capacity routers that guarantee throughput, rate, and delay. Because they want high capacity, the trend has been towards input queued or combined input and output queued (CIOQ) routers using crossbar switching fabrics. But these routers require impractically complex scheduling algorithms to provide the desired guarantees. In this paper, we explore how a buffered crossbar (a crossbar switch with a packet buffer at each crosspoint) can provide guaranteed performance (throughput, rate, and delay) with less complex, practical scheduling algorithms. We describe scheduling algorithms that operate in parallel on each input and output port, and hence are scalable. With these algorithms, buffered crossbars with a speedup of two can provide 100% throughput, rate, and delay guarantees.
Index Terms: system design, combinatorics, packet switching, buffered crossbar, scheduling algorithm, performance guarantees, throughput, mimic, quality of service.
I. BACKGROUND Network operators would like high-capacity routers that give guaranteed performance. First, they prefer routers that guarantee throughput so they can maximize the utilization of their expensive long-haul links. Second, they want routers that can allocate to each flow a guaranteed rate. Third, they want the capability to control the delay for packets of individual flows for real-time applications. Because they want high capacity, the trend has been towards input queued or combined input and output queued (CIOQ) routers. Most of these routers use a crossbar switching fabric with a centralized scheduler. While it is theoretically possible to build crossbar schedulers that give 100% throughput [1] or rate and delay guarantees [2][3], they are considered too complex to b...
A Parallel-Polled Virtual Output Queued Switch with a Buffered Crossbar
Cited by 22 (8 self)
Input buffered switches with Virtual Output Queues (VOQ) are scalable to very high speeds, but require switch matrix scheduling algorithms to achieve high throughput. Existing scheduling algorithms based on parallel request-grant-accept cycles cannot natively support variable length Ethernet packets. In this paper, a Parallel-Polled VOQ (PP-VOQ) architecture is proposed that natively supports variable length packets. Small amounts of FIFO buffering within a crossbar are used. Using simulation, the PP-VOQ with buffered crossbar switch is shown to have lower switch delay at high offered loads than an iSLIP switch for both cell and variable-length packet traffic. The PP-VOQ switch does not require internal speed-up or complex reassembly mechanisms. Priority mechanisms implemented in both the iSLIP and PP-VOQ switches are demonstrated to provide guaranteed rate and bounded delay for schedulable traffic.
Coordinated Multihop Scheduling: A Framework for End-to-End Services
- 2002
Cited by 17 (2 self)
In multi-hop networks, packet schedulers at downstream nodes have an opportunity to make up for excessive latencies due to congestion at upstream nodes. Similarly, when packets incur low delays at upstream nodes, downstream nodes can reduce priority and schedule other packets first. The goal of this paper is to define a framework for the design and analysis of Coordinated Multihop Scheduling (CMS), which exploits such inter-node coordination. We first provide a general CMS definition which enables us to classify a number of schedulers from the literature, including G-EDF, FIFO+, CEDF, and work-conserving CJVC, as examples of CMS schedulers. We then develop a distributed theory of traffic envelopes which enables us to derive end-to-end statistical admission control conditions for CMS schedulers. We show that CMS schedulers are able to limit traffic distortion to within a narrow range, resulting in improved end-to-end performance and more efficient resource utilization. Consequently, our technique exploits statistical resource sharing among flows, classes, and nodes, and our results provide the first statistical multi-node multi-class admission control algorithm for networks of work-conserving servers.
Weighted Fairness in Buffered Crossbar Scheduling
- Proc. IEEE HPSR’03, 2003
Cited by 17 (4 self)
The crossbar is the most popular packet switch architecture. By adding small buffers at the crosspoints, important advantages can be obtained: (1) Crossbar scheduling is simplified. (2) High throughput is achievable. (3) Weighted scheduling becomes feasible. In this paper we study the fairness properties of a buffered crossbar with weighted fair schedulers. We show by means of simulation that, under heavy demand, the system will allocate throughput in a weighted max-min fair manner. We study the impact of the size of the crosspoint buffers in approximating the weighted max-min fair rates and we find that a small amount of buffering per crosspoint (3-8 cells) suffices for the maximum percentage discrepancy to fall below 5% for switches.
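To make "weighted max-min fair" concrete, here is a minimal Python sketch of the standard progressive-filling computation. It is a simplification and not the paper's method: it assumes a single shared resource (for example, one output link), whereas a crossbar has a capacity constraint at every input and every output. The function name `weighted_max_min` and its parameters are hypothetical, chosen only for this illustration.

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair shares of ONE shared resource (assumed simplification).

    demands[i] is the rate flow i asks for, weights[i] > 0 is its weight.
    Flows that want less than their weighted share are fully served; the
    capacity they leave unused is redistributed among the remaining flows.
    """
    alloc = [0.0] * len(demands)
    active = {i for i, d in enumerate(demands) if d > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        # Capacity available per unit of weight among still-unsatisfied flows.
        share = remaining / sum(weights[i] for i in active)
        satisfied = [i for i in active if demands[i] <= share * weights[i]]
        if not satisfied:
            # Every remaining flow is bottlenecked: split in proportion to weight.
            for i in active:
                alloc[i] = share * weights[i]
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.discard(i)
    return alloc


if __name__ == "__main__":
    print(weighted_max_min(10, [2, 8, 8], [1, 1, 2]))
```

With capacity 10, demands 2, 8, 8 and weights 1, 1, 2, the first flow receives its full demand of 2 and the remaining 8 units are split 1:2, giving roughly 2.67 and 5.33.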
Variable packet size buffered crossbar (CICQ) switches
- IEEE ICC, 2004
Cited by 17 (6 self)
One of the most widely used architectures for packet switches is the crossbar. A special version of it is the buffered crossbar, where small buffers are associated with the crosspoints. The advantages of this organization, when compared to the unbuffered architecture, are that it needs much simpler and slower scheduling circuits, while it can shape the switched traffic according to a given set of Quality of Service (QoS) criteria in a more efficient way. Furthermore, by supporting variable length packets throughout a buffered crossbar: a) there is no need for segmentation and reassembly circuits, b) no internal speedup is necessary, and c) synchronization between the input and output clock domains is simplified. In this paper we present an architecture, a hardware implementation analysis, and a performance evaluation of such a buffered crossbar. The proposed organization is simple, yet powerful and can be easily implemented using today’s technologies. Our evaluation shows that it outperforms most of the existing packet switch architectures, while its hardware cost is kept to a minimum.
Proportional-share scheduling for distributed storage systems
- In ProACM Transactions on, 2007
Cited by 12 (2 self)
Fully distributed storage systems have gained popularity in the past few years because of their ability to use cheap commodity hardware and their high scalability. While there are a number of algorithms for providing differentiated quality of service to clients of a centralized storage system, the problem has not been solved for distributed storage systems. Providing performance guarantees in distributed storage systems is more complex because clients may have different data layouts and access their data through different coordinators (access nodes), yet the performance guarantees required are global. This paper presents a distributed scheduling framework. It is an adaptation of fair queuing algorithms for distributed servers. Specifically, upon scheduling each request, it enforces an extra delay (possibly zero) that corresponds to the amount of service the client gets on other servers. Different performance goals, e.g., per-storage-node proportional sharing, total-service proportional sharing, or a mix of the two, can be met by different delay functions. The delay functions can be calculated locally at the coordinators, so excess communication is avoided. The analysis and experimental results show that the framework can enforce performance goals under different data layouts and workloads.
Providing QoS Guarantees in Input Buffered Crossbar Switches with Speedup
- 1998
Cited by 10 (0 self)
This dissertation investigates a number of issues related to providing Quality of Service guarantees in input-buffered crossbar switches with speedup. It is shown that a speedup of 4 is sufficient to ensure 100% asymptotic throughput with any maximal matching algorithm employed by the arbiter. It is also demonstrated that the crossbar architecture is capable of providing delay guarantees comparable to those known for the output-buffered switch architecture. Several algorithms which ensure different delay guarantees with different values of speedup are presented and analyzed.
Scheduling in non-blocking buffered three-stage switching fabrics
- Proceedings of IEEE INFOCOM, 2006
Cited by 10 (4 self)
Three-stage non-blocking switching fabrics are the next step in scaling current crossbar switches to many hundreds or a few thousand ports. Congestion management, however, is the central open problem; without it, performance suffers heavily under real-world traffic patterns. Schedulers for bufferless crossbars perform congestion management but are not scalable to high valencies and to multi-stage fabrics. Distributed scheduling, as used in buffered crossbars, is scalable but has never been scaled beyond crossbar valencies. We combine ideas from central and distributed schedulers, from request-grant protocols and from credit-based flow control, to propose a novel, practical architecture for scheduling in non-blocking buffered switching fabrics. The new architecture relies on multiple, independent, single-resource schedulers, operating in a pipeline. It: (i) isolates well-behaved flows from congested flows; (ii) provides throughput in excess of 95% under unbalanced traffic, and delays that successfully compete against output queueing; (iii) provides weighted max-min fairness; (iv) directly operates on variable-size packets or multi-packet segments; (v) resequences cells or segments using very small buffers; and (vi) can be realistically implemented for a 1024×1024 reference fabric made out of 32×32 buffered crossbar switch elements. This paper carefully studies the many intricacies of the problem and the solution, discusses implementation, and provides performance simulation results.
The RR/RR CICQ Switch: Hardware Design for 10-Gbps Link Speed
- In Proceedings, IEEE International Performance, Computing, and Communications Conference, 2003
Cited by 10 (1 self)
The combined input and crossbar queued (CICQ) switch is an input buffered switch suitable for very high-speed networks. The implementation feasibility of the CICQ switch architecture for 24 ports and 10-Gbps link speed is shown in this paper with an FPGA-based design (estimated cost of $30,000 in mid-2002). The bottleneck of a CICQ switch with RR scheduling is the RR poller. We develop a priority encoder based RR poller that uses feedback masking. This design has lower delay than any known design for an FPGA implementation.
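The abstract does not spell out the poller's internals, so the following Python sketch shows only the generic idea of a round-robin poller built from a priority encoder plus a mask. It is a software model of a common arbiter pattern, not the authors' FPGA design, and the "feedback masking" they describe may differ. The names `priority_encode`, `rr_poll`, and `next_pointer` are hypothetical.

```python
def priority_encode(bits: int) -> int:
    """Return the index of the lowest set bit, or -1 if no bit is set."""
    if bits == 0:
        return -1
    return (bits & -bits).bit_length() - 1


def rr_poll(requests: int, pointer: int, n_ports: int) -> int:
    """Pick the first requesting port at or after `pointer`, wrapping around.

    `requests` is a bit vector: bit i set means port i has something to send.
    Returns -1 when no port is requesting.
    """
    all_ports = (1 << n_ports) - 1
    requests &= all_ports
    # Hide ports below the pointer so the encoder only sees ports >= pointer.
    mask = (all_ports << pointer) & all_ports
    masked = requests & mask
    # If nothing at or after the pointer is requesting, wrap around by
    # encoding the unmasked request vector instead.
    return priority_encode(masked if masked else requests)


def next_pointer(granted: int, n_ports: int) -> int:
    """Advance the pointer to the port just after the one that was served."""
    return (granted + 1) % n_ports


if __name__ == "__main__":
    pointer = 0
    for _ in range(4):
        port = rr_poll(0b10010, pointer, 5)  # ports 1 and 4 are requesting
        pointer = next_pointer(port, 5)
        print(port, pointer)
```

Feeding the granted port back into the pointer is what makes the service order round-robin; with a fixed pointer of 0 the same code behaves as a plain fixed-priority encoder.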

