Results 1 - 10 of 25
Practical Algorithms for Performance Guarantees in Buffered Crossbars
, 2005
"... Network operators would like high capacity routers that give guaranteed throughput, rate and delay guarantees. Because they want high capacity, the trend has been towards input queued or combined input and output queued (CIOQ) routers using crossbar switching fabrics. But these routers require impra ..."
Abstract
-
Cited by 50 (2 self)
- Add to MetaCart
(Show Context)
Network operators would like high capacity routers that give throughput, rate, and delay guarantees. Because they want high capacity, the trend has been towards input queued or combined input and output queued (CIOQ) routers using crossbar switching fabrics. But these routers require impractically complex scheduling algorithms to provide the desired guarantees. In this paper, we explore how a buffered crossbar (a crossbar switch with a packet buffer at each crosspoint) can provide guaranteed performance (throughput, rate, and delay) with less complex, practical scheduling algorithms. We describe scheduling algorithms that operate in parallel on each input and output port, and hence are scalable. With these algorithms, buffered crossbars with a speedup of two can provide 100% throughput, rate, and delay guarantees. Index Terms: system design, combinatorics, packet switching, buffered crossbar, scheduling algorithm, performance guarantees, throughput, mimic, quality of service. I. BACKGROUND Network operators would like high capacity routers that give guaranteed performance. First, they prefer routers that guarantee throughput so they can maximize the utilization of their expensive long-haul links. Second, they want routers that can allocate to each flow a guaranteed rate. Third, they want the capability to control the delay for packets of individual flows for real-time applications. Because they want high capacity, the trend has been towards input queued or combined input and output queued (CIOQ) routers. Most of these routers use a crossbar switching fabric with a centralized scheduler. While it is theoretically possible to build crossbar schedulers that give 100% throughput [1] or rate and delay guarantees [2][3], they are considered too complex to b...
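As a rough illustration of why per-port schedulers scale, the sketch below simulates one time slot of an N x N buffered crossbar in which every input and every output runs its own round-robin scheduler against one-cell crosspoint buffers. The round-robin rule and all names (VOQ, XB, input_schedulers, output_schedulers) are illustrative assumptions, not the guarantee-providing algorithms of the paper.

```python
# Minimal sketch (not the paper's algorithm): distributed round-robin
# scheduling in an N x N buffered crossbar with 1-cell crosspoint buffers.
# VOQ[i][j] counts cells at input i destined to output j; XB[i][j] is the
# crosspoint buffer occupancy (0 or 1). Input and output schedulers run
# independently of each other, which is what makes the approach scalable.

N = 4
VOQ = [[0] * N for _ in range(N)]   # virtual output queues at the inputs
XB = [[0] * N for _ in range(N)]    # crosspoint buffers (capacity 1 cell)
in_ptr = [0] * N                    # round-robin pointers, one per input
out_ptr = [0] * N                   # round-robin pointers, one per output

def input_schedulers():
    """Each input forwards at most one cell into a free crosspoint buffer."""
    for i in range(N):
        for k in range(N):
            j = (in_ptr[i] + k) % N
            if VOQ[i][j] > 0 and XB[i][j] == 0:
                VOQ[i][j] -= 1
                XB[i][j] = 1
                in_ptr[i] = (j + 1) % N
                break

def output_schedulers():
    """Each output drains at most one cell from its column of crosspoints."""
    for j in range(N):
        for k in range(N):
            i = (out_ptr[j] + k) % N
            if XB[i][j] > 0:
                XB[i][j] -= 1
                out_ptr[j] = (i + 1) % N
                break

# One time slot: inputs and outputs decide independently.
VOQ[0][2] = 3
input_schedulers()
output_schedulers()
```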
Variable packet size buffered crossbar (CICQ) switches
- IEEE ICC
, 2004
"... Abstract — One of the most widely used architectures for packet switches is the crossbar. A special version of a it is the buffered crossbar, where small buffers are associated with the crosspoints. The advantages of this organization, when compared to the unbuffered architecture, is that it needs m ..."
Abstract
-
Cited by 30 (7 self)
- Add to MetaCart
(Show Context)
Abstract — One of the most widely used architectures for packet switches is the crossbar. A special version of it is the buffered crossbar, where small buffers are associated with the crosspoints. The advantages of this organization, compared to the unbuffered architecture, are that it needs much simpler and slower scheduling circuits, and that it can shape the switched traffic according to a given set of Quality of Service (QoS) criteria in a more efficient way. Furthermore, by supporting variable length packets throughout a buffered crossbar: a) there is no need for segmentation and reassembly circuits, b) no internal speedup is necessary, and c) synchronization between the input and output clock domains is simplified. In this paper we present an architecture, a hardware implementation analysis, and a performance evaluation of such a buffered crossbar. The proposed organization is simple yet powerful, and can be easily implemented using today's technologies. Our evaluation shows that it outperforms most of the existing packet switch architectures, while its hardware cost is kept to a minimum.
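The byte-granular credit loop that lets whole variable-length packets cross such a buffered crossbar might look like the minimal sketch below; the buffer size and names are assumptions for illustration, not the hardware design evaluated in the paper.

```python
# Illustrative sketch (assumed details, not the paper's implementation):
# byte-granular credit flow control between an input line card and one
# crosspoint buffer, so variable-size packets can be switched without
# segmentation and reassembly.

XPOINT_BYTES = 2048                      # assumed crosspoint buffer size

class CrosspointCredit:
    def __init__(self, capacity=XPOINT_BYTES):
        self.credits = capacity          # free space known to the input

    def can_send(self, pkt_len):
        # A whole packet is sent only if it fits; no segmentation needed.
        return pkt_len <= self.credits

    def send(self, pkt_len):
        assert self.can_send(pkt_len)
        self.credits -= pkt_len

    def credit_return(self, pkt_len):
        # The crossbar returns credits as the output drains the packet.
        self.credits += pkt_len

cp = CrosspointCredit()
if cp.can_send(1500):
    cp.send(1500)
cp.credit_return(1500)
```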
Throughput region of finite-buffered networks
- IEEE Trans. Parallel Distrib. Syst
, 2007
"... Abstract—Most of the current communication networks, including the Internet, are packet switched networks. One of the main reasons behind the success of packet switched networks is the possibility of performance gain due to multiplexing of network bandwidth. The multiplexing gain crucially depends o ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
(Show Context)
Abstract—Most of the current communication networks, including the Internet, are packet switched networks. One of the main reasons behind the success of packet switched networks is the possibility of performance gain due to multiplexing of network bandwidth. The multiplexing gain crucially depends on the size of the buffers available at the nodes of the network to store packets at the congested links. However, most of the previous work assumes the availability of infinite buffer size. In this paper, we study the effect of finite buffer size on the performance of networks of interacting queues. In particular, we study the throughput of flow-controlled lossless networks with finite buffers. The main result of this paper is the characterization of a dynamic scheduling policy that achieves the maximal throughput with a minimal finite buffer at the internal nodes of the network under a memoryless (e.g., Bernoulli IID) exogenous arrival process. However, this ideal policy is rather complex and, hence, difficult to implement. This leads us to the design of a simpler and possibly implementable policy, for which we obtain a natural trade-off between throughput and buffer size. Finally, we apply our results to packet switches with a buffered crossbar architecture. Index Terms—Queuing theory, flow-controlled networks, scheduling, packet switching, buffered crossbars.
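The flavour of a buffer-aware scheduling step can be sketched as below: a link is eligible only when its downstream node has free buffer space, and the longest eligible upstream queue is served. This is a simplified, assumption-laden illustration, not the maximal-throughput policy characterized in the paper.

```python
def schedule_step(queues, buf_free, links):
    """One scheduling step over a finite-buffered network (illustrative).

    queues[n]   -- backlog at node n
    buf_free[n] -- free buffer slots at node n
    links       -- iterable of (src, dst) pairs
    Serves one packet on the eligible link with the longest upstream queue.
    """
    eligible = [(queues[s], (s, d)) for s, d in links
                if queues[s] > 0 and buf_free[d] > 0]
    if not eligible:
        return None
    _, (s, d) = max(eligible)
    queues[s] -= 1                      # packet leaves the upstream node
    buf_free[d] -= 1                    # and occupies a downstream buffer slot
    queues[d] = queues.get(d, 0) + 1
    return (s, d)

queues = {"a": 5, "b": 0, "c": 2}
buf_free = {"a": 4, "b": 1, "c": 4}
print(schedule_step(queues, buf_free, [("a", "b"), ("c", "b")]))  # ('a', 'b')
```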
Katevenis: "Multiple Priorities in a Two-Lane Buffered Crossbar"
- Proc. IEEE Globecom, TX, USA, 29 Nov. - 4 Dec. 2004, CD-ROM paper ID
"... Abstract — A significant advantage of buffered crossbar (combined input-crosspoint queueing- CICQ) switches is that they can directly operate on variable-size packets, thus saving the costs and inefficiencies of packet segmentation and reassembly (SAR). However, in order to support multiple priority ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
(Show Context)
Abstract — A significant advantage of buffered crossbar (combined input-crosspoint queueing, CICQ) switches is that they can directly operate on variable-size packets, thus saving the costs and inefficiencies of packet segmentation and reassembly (SAR). However, in order to support multiple priority levels, separate queues per priority are needed at each crosspoint to prevent HOL blocking and buffer hogging; these queues are expensive because each needs a size of at least one maximum-size packet. In this paper we propose a scheme that uses only two queues per crosspoint to effectively support multiple priorities. We adaptively adjust the priority levels of the two queues so that most traffic goes through the "lower" queue, while the "upper" queue usually remains available for higher priority packets to overtake it. Through simulation, and assuming 8 priority levels, we compare our scheme to an ideal system that uses 8 queues per crosspoint. For realistic traffic, the two systems perform almost identically, although ours uses a quarter of the crossbar memory. Even under a highly irregular traffic pattern (Bursts60), our system does not increase the average delay of any priority level by more than 75 percent compared to the ideal system.
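A hedged sketch of the two-lane idea follows: eight priority levels are folded onto two crosspoint queues by a per-crosspoint threshold. The adaptation rule shown is only an assumed heuristic standing in for the authors' actual adjustment scheme.

```python
# Sketch of a two-lane crosspoint: priorities numerically lower than the
# threshold use the "upper" lane and can overtake traffic waiting in the
# "lower" lane. The adapt() heuristic is an assumption for illustration.

class TwoLaneCrosspoint:
    def __init__(self, levels=8):
        self.threshold = 0          # start with everything in the lower lane
        self.upper = []             # queue for overtaking (higher) priorities
        self.lower = []             # queue for the bulk of the traffic
        self.levels = levels

    def enqueue(self, pkt, prio):
        lane = self.upper if prio < self.threshold else self.lower
        lane.append((prio, pkt))

    def dequeue(self):
        q = self.upper if self.upper else self.lower
        return q.pop(0) if q else None

    def adapt(self):
        # Assumed heuristic: when the lower lane backs up, raise the threshold
        # to the best priority waiting there, so strictly better arrivals can
        # bypass it via the upper lane.
        if self.lower and self.threshold < self.levels - 1:
            best_waiting = min(p for p, _ in self.lower)
            self.threshold = min(self.levels - 1, best_waiting)

xp = TwoLaneCrosspoint()
xp.enqueue("bulk", prio=6)
xp.adapt()                   # threshold rises to 6
xp.enqueue("voice", prio=0)  # goes to the upper lane and overtakes
print(xp.dequeue())          # (0, 'voice')
```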
Variable-size multipacket segments in buffered crossbar (CICQ) architectures
- in Proceedings of the IEEE Int. Conf. on Communications (ICC’2005
, 2005
"... Abstract — Buffered crossbars can directly switch variable size packets, but require large crosspoint buffers to do so, especially when jumbo frames are to be supported. When this is not feasible, segmentation and reassembly (SAR) must be used. We propose a novel SAR scheme for buffered crossbars th ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
(Show Context)
Abstract — Buffered crossbars can directly switch variable size packets, but require large crosspoint buffers to do so, especially when jumbo frames are to be supported. When this is not feasible, segmentation and reassembly (SAR) must be used. We propose a novel SAR scheme for buffered crossbars that uses variable-size segments while merging multiple packets (or fragments thereof) into each segment. This scheme eliminates padding overhead, reduces header overhead, reduces the crosspoint buffer size, and is suitable for use with external, modern DRAM buffer memory in the ingress line cards. We evaluate the new scheme using simulation, and show that it outperforms existing segmentation schemes in buffered as well as unbuffered crossbars. We also study how the maximum segment size affects system performance.
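A toy version of such a segmenter is sketched below, under assumed parameters (a 512-byte maximum segment, packets given only by their lengths); it shows only how packets and fragments are merged back-to-back so that no padding is needed, not the paper's exact segment format.

```python
# Illustrative segmenter (assumed parameters): packets, or fragments of them,
# are merged back-to-back into variable-size segments of at most MAX_SEG
# bytes, so no padding is added and one segment header is amortised over
# several packets.

MAX_SEG = 512   # assumed maximum segment payload, in bytes

def segment(packets):
    """packets: list of packet lengths. Yields lists of (pkt_id, chunk_len)."""
    seg, room = [], MAX_SEG
    for pid, length in enumerate(packets):
        remaining = length
        while remaining > 0:
            take = min(remaining, room)
            seg.append((pid, take))
            remaining -= take
            room -= take
            if room == 0:                 # segment full: emit, start a new one
                yield seg
                seg, room = [], MAX_SEG
    if seg:
        yield seg                         # last, possibly shorter, segment

for s in segment([1500, 64, 64, 300]):
    print(s)
```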
Scheduling in non-blocking buffered three-stage switching fabrics
- Proceedings of IEEE INFOCOM
, 2006
"... Abstract — Three-stage non-blocking switching fabrics are the next step in scaling current crossbar switches to many hundreds or few thousands of ports. Congestion management, however, is the central open problem; without it, performance suffers heavily under real-world traffic patterns. Schedulers ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
(Show Context)
Abstract — Three-stage non-blocking switching fabrics are the next step in scaling current crossbar switches to many hundreds or a few thousand ports. Congestion management, however, is the central open problem; without it, performance suffers heavily under real-world traffic patterns. Schedulers for bufferless crossbars perform congestion management but do not scale to high valencies and to multi-stage fabrics. Distributed scheduling, as used in buffered crossbars, is scalable but has never been scaled beyond crossbar valencies. We combine ideas from central and distributed schedulers, from request-grant protocols, and from credit-based flow control to propose a novel, practical architecture for scheduling in non-blocking buffered switching fabrics. The new architecture relies on multiple, independent, single-resource schedulers operating in a pipeline. It: (i) isolates well-behaved flows from congested ones; (ii) provides throughput in excess of 95% under unbalanced traffic, and delays that successfully compete against output queueing; (iii) provides weighted max-min fairness; (iv) directly operates on variable-size packets or multi-packet segments; (v) resequences cells or segments using very small buffers; and (vi) can be realistically implemented for a 1024×1024 reference fabric made out of 32×32 buffered crossbar switch elements. This paper carefully studies the many intricacies of the problem and the solution, discusses implementation, and provides performance simulation results.
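To make the request-grant / credit combination concrete, here is a minimal single-resource scheduler sketch; the class and parameter names are assumptions, and the real architecture pipelines many such schedulers across the three fabric stages.

```python
# Rough sketch (simplified and assumed, not the paper's design): each output
# port is an independent single-resource scheduler that grants buffered
# requests only while it holds credits for downstream buffer space.

from collections import deque

class OutputCreditScheduler:
    def __init__(self, credits):
        self.credits = credits        # free buffer space reachable via this output
        self.requests = deque()       # FIFO of (input, size) requests

    def request(self, inp, size):
        self.requests.append((inp, size))

    def grant(self):
        """Issue at most one grant per scheduling step, if credits allow."""
        if self.requests and self.requests[0][1] <= self.credits:
            inp, size = self.requests.popleft()
            self.credits -= size
            return inp, size
        return None

    def credit_return(self, size):
        self.credits += size          # returned when the buffer drains

sched = OutputCreditScheduler(credits=4096)
sched.request(inp=3, size=1500)
print(sched.grant())                  # (3, 1500)
```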
On the maximal throughput of networks with finite buffers and its application to buffered crossbars
- in Proceedings of IEEE Infocom
, 2005
"... Abstract — The advent of packet networks has motivated many researchers to study the performance of networks of queues in the last decade or two. However, most of the previous work assumes the availability of infinite queue-size. Instead, in this paper, we study the maximal achievable throughput in ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
Abstract — The advent of packet networks has motivated many researchers to study the performance of networks of queues over the last decade or two. However, most of the previous work assumes the availability of infinite queue size. Instead, in this paper, we study the maximal achievable throughput in a flow-controlled lossless network with finite queue size. In such networks, throughput depends on the packet scheduling policy utilized. As the main result of this paper, we obtain a dynamic scheduling policy that achieves the maximal throughput (equal to the maximal throughput in the presence of infinite queue size) with a minimal finite queue size at the internal nodes of the network. Though the performance of the policy is ideal, it is quite complex and hence difficult to implement. This leads us to the design of a simpler and possibly implementable policy, for which we obtain a natural trade-off between throughput and queue size. We apply our results to packet switches with a buffered crossbar architecture. We propose a simple, implementable, distributed scheduling policy which provides high throughput with minimal internal buffering. We also obtain a natural trade-off between throughput, internal speedup, and buffer size, providing a switch designer with a gamut of designs. To the best of the authors' knowledge, this is one of the first attempts to study the throughput of general networks with finite queue size. We believe that our methods are general and can be useful in other contexts.
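One simple distributed policy in this spirit (a sketch under assumed rules, not necessarily the policy proposed in the paper) is longest-queue-first at each input, gated by crosspoint buffer occupancy and repeated `speedup` times per slot; sweeping `speedup` and `buf_size` is one way to explore the kind of throughput/speedup/buffer trade-off mentioned above.

```python
def input_lqf_step(voq_i, xb_row, buf_size, speedup):
    """voq_i[j]: backlog of input i toward output j; xb_row[j]: occupancy of
    the crosspoint buffers of row i. Mutates both; returns cells forwarded."""
    sent = 0
    for _ in range(speedup):
        candidates = [j for j in range(len(voq_i))
                      if voq_i[j] > 0 and xb_row[j] < buf_size]
        if not candidates:
            break
        j = max(candidates, key=lambda j: voq_i[j])   # longest queue first
        voq_i[j] -= 1
        xb_row[j] += 1
        sent += 1
    return sent

voq = [3, 0, 5, 1]
xb = [0, 0, 0, 0]
print(input_lqf_step(voq, xb, buf_size=2, speedup=2))   # forwards 2 cells
```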
Multicast scheduling in buffered crossbar switches with multiple input queues
- Proc. IEEE HPSR 2005
, 2005
"... Abstract—We consider the problem of scheduling multicast traffic in a buffered crossbar switch with multiple input queues at each input port. In this paper, we design and investigate a series of combinations of queuing policies and scheduling algorithms and report the simulation result. It is shown ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Abstract—We consider the problem of scheduling multicast traffic in a buffered crossbar switch with multiple input queues at each input port. In this paper, we design and investigate a series of combinations of queuing policies and scheduling algorithms and report simulation results. It is shown that a small number of input queues at each input port can dramatically improve the performance of buffered crossbar switches under bursty multicast traffic. Under this architecture, it is feasible to design simple queuing policies and scheduling algorithms for high speed switches while maintaining high performance and a small buffer size within the crossbar. Index terms—scheduling; multicast; buffered crossbar switch
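One example of a simple queuing policy of the kind studied is sketched below; the least-occupied-queue rule is an assumption for illustration, not necessarily one of the policies evaluated in the paper.

```python
# Hedged sketch: an arriving multicast cell, described by its fan-out set,
# is placed in one of k input queues; here the least-occupied queue is
# chosen (an assumed rule) to reduce head-of-line blocking.

def enqueue_multicast(queues, fanout):
    """queues: list of k lists of pending fan-out sets. Returns chosen queue."""
    target = min(range(len(queues)), key=lambda q: len(queues[q]))
    queues[target].append(set(fanout))
    return target

k = 4
input_queues = [[] for _ in range(k)]
enqueue_multicast(input_queues, fanout={0, 2, 5})
enqueue_multicast(input_queues, fanout={1, 3})
print([len(q) for q in input_queues])
```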
Scheduling in Switches with Small Internal Buffers
- Proc. IEEE Global Comm. Conf. (Globecom
, 2005
"... Abstract — Unbuffered crossbars or switching fabrics contain no internal buffers, and function using only input (VOQ) and possibly output queues. Schedulers for such switches are complex, and introduce increased delay at medium loads, because they have to admit at most one cell per input and per out ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Abstract — Unbuffered crossbars or switching fabrics contain no internal buffers, and function using only input (VOQ) and possibly output queues. Schedulers for such switches are complex, and introduce increased delay at medium loads, because they have to admit at most one cell per input and per output during each time slot. Buffered crossbars, on the other hand, contain sufficient internal buffering (N² buffers) to allow independent schedulers to concurrently forward packets to the same output from any number of inputs. These architectures represent the two extremes in a range of solutions, which we examine here; although intermediate points in this range are of reduced practical interest for crossbars, they are nevertheless quite interesting for switching fabrics, and they may be of interest for optical switches. We find that tolerating two cells per output per time slot, using small buffers inside the switch or fabric, suffices for independent and efficient scheduling. First, we introduce a novel "request-grant" credit protocol, enabling N inputs to share a small switch buffer. Then, we apply this protocol to a switch with N such buffers, one per output, and we consider the resulting scheduling problem. Interestingly, it resembles the problem faced by unbuffered crossbar schedulers, but it is much simpler because it comprises independent, single-resource schedulers that can be pipelined. We show that individual buffer sizes need to grow neither with switch size nor with propagation delay. Through simulations, we study performance as a function of the number of cells allowed per output per time slot. For one cell, the switch performs very close to the iSLIP unbuffered crossbar with one iteration. For more cells, performance improves quickly; for 12 cells, packet delay under (smooth) uniform load is practically as low as with ideal output queueing. Under unbalanced load, throughput is superior to that of buffered crossbars, due to better buffer sharing.
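A stripped-down sketch of granting a bounded number of cells per output per time slot against a small shared buffer is given below; the function and parameter names are assumed, and a real grant stage would also enforce a round-robin or other fairness order among inputs.

```python
# Simplified sketch (assumed names): N inputs request an output; an
# independent per-output scheduler grants up to cells_per_slot of them per
# time slot, bounded also by the free space of that output's small shared
# buffer. Ungranted requests simply retry in a later slot.

def per_output_grants(requests, buf_free, cells_per_slot):
    """requests: list of input ids wanting this output this slot.
    Returns the granted inputs."""
    budget = min(cells_per_slot, buf_free)
    granted = requests[:budget]          # order set upstream, e.g. round-robin
    return granted

print(per_output_grants(requests=[0, 3, 7, 9], buf_free=3, cells_per_slot=2))
# -> [0, 3]
```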
Asymptotic performance limits of switches with buffered crossbars
- IEEE Transactions on Information Theory
, 2008
"... Abstract — Input queued switches exploiting buffered crossbars (CICQ switches) are widely considered very promising architectures that outperform input queued (IQ) switches with bufferless switching fabrics both in terms of architectural scalability and performance. Indeed the problem of scheduling ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
Abstract — Input queued switches exploiting buffered crossbars (CICQ switches) are widely considered very promising architectures that outperform input queued (IQ) switches with bufferless switching fabrics in terms of both architectural scalability and performance. Indeed, the problem of scheduling packets for transfer through the switching fabric is significantly simplified by the presence of internal buffers in the crossbar, which makes it possible to adopt efficient, simple, and fully distributed scheduling algorithms. In this paper we study the throughput performance of CICQ switches supporting multicast traffic, showing that, similarly to IQ architectures, CICQ switches with an arbitrarily large number of ports may suffer from significant throughput degradation under "pathological" multicast traffic patterns. Despite the ...