On the Performance of a Dual Round-Robin Switch
Proceedings of IEEE INFOCOM, 2001
Cited by 20 (2 self)
Abstract
The Dual Round-Robin Matching (DRRM) switch [2] [3] has a scalable, low-complexity architecture that allows an aggregate bandwidth exceeding 1 Tb/s using current CMOS technology. In this paper we prove that the DRRM switch can achieve 100% throughput under i.i.d. and uniform traffic. DRRM is the first practical matching scheme for which this property has been proved. The performance of the DRRM switch is then studied and compared with that of the iSLIP switch. DRRM outperforms iSLIP both in delay under uniform traffic and in hot-spot throughput, while under some non-uniform traffic scenarios iSLIP achieves slightly higher throughput than DRRM. Since throughput drops below 100% under non-uniform traffic, we also examine some variations of the DRRM matching scheme for non-uniform traffic.
Keywords: switching, scheduling, Virtual Output Queueing, Dual Round Robin.
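The request/grant structure behind DRRM can be sketched in Python as a single-iteration model. This is a minimal illustration, not the authors' implementation. The function names and the pointer-update rule are assumptions: each pointer moves one position past the partner it was matched with, and an input pointer moves only when its request is granted.

```python
from typing import Dict, List

def rr_pick(eligible: List[bool], pointer: int) -> int:
    """Round-robin choice: first eligible index at or after `pointer`, or -1."""
    n = len(eligible)
    for k in range(n):
        idx = (pointer + k) % n
        if eligible[idx]:
            return idx
    return -1

def drrm_slot(voq: List[List[int]], in_ptr: List[int], out_ptr: List[int]) -> Dict[int, int]:
    """One DRRM iteration for an N x N switch (illustrative sketch).

    voq[i][j] is the number of cells at input i destined to output j.
    in_ptr / out_ptr are the round-robin pointers and are updated in place.
    Returns the matching as {input: output}; cells are not dequeued here.
    """
    n = len(voq)
    # Request phase: each input sends one request, to the first non-empty VOQ
    # at or after its pointer. requests[j][i] means input i requests output j.
    requests = [[False] * n for _ in range(n)]
    for i in range(n):
        j = rr_pick([voq[i][k] > 0 for k in range(n)], in_ptr[i])
        if j >= 0:
            requests[j][i] = True
    # Grant phase: each output grants one request, round-robin from its pointer.
    match: Dict[int, int] = {}
    for j in range(n):
        i = rr_pick(requests[j], out_ptr[j])
        if i >= 0:
            match[i] = j
            out_ptr[j] = (i + 1) % n  # assumed: move one past the granted input
            in_ptr[i] = (j + 1) % n   # assumed: input pointer moves only on a grant
    return match
```

With every VOQ backlogged in a 2 × 2 switch, both inputs first request output 0 and only input 0 is granted, giving {0: 0}. After the pointers advance, the next slot yields the full match {0: 1, 1: 0}, a small instance of round-robin pointer desynchronization.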
Matching Output Queueing with a Multiple Input/Output-Queued Switch
IEEE/ACM Trans. Networking, 2004
Cited by 8 (0 self)
Abstract
In this paper, we show that the multiple input/output-queued (MIOQ) switch proposed in our previous paper [22] can emulate an output-queued switch with only two parallel switches. The MIOQ switch requires no speedup and provides an exact emulation of an output-queued switch with a broad class of service scheduling algorithms, including FIFO, weighted fair queueing (WFQ), and strict priority queueing, regardless of the incoming traffic pattern and switch size. First, we show that an N × N MIOQ switch with a (2, 2)-dimensional crossbar fabric can exactly emulate an N × N output-queued switch. For this purpose, we propose the stable strategic alliance (SSA) algorithm, which produces a stable many-to-many assignment, and then apply it to the scheduling of an MIOQ switch. Next, we prove that a (2, 2)-dimensional crossbar fabric can be implemented by two N × N crossbar switches in parallel for an N × N MIOQ switch. For the two parallel crossbar switches to operate properly, each input-output pair matched by the SSA algorithm must be mapped to one of the two crossbar switches. For this mapping, we propose a simple algorithm that requires at most 2N steps for all matched input-output pairs. In addition, to relieve the implementation burden of N input buffers being accessed simultaneously, we propose a buffering scheme called redundant buffering, which requires two memory devices instead of N physically separate memories.
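Assume that a (2, 2)-dimensional fabric lets each input and each output take part in at most two matched pairs per slot. Under that assumption, the mapping step amounts to splitting the matched pairs into two sets in which every input and output appears at most once. This is a proper 2-edge-coloring of a bipartite graph of maximum degree 2. Such a graph is a union of paths and even cycles, so coloring edges alternately along each one works.

The Python sketch uses this textbook approach. It is not necessarily the authors' 2N-step algorithm. The helper name is hypothetical, and the sketch also assumes each input-output pair occurs at most once in the assignment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]    # (input, output) matched for this slot
Node = Tuple[str, int]    # ("in", i) or ("out", j)

def split_two_crossbars(pairs: List[Pair]) -> Dict[Pair, int]:
    """Assign each matched pair to crossbar 0 or 1 so that no input or output
    is used twice within one crossbar (hypothetical helper, textbook method)."""
    adj: Dict[Node, List[int]] = defaultdict(list)  # node -> indices of its pairs
    for e, (i, j) in enumerate(pairs):
        adj[("in", i)].append(e)
        adj[("out", j)].append(e)
    if any(len(edges) > 2 for edges in adj.values()):
        raise ValueError("an input or output appears in more than two pairs")
    color = [-1] * len(pairs)

    def other_end(e: int, node: Node) -> Node:
        i, j = pairs[e]
        return ("out", j) if node == ("in", i) else ("in", i)

    def walk(node: Node, e: int) -> None:
        # Follow the path or cycle, alternating crossbars 0 and 1 edge by edge.
        c = 0
        while color[e] == -1:
            color[e] = c
            node = other_end(e, node)
            nxt = [f for f in adj[node] if color[f] == -1]
            if not nxt:
                return
            e, c = nxt[0], 1 - c

    # Paths first, starting from an endpoint (a node of degree 1) ...
    for node, edges in list(adj.items()):
        if len(edges) == 1 and color[edges[0]] == -1:
            walk(node, edges[0])
    # ... then the remaining pairs, which lie on even cycles.
    for e, (i, _) in enumerate(pairs):
        if color[e] == -1:
            walk(("in", i), e)
    return {pairs[e]: color[e] for e in range(len(pairs))}
```

Each pair is colored once and each node has at most two pairs, so the work grows linearly with the number of matched pairs.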
The Dual Round Robin Matching Switch with Exhaustive Service
Proc. of the IEEE Workshop on High Performance Switching and Routing, Kobe: IEEE Communications Society, 2002
Cited by 8 (3 self)
Abstract
Virtual Output Queuing (VOQ) is widely used in high-speed switches that handle fixed-length cells, to overcome head-of-line blocking. VOQ switches schedule cells by means of matching algorithms. Maximum matching algorithms have good performance, but their implementation complexity is quite high. Maximal matching algorithms need speedup to guarantee good performance. Iterative algorithms (such as PIM and iSLIP) use multiple iterations to converge on a maximal match. The Dual Round-Robin Matching (DRRM) scheme has performance similar to iSLIP with lower implementation complexity. The objective ...
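A two-input example shows the gap between maximal and maximum matchings. The Python sketch computes, over the same request sets, a greedy maximal matching and a maximum matching found via augmenting paths (Kuhn's algorithm). The helper names are hypothetical.

```python
from typing import List, Set

def greedy_maximal(requests: List[List[int]]) -> int:
    """Size of a greedy maximal matching; requests[i] lists outputs wanted by input i."""
    taken: Set[int] = set()
    size = 0
    for outputs in requests:
        for j in outputs:
            if j not in taken:  # first free output wins and is never revisited
                taken.add(j)
                size += 1
                break
    return size

def maximum_matching(requests: List[List[int]], n_out: int) -> int:
    """Size of a maximum matching via augmenting paths (Kuhn's algorithm)."""
    owner = [-1] * n_out  # owner[j] = input currently matched to output j

    def augment(i: int, seen: Set[int]) -> bool:
        for j in requests[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take a free output, or evict its owner if the owner can move elsewhere.
            if owner[j] == -1 or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    return sum(1 for i in range(len(requests)) if augment(i, set()))
```

In the example, input 0 requests outputs 0 and 1, and input 1 requests only output 0. The greedy pass gives output 0 to input 0 and leaves input 1 unmatched, for a size of 1. The augmenting-path search moves input 0 to output 1 and matches both inputs, for a size of 2.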
A Pipeline-Based Approach for Maximal-Sized Matching Scheduling in Input-Buffered Switches
IEEE Communications Letters, 2001
Cited by 7 (1 self)
Abstract
This letter proposes an innovative pipeline-based maximal-sized matching scheduling approach, called PMM, for input-buffered switches. It dramatically relaxes the timing constraint on arbitration with a maximal matching scheme. In the PMM approach, arbitration is carried out by multiple subschedulers operating in a pipelined manner. Each subscheduler is allowed to take more than one time slot for its matching, and in every time slot one of them provides the matching result. The subscheduler can adopt a pre-existing efficient round-robin-based maximal matching algorithm. We show that PMM provides 100% throughput under uniform traffic, since it preserves the desynchronization effect of the round-robin pointers found in the pre-existing algorithm. In addition, PMM maintains fairness for best-effort traffic due to its round-robin-based arbitration.
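A timeline illustrates the pipelining idea. The sketch assumes K subschedulers, each needing K slots per matching, with start times staggered by one slot. Under those assumptions a fresh matching becomes available in every slot after an initial fill of K slots. K, the function name, and the timing convention are illustrative assumptions, not details taken from the letter.

```python
from typing import List, Optional

def pmm_timeline(num_slots: int, k: int) -> List[Optional[int]]:
    """Index of the subscheduler whose matching is applied in each slot.

    Assumed timing: subscheduler s starts computations in slots s, s + k, s + 2k, ...,
    each computation spans k slots, and its result is applied in the slot right
    after it finishes. None marks slots during the initial pipeline fill.
    """
    timeline: List[Optional[int]] = []
    for t in range(num_slots):
        if t < k:
            timeline.append(None)  # no computation has finished yet
        else:
            # The computation applied in slot t started in slot t - k, which
            # belongs to subscheduler (t - k) mod k, i.e. t mod k.
            timeline.append(t % k)
    return timeline
```

Under this convention each subscheduler has k slots of arbitration time, yet the switch still receives one matching per slot.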
An Efficient Scheduling Algorithm for CIOQ Switches with Space-Division Multiplexing Expansion
2003
Cited by 5 (1 self)
Abstract
Recently, CIOQ switches have attracted interest from both the academic and industrial communities due to their ability to achieve 100% throughput and to perfectly emulate OQ switch performance with a small speedup factor S. To achieve a speedup factor S, a conventional CIOQ switch requires the switch matrix and the memory to operate S times faster than the line rate. In this paper, we propose to use a CIOQ switch with space-division multiplexing expansion and grouped inputs/outputs (SDMG CIOQ switch for short) to achieve speedup while requiring the switch matrix and the memory to operate only at the line rate. The cell scheduling problem for the SDMG CIOQ switch is abstracted as a maximum bipartite k-matching problem. Using a fluid model, we prove that any maximal-size k-matching algorithm on an SDMG CIOQ switch with an expansion factor of 2 can achieve 100% throughput, assuming that input arrivals satisfy the strong law of large numbers (SLLN) and no inputs/outputs are oversubscribed. We further propose an efficient, starvation-free maximal-size k-matching scheduling algorithm, kFRR, for the SDMG CIOQ switch. Simulation results show that kFRR achieves 100% throughput with an expansion factor of 2 under two SLLN traffic models, uniform traffic and polarized traffic, confirming our analysis.
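A k-matching is maximal when no further requested pair can be added without some input or output exceeding k matches. The greedy Python sketch builds one such matching. It is a generic construction, not kFRR. The function name is hypothetical, and the sketch assumes each input-output pair is matched at most once. With k = 2, it also shows that a maximal k-matching can be smaller than a maximum one.

```python
from typing import List, Tuple

def greedy_k_matching(requests: List[List[bool]], k: int) -> List[Tuple[int, int]]:
    """Greedy maximal k-matching (illustrative, not kFRR).

    requests[i][j] is True if input i has cells for output j. Each input and
    each output may appear in at most k matched pairs; each pair at most once
    (an assumption of this sketch).
    """
    n_in = len(requests)
    n_out = len(requests[0]) if requests else 0
    in_deg, out_deg = [0] * n_in, [0] * n_out
    matched: List[Tuple[int, int]] = []
    for i in range(n_in):
        for j in range(n_out):
            if requests[i][j] and in_deg[i] < k and out_deg[j] < k:
                matched.append((i, j))
                in_deg[i] += 1
                out_deg[j] += 1
    # Maximal: any requested pair left out has a saturated input or output,
    # because degrees only grow during the scan.
    return matched
```

For a fully requested 3 × 3 switch with k = 2, the greedy scan matches 5 pairs. A maximum 2-matching has 6 pairs: each input matched twice.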
On guaranteed smooth switching for buffered crossbar switches
IEEE/ACM Trans. Networking, 2008
Cited by 4 (0 self)
Abstract
Scalability considerations drive the evolution of switch design from output queuing to input queuing and further to combined input and crosspoint queuing (CICQ). However, CICQ switches with credit-based flow control face new challenges of scalability and predictability. In this paper, we propose a novel approach of rate-based smoothed switching and design a CICQ switch called the smoothed buffered crossbar, or sBUX. First, the concept of smoothness is developed from two complementary perspectives, covering and spacing. These correspond to the commonly known notions of fairness and jitter and are unified in the same model. Second, a smoothed multiplexer, sMUX, is designed that allocates bandwidth among competing flows sharing a link and guarantees almost ideal smoothness for each flow. Third, the buffered crossbar sBUX is designed; it uses the scheduler sMUX at each input and output and a two-cell buffer at each crosspoint. It is proved that sBUX guarantees 100% throughput for real-time services and almost ideal smoothness for each flow. Fourth, an on-line bandwidth regulator is designed that periodically estimates bandwidth demand and generates admissible allocations, which enables sBUX to support best-effort services. Simulation shows almost 100% throughput and multi-microsecond average delay. In particular, neither credit-based flow control nor speedup is used, and arbitrary fabric-internal latency is allowed between line cards and the switch core, simplifying the switch implementation.
Index Terms: buffered crossbar, scheduling, smoothness, switches.
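The abstract does not specify sMUX itself. As a stand-in for the idea of spacing a flow's cells in proportion to its rate, the Python sketch uses smooth weighted round-robin, the interleaving scheduler used in nginx's upstream load balancing. It is a different algorithm from sMUX and only illustrates rate-proportional, interleaved service. Flow names and integer weights are hypothetical.

```python
from typing import Dict, List

def smooth_wrr(weights: Dict[str, int], slots: int) -> List[str]:
    """Smooth weighted round-robin (a stand-in, not sMUX).

    Every slot, each flow's running credit grows by its weight; the flow with
    the largest credit is served and pays back the total weight.
    """
    credit = {flow: 0 for flow in weights}
    total = sum(weights.values())
    sequence: List[str] = []
    for _ in range(slots):
        for flow, w in weights.items():
            credit[flow] += w
        chosen = max(credit, key=credit.get)  # ties go to the earlier flow
        credit[chosen] -= total
        sequence.append(chosen)
    return sequence
```

With weights 2:1:1 the first four slots serve a, b, c, a. Each four-slot period therefore serves flow a twice and flows b and c once each, with their cells interleaved.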
Petastar: A Petabit Photonic Packet Switch
IEEE J. Select. Areas Commun., 2003
Cited by 3 (1 self)
Abstract
This paper presents a new petabit photonic packet switch architecture, called PetaStar. Using a new multidimensional photonic multiplexing scheme that spans the space, time, wavelength, and subcarrier domains, PetaStar is built on a three-stage Clos-network photonic switch fabric that provides scalable, large-dimension switch interconnections with nanosecond reconfiguration speed. Packet buffering is implemented electronically at the input and output port controllers, allowing the central photonic switch fabric to transport high-speed optical signals without electrical-to-optical conversion. Optical time-division multiplexing technology further scales the port speed beyond electronic speeds, up to 160 Gb/s, to minimize the number of fiber connections. To solve output port contention and internal blocking in the three-stage Clos-network switch, we present a new matching scheme, called c-MAC, a concurrent matching algorithm for Clos-network switches. It is highly distributed: input–output matching and routing-path finding are performed concurrently by scheduling modules. We also propose one feasible architecture for the c-MAC scheme, in which a crosspoint switch provides the interconnections between the arbitration modules. With the c-MAC scheme and an internal speedup of 1.5, PetaStar can achieve a switch size of 6400 × 6400 and a total capacity of 1.024 petabit/s at a throughput close to 100% under various traffic conditions.
Index Terms: Clos network, optical time-division multiplexing (OTDM), packet scheduling, photonic switch.
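As a consistency check on the quoted figures, assume each of the 6400 ports runs at the OTDM-scaled rate of 160 Gb/s. The aggregate capacity then matches the stated total:

```latex
6400 \times 160~\text{Gb/s} = 1.024 \times 10^{6}~\text{Gb/s} = 1.024~\text{Pb/s}
```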
Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches
Proc. IEEE ICC, 2001
Cited by 3 (2 self)
Abstract
A Clos-network switch architecture is attractive because of its scalability. Previously proposed implementable schemes for dispatching from the first stage to the second stage, such as random dispatching, cannot achieve high throughput unless the internal bandwidth is expanded. This paper proposes a concurrent round-robin dispatching (CRRD) scheme for Clos-network switches to overcome the throughput limitation of the random dispatching scheme. CRRD provides high switch throughput without expanding the internal bandwidth. Its implementation is very simple because it uses only simple round-robin arbiters. In CRRD, the round-robin arbiters in each first-stage module concurrently match requesting cells to output links in order to dispatch the cells to available second-stage modules. We show that CRRD achieves 100% throughput under uniform traffic. When the offered load reaches 1.0, the pointers of the round-robin arbiters at the first-stage and second-stage modules are effectively desynchronized, and contention is avoided.
Keywords: packet switch, Clos-network switch, dispatching, arbitration, throughput.
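CRRD is built entirely from round-robin arbiters, and the Python sketch shows a generic arbiter of this kind. The class name is hypothetical. The pointer advances only when the caller confirms the grant, for example once it is accepted in a later matching step. That rule is an assumption borrowed from iSLIP-style arbiters, not a detail given in this abstract.

```python
from typing import List, Optional

class RoundRobinArbiter:
    """Generic round-robin arbiter (illustrative sketch)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.pointer = 0  # highest-priority requester index

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the first requester at or after the pointer, or None."""
        for k in range(self.n):
            idx = (self.pointer + k) % self.n
            if requests[idx]:
                return idx
        return None

    def update(self, granted: int) -> None:
        """Assumed rule: call only when the grant is confirmed; the granted
        requester then drops to lowest priority."""
        self.pointer = (granted + 1) % self.n
```

Suppose the pointer starts at 0 and requesters 0, 2 and 3 are active. Successive confirmed grants then go to 0, 2 and 3 before wrapping back to 0, so every persistent requester is served within one round.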
Quick Birkhoff-von Neumann Decomposition Algorithm for Agile All-Photonic Network Cores
Accepted at IEEE International Conference on Communications (ICC 2006), 2006
Cited by 2 (2 self)
Abstract
This paper presents a simple and efficient algorithm, called the Quick Birkhoff-von Neumann Decomposition Algorithm (QBvN), for timeslot allocation in agile all-photonic network (AAPN) cores operating in time division multiplexing (TDM) mode. The time complexity of QBvN can reach O(Nη) for an N × N switch with a TDM frame size of η. Another version of QBvN, called QBvN-cover, is also proposed to provide guaranteed scheduling with configuration overhead. For QBvN-cover, a bound on the number of generated switch configurations is provided, and hence on the speedup needed in AAPN cores. Under stream-type, continuous bit rate traffic, QBvN-cover shows superior delay performance compared with other heuristics in the literature. Unlike other BvN algorithms, QBvN-cover does not take a service matrix as input. Even so, we show that constructing a service matrix from the traffic demand is necessary for QBvN-cover to perform well.
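The abstract does not give QBvN's own steps. For reference, the sketch implements the classical Birkhoff-von Neumann decomposition that QBvN builds on. Take a nonnegative integer matrix whose rows and columns all sum to the frame size η. Repeatedly find a permutation on its positive entries (one always exists by Hall's theorem), weight it by the smallest entry it covers, and subtract. The Python below is this textbook procedure, not QBvN, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def perfect_matching(support: List[List[bool]]) -> List[int]:
    """Row -> column permutation using only True entries (augmenting paths)."""
    n = len(support)
    row_of = [-1] * n  # row_of[c] = row currently assigned to column c

    def try_row(r: int, seen: List[bool]) -> bool:
        for c in range(n):
            if support[r][c] and not seen[c]:
                seen[c] = True
                if row_of[c] == -1 or try_row(row_of[c], seen):
                    row_of[c] = r
                    return True
        return False

    for r in range(n):
        if not try_row(r, [False] * n):
            raise ValueError("rows and columns must all have the same sum")
    perm = [0] * n
    for c, r in enumerate(row_of):
        perm[r] = c
    return perm

def bvn_decompose(m: Matrix) -> List[Tuple[int, List[int]]]:
    """Split m into (weight, permutation) terms; each permutation is a switch
    configuration held for `weight` timeslots of the frame."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    terms: List[Tuple[int, List[int]]] = []
    while any(x > 0 for row in m for x in row):
        perm = perfect_matching([[x > 0 for x in row] for row in m])
        w = min(m[r][perm[r]] for r in range(n))
        for r in range(n):
            m[r][perm[r]] -= w  # zeroes at least one entry per step
        terms.append((w, perm))
    return terms
```

For the 2 × 2 matrix [[2, 1], [1, 2]] with η = 3, the decomposition yields the cross configuration [1, 0] for one slot and the identity [0, 1] for two slots, filling the three-slot frame.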
Constructing Service Matrices for Agile All-Optical Cores
Cited by 2 (1 self)
Abstract
A semi-analytical method based on alternate projections on a linear vector space is used to construct a service matrix from a traffic matrix. The traffic matrix represents the bandwidth requested by the edge nodes. The service matrix represents how the bandwidth will be distributed by the core of an optical star network operating in Time Division Multiplexing mode. Each iteration of the algorithm evaluates an expression of complexity O(N²), where N denotes the number of edge nodes. The complexity of the method is therefore O(kN²), where k denotes the number of iterations needed to converge. For sufficiently large N, one observes that k ≪ N, so the complexity tends to O(N²). Results show that the service matrices obtained with this projection method are highly similar to the original traffic matrix, with an average similarity greater than 95% for N ≥ 32. The method is robust to inadmissible/bursty traffic and yields equal or better delay performance in the optical network compared with other allocation methods.
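The abstract does not give the paper's exact projection sets. The sketch illustrates the general idea under two assumptions: the service matrix must have every row and column sum equal to the frame size η, and its entries must be nonnegative. It cycles through projections onto those three convex sets until the matrix stops changing. Each pass costs O(N²), which matches the per-iteration cost quoted above. The function name, parameters, and stopping rule are hypothetical, and the similarity measure used in the paper is not reproduced.

```python
from typing import List

def service_matrix(traffic: List[List[float]], eta: float,
                   tol: float = 1e-9, max_iter: int = 1000) -> List[List[float]]:
    """Cyclic projections onto {row sums = eta}, {column sums = eta} and {x >= 0}
    (illustrative sketch; N x N traffic matrix assumed)."""
    n = len(traffic)
    x = [[float(v) for v in row] for row in traffic]
    for _ in range(max_iter):
        prev = [row[:] for row in x]
        # Project onto row sums = eta: spread each row's excess evenly.
        for i in range(n):
            d = (sum(x[i]) - eta) / n
            x[i] = [v - d for v in x[i]]
        # Project onto column sums = eta in the same way.
        for j in range(n):
            d = (sum(x[i][j] for i in range(n)) - eta) / n
            for i in range(n):
                x[i][j] -= d
        # Project onto the nonnegative orthant by clipping.
        x = [[max(v, 0.0) for v in row] for row in x]
        if max(abs(x[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    return x
```

For the traffic matrix [[1, 1], [0, 1]] with η = 2, one pass yields [[1.25, 0.75], [0.75, 1.25]], whose rows and columns each sum to 2. The next pass then changes nothing.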

