## Optimal Deterministic Sorting and Routing on Grids and Tori with Diagonals (1996)

### BibTeX

@MISC{Kunde96optimaldeterministic,
    author = {Manfred Kunde and Rolf Niedermeier and Klaus Reinhardt and Peter Rossmanith},
    title = {Optimal Deterministic Sorting and Routing on Grids and Tori with Diagonals},
    year = {1996}
}

### Abstract

We present deterministic sorting and routing algorithms for grids and tori with additional diagonal connections. For large loads (h ≥ 12), where each processor holds at most h data packets at the beginning and at the end, the sorting problem can be solved in an optimal hn/6 + o(n) steps on grids with diagonals and hn/12 + o(n) steps on tori with diagonals. For smaller loads, we present a new concentration technique that yields very fast algorithms for h < 12. For a load of 1, the theoretically most interesting case, sorting takes only 1.2n + o(n) steps and routing only 1.1n + o(n) steps. For tori, we present optimal algorithms for all loads h ≥ 1. All of the above algorithms use constant-size memory per processor and never copy or split packets. If packets may be copied, 1–1 sorting can be done in only 2n/3 + o(n) steps on a torus with diagonals. Gaining in general a speedup of 3 by only doubling the number of communication links compared to a grid without diagonals, our work suggests to build ...
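The large-load bounds in the abstract can be checked with a little arithmetic. The sketch below is only an illustration, not code from the paper: it evaluates the leading terms hn/6 (grid with diagonals) and hn/12 (torus with diagonals) and compares them with the hn/2 bisection bound that the citation contexts on this page quote for grids without diagonals. The function names and the sample values of h and n are hypothetical, and the o(n) terms are ignored.

```python
from fractions import Fraction


def hh_sorting_with_diagonals(h: int, n: int, torus: bool = False) -> Fraction:
    """Leading term of the h-h sorting bound quoted in the abstract.

    The abstract states hn/6 (grid) and hn/12 (torus) only for loads h >= 12,
    so smaller loads are rejected rather than extrapolated.
    """
    if h < 12:
        raise ValueError("hn/6 and hn/12 are stated only for loads h >= 12")
    return Fraction(h * n, 12 if torus else 6)


def hh_sorting_without_diagonals(h: int, n: int) -> Fraction:
    """Leading term hn/2 quoted for a plain n x n grid (stated for h >= 8)."""
    return Fraction(h * n, 2)


if __name__ == "__main__":
    h, n = 12, 600  # hypothetical load and grid side length
    grid = hh_sorting_with_diagonals(h, n)
    torus = hh_sorting_with_diagonals(h, n, torus=True)
    plain = hh_sorting_without_diagonals(h, n)
    print(f"grid with diagonals:  {grid} steps (leading term)")
    print(f"torus with diagonals: {torus} steps (leading term)")
    print(f"speedup over plain grid: {plain / grid}")
```

For h = 12 the leading terms reduce to 2n with diagonals and 6n without, which is the factor-of-3 speedup the abstract mentions.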

### Citations

1306 |
Introduction to parallel algorithms and architectures: arrays, trees, hypercubes (Section 3
- Leighton
- 1992
Citation Context ... mapping is the problem to transport for each block 1/m² of its elements to each other block of the mesh. The model of computation is the conventional one, where only nearest neighbors exchange data [15, 17]. In general, we disallow replication (that is, copying) of packets in our algorithms. A communication link can in one step transport at most one packet in each direction. Processors may store more th... |

249 |
Universal schemes for parallel communication
- Valiant, Brebner
- 1981
Citation Context ...centrating data into a smaller area of a grid turns a 1–1 problem into an h–h problem. Since h–h problems were not studied intensively until quite recently [13] (though already Valiant and Brebner [29] considered them as early as in 1981 and others maybe even earlier), data concentration was introduced a short time ago [9]. The first use of concentration was to solve the 1–1 sorting problem in 2.5... |

168 |
Tight Bounds on the Complexity of Parallel Sorting
- Leighton
- 1984
Citation Context ...ting problems. You can find a more detailed description in the paper that introduced all-to-all mappings [10]. There is a similarity between sorting with all-to-all mappings and Leighton's Columnsort [16]. The general results have been stated in previous work. The sorting method works on arbitrary networks. The problem is how to implement an efficient and fast all-to-all mapping. Our aim here is to re... |

89 |
Sorting on a Mesh-Connected Parallel Computer
- Thompson, Kung
- 1977
Citation Context ...een studied for more than twenty years. Several 1–1 sorting algorithms exist for buffer size 1, i.e., each processor can store only one packet at each time. The fastest algorithms need 3n + o(n) steps [19, 23, 28]. For buffer size 2, the 1–1 sorting problem can be solved deterministically in 2.5n + o(n) transport steps [9]. Kaklamanis and Krizanc [3] presented a randomized algorithm (with constant buffer size... |

87 |
An Optimal Sorting Algorithm for Mesh Connected Computers
- Schnorr, Shamir
- 1986
Citation Context ...een studied for more than twenty years. Several 1–1 sorting algorithms exist for buffer size 1, i.e., each processor can store only one packet at each time. The fastest algorithms need 3n + o(n) steps [19, 23, 28]. For buffer size 2, the 1–1 sorting problem can be solved deterministically in 2.5n + o(n) transport steps [9]. Kaklamanis and Krizanc [3] presented a randomized algorithm (with constant buffer size... |

80 |
Systolic arrays (for VLSI
- Kung, Leiserson
Citation Context ...equip them with additional diagonal connections. In spite of the fact that meshes with diagonals are well-known and have been used for some applications like matrix multiplication and LU decomposition [14, 27], very little is known about how to exploit the additional communication links for faster sorting and routing. Equipping grids with additional wrap-around connections often leads to sorting and routin... |

66 |
Data broadcasting in SIMD computers
- Nassimi, Sahni
- 1981
Citation Context ... Historical remarks: In this section, we give a short account to the main points in the history of sorting and routing algorithms on grids (also see [25]). Thompson and Kung [28] and Nassimi and Sahni [20] were the first that presented O(n) steps algorithms for sorting on meshes. In 1986, Schnorr and Shamir [23] presented an optimal 3n + o(n) steps algorithm under the assumption of buffer size 1 (also se... |

53 |
Methods for Message Routing in Parallel Machines
- Leighton
- 1992
Citation Context ... mapping is the problem to transport for each block 1/m² of its elements to each other block of the mesh. The model of computation is the conventional one, where only nearest neighbors exchange data [15, 17]. In general, we disallow replication (that is, copying) of packets in our algorithms. A communication link can in one step transport at most one packet in each direction. Processors may store more th... |

40 |
Concentrated Regular Data Streams on Grids: Sorting and Routing Near to the Bisection Bound
- Kunde
- 1991
Citation Context ...he same. [Table columns: problem; new results with diagonals; without diagonals.] 1–1 routing: 1.1n; 2n, Leighton et al. [18]. 1–1 sorting: 1.2n; 2n, Kaklamanis and Krizanc [3], Kaufmann et al. [5]. 4–4 sorting: 1.6n; 4n, Kunde [9]. 8–8 sorting: 1.86n; 4n, Kunde [10], Kaufmann et al. [5]. 12–12 sorting: 2n; 6n, Kunde [10], Kaufmann et al. [5]. [End of table.] finally contains h packets. We regard the load h = O(1) as a small constant; we don't consi... |

39 | Horizons of Parallel Computation
- Bilardi, Preparata
- 1995
Citation Context ...the focus of research on parallel computation for many years. Among others, one of the reasons for their popularity lies in their scalability, an important property that many other architectures lack [1, 2, 24, 30]. Routing and sorting are important algorithmic problems studied for mesh architectures because they are the building blocks for many algorithms. For conventional grids of processors with four-neighbo... |

35 |
Matching the bisection bound for routing and sorting on the mesh
- Kaufmann, Rajasekaran, et al.
- 1992
Citation Context ...s to the h–h sorting problem for h ≥ 8, first a hn + o(n) steps algorithm for sorting was given [9]. Later an optimal randomized hn/2 + o(n) steps algorithm matching the bisection bound was discovered [4]. Recently, the first optimal deterministic algorithm was presented [10]. Later, by derandomizing the optimal randomized algorithm, Kaufmann, Sibeyn, and Suel obtained the same algorithm in a differen... |

35 |
Introduction to the configurable, highly parallel computer
- Snyder
- 1982
Citation Context ...equip them with additional diagonal connections. In spite of the fact that meshes with diagonals are well-known and have been used for some applications like matrix multiplication and LU decomposition [14, 27], very little is known about how to exploit the additional communication links for faster sorting and routing. Equipping grids with additional wrap-around connections often leads to sorting and routin... |

32 |
A 2n − 2 Step Algorithm for Routing in an n × n Array with Constant Size Queues
- Leighton, Makedon, et al.
- 1989
Citation Context ...d. All other algorithms are deterministic. Except for h = 1 the results for sorting and routing are the same. [Table columns: problem; new results with diagonals; without diagonals.] 1–1 routing: 1.1n; 2n, Leighton et al. [18]. 1–1 sorting: 1.2n; 2n, Kaklamanis and Krizanc [3], Kaufmann et al. [5]. 4–4 sorting: 1.6n; 4n, Kunde [9]. 8–8 sorting: 1.86n; 4n, Kunde [10], Kaufmann et al. [5]. 12–12 sorting: 2n; 6n, Kunde [10], Kaufmann et ... |

30 | Derandomizing Algorithms for Routing and Sorting on Meshes
- Kaufmann, Sibeyn, Suel
- 1994
Citation Context ...s for many algorithms. For conventional grids of processors with four-neighborhood, there was a strong research focus until optimal results for deterministic sorting and routing were finally obtained [5, 10]. In this paper, we study grids with eight-neighborhood, that is, grids with diagonals, presenting optimal results for sorting and routing. The standard grid architecture with its four-neighborhood ha... |

30 | Constant Queue Routing on a Mesh
- Rajasekaran, Overholt
- 1991
Citation Context ...ing, Leighton, Makedon, and Tollis [18] presented an optimal deterministic algorithm (with constant buffer size) that exactly matches the distance bound of 2n − 2 steps. Rajasekaran and Overholt [22] further reduced the buffer size. We present algorithms for grids with diagonals that need 1.2n + o(n) steps for 1–1 sorting and 1.1n + o(n) steps for 1–1 routing. For grids with diagonals, we summa... |

28 |
Spatial machines: a more realistic approach to parallel computation
- Feldman, Shapiro
- 1992
Citation Context ...the focus of research on parallel computation for many years. Among others, one of the reasons for their popularity lies in their scalability, an important property that many other architectures lack [1, 2, 24, 30]. Routing and sorting are important algorithmic problems studied for mesh architectures because they are the building blocks for many algorithms. For conventional grids of processors with four-neighbo... |

21 |
Optimal Sorting on Mesh-Connected Processor Arrays
- Kaklamanis, Krizanc
- 1992
Citation Context ... for h = 1 the results for sorting and routing are the same. [Table columns: problem; new results with diagonals; without diagonals.] 1–1 routing: 1.1n; 2n, Leighton et al. [18]. 1–1 sorting: 1.2n; 2n, Kaklamanis and Krizanc [3], Kaufmann et al. [5]. 4–4 sorting: 1.6n; 4n, Kunde [9]. 8–8 sorting: 1.86n; 4n, Kunde [10], Kaufmann et al. [5]. 12–12 sorting: 2n; 6n, Kunde [10], Kaufmann et al. [5]. [End of table.] finally contains h packets. We regard th... |

14 |
Routing and sorting on mesh-connected architectures
- Kunde
- 1988
Citation Context ...ere the first that presented O(n) steps algorithms for sorting on meshes. In 1986, Schnorr and Shamir [23] presented an optimal 3n + o(n) steps algorithm under the assumption of buffer size 1 (also see [8] for the corresponding lower bound). Schnorr and Shamir's result has been improved for buffer size greater than 1. Introducing concentration techniques, the running time could be improved to 2.5n + o(... |

13 |
Locality, Communication, and Interconnect Length in Multicomputers
- Vitányi
- 1988
Citation Context ...the focus of research on parallel computation for many years. Among others, one of the reasons for their popularity lies in their scalability, an important property that many other architectures lack [1, 2, 24, 30]. Routing and sorting are important algorithmic problems studied for mesh architectures because they are the building blocks for many algorithms. For conventional grids of processors with four-neighbo... |

12 | Randomized multi-packet routing and sorting on meshes
- Kaufmann, Sibeyn
- 1996
Citation Context ...stic h–h sorting and routing, provided that h ≥ 12. This gives an acceleration factor of 3 and also matches the bisection bound. For wrap-around meshes (or tori) without diagonals, Kaufmann and Sibeyn [6] presented a randomized h–h sorting algorithm with hn/4 + o(n) steps for h ≥ 8. There exist equally fast deterministic algorithms [5, 10]. Both algorithms match asymptotically the respective bisection ... |

12 |
k−k Routing, k−k Sorting, and Cut Through Routing on the Mesh
- Rajasekaran
- 1991
Citation Context ...1 and compare them with the so far known best results on grids without diagonals. In the table we omit all sublinear terms because they are of no importance for the asymptotic complexity. Rajasekaran [21] and also Kaufmann and Sibeyn [6] invented randomized [Table 2: Comparison of selected results for tori with and without diagonals. We omit sublinear terms. All algorithms are deterministic. Except for... |

10 |
The Distance Bound for Sorting on Mesh Connected Processor Arrays is Tight
- Ma, Sen, et al.
- 1986
Citation Context ...een studied for more than twenty years. Several 1–1 sorting algorithms exist for buffer size 1, i.e., each processor can store only one packet at each time. The fastest algorithms need 3n + o(n) steps [19, 23, 28]. For buffer size 2, the 1–1 sorting problem can be solved deterministically in 2.5n + o(n) transport steps [9]. Kaklamanis and Krizanc [3] presented a randomized algorithm (with constant buffer size... |

8 | Faster Sorting and Routing on Grids with Diagonals
- Kunde, Niedermeier, et al.
- 1994
Citation Context ...dditional diagonal links, but the bisection bound for such an architecture is hn/4. By way of contrast, we beat this bound and obtain hn/6 algorithms with diagonal links. Our work and its predecessor [12] have inspired related work [7, 26]. We use a sorting method that is mainly based on all-to-all mappings [10]. This method was the breakthrough to deterministic algorithms that match the bisection bou... |

8 | Overview of mesh results
- Sibeyn
- 1995
Citation Context ...n the grid [12]. The reason for this is the symmetry of the torus (no center, no borders). This symmetry leads to simple algorithms. A torus without diagonals can be embedded into a grid with delay 2 [25]. In this section, we show that there is also an embedding for tori with diagonals into grids. Again the delay is 2. The rough idea is to fold the torus two times, bringing together 4 processors each ... |

7 |
Block Gossiping on Grids and Tori: Sorting and Routing Match the Bisection Bound Deterministically
- Kunde
- 1993
Citation Context ...m for sorting was given [9]. Later an optimal randomized hn/2 + o(n) steps algorithm matching the bisection bound was discovered [4]. Recently, the first optimal deterministic algorithm was presented [10]. Later, by derandomizing the optimal randomized algorithm, Kaufmann, Sibeyn, and Suel obtained the same algorithm in a different way [5]. For meshes with diagonals, first a result better than the bis... |

7 | Optimal Average Case Sorting on Arrays
- Kunde, Niedermeier, et al.
- 1995
Citation Context ... using completely new techniques. For deterministic average case sorting for grids and tori with and without diagonals, one can obtain results that, in general, are twice as fast as in the worst case [11]. 8 Conclusion: Doubling the capacity of each individual communication link in a mesh obviously leads to twice as fast algorithms. By adding diagonal connections, we not only doubled the overall capaci... |

5 |
(k–k) Routing on multidimensional mesh-connected arrays
- Kunde, Tensi
- 1991
Citation Context ... Loads Using Concentration Techniques: Concentrating data into a smaller area of a grid turns a 1–1 problem into an h–h problem. Since h–h problems were not studied intensively until quite recently [13] (though already Valiant and Brebner [29] considered them as early as in 1981 and others maybe even earlier), data concentration was introduced a short time ago [9]. The first use of concentration was... |

5 |
Physical parallel devices are not much faster than sequential ones
- Schorr
- 1983

2 | Routing on triangles, tori and honeycombs
- Sibeyn
Citation Context ...d the buffer size is 9 [7]. Thus we have optimal h–h sorting algorithms for tori for all h. Recently, Sibeyn independently discovered an optimal sorting algorithm for tori with diagonals for large h [26]. We also show that the requirement of no data replication is necessary for the lower bound of Krizanc and Narayanan: We show that copying of packets enables sorting in 2n/3 + o(n) time while still us... |

1 |
Sorting and selection on arrays with diagonal connections
- Krizanc, Narayanan
- 1994
Citation Context ...r bound of Krizanc and Narayanan. They showed that even the 1–1 sorting problem takes at least n − o(n) steps on a torus with diagonals if data packets cannot be copied and the buffer size is 9 [7]. Thus we have optimal h–h sorting algorithms for tori for all h. Recently, Sibeyn independently discovered an optimal sorting algorithm for tori with diagonals for large h [26]. We also show that th... |

1 | A 2n − 2 step algorithm for routing in an n × n array with constant size queues - Leighton, Makedon, et al. - 1989 |