Results 1-10 of 35
Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers
Journal of Computer and System Sciences, 1996
Cited by 67 (10 self)
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts n records on an n-processor hypercube, shuffle-exchange, or cube-connected cycles in O(log n (log log n)^2) time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was Batcher's bitonic sort, which runs in O(log^2 n) time.
Supported by an NSERC postdoctoral fellowship, and DARPA contracts N00014-87-K-825 and N00014-89-J-1988.
1 Introduction
Given n records distributed uniformly over the n processors of some fixed interconnection network, the sorting problem is to route the record with the ith largest associated key to processor i, 0 ≤ i < n. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in O(log^2 n) time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree,...
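For orientation, the O(log^2 n) baseline named in this abstract, Batcher's bitonic sort, can be sketched as below. This is a sequential simulation, not Sharesort and not code from the paper: each pass of the inner `while` loop stands for one parallel compare-exchange step across a single hypercube dimension, and the function name `bitonic_sort` is chosen here for illustration only.

```python
def bitonic_sort(keys):
    """Sort a list whose length is a power of two with Batcher's bitonic network.

    Index i plays the role of processor i; i ^ j is its neighbour across
    the hypercube dimension selected by the single set bit of j.
    """
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                 # length of the bitonic runs being merged
        j = k // 2
        while j >= 1:             # one compare-exchange step per dimension
            for i in range(n):
                partner = i ^ j
                if partner > i:   # handle each pair once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The outer loop runs log n times and the inner loop at most log n times per outer pass, which is where the O(log^2 n) step count comes from.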
Sorting, Selection and Routing on the Array with Reconfigurable Optical Buses
Cited by 31 (5 self)
In this paper we present efficient algorithms for sorting, selection and packet routing on the AROB (Array with Reconfigurable Optical Buses) model.
Parametric Binary Dissection
Institute for Computer, 1993
Cited by 24 (3 self)
Binary dissection is widely used to partition nonuniform domains over parallel computers. This algorithm does not consider the perimeter, surface area, or aspect ratio of the regions being generated and can yield decompositions that have a poor communication-to-computation ratio. Parametric Binary Dissection (PBD) is a new algorithm in which each cut is chosen to minimize load + λ × (shape). In a 2 (or 3) dimensional problem, load is the amount of computation to be performed in a subregion and shape could refer to the perimeter (respectively surface) of that subregion. Shape is a measure of communication overhead and the parameter λ permits us to trade off load imbalance against communication overhead. When λ is zero, the algorithm reduces to plain binary dissection. This algorithm can be used to partition graphs embedded in 2 or 3-d. Here load is the number of nodes in a subregion, shape the number of edges that leave that subregion, and the ratio of time to communicate o...
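A single cut of this kind might be sketched as follows for a 2-d grid of per-cell loads. Several details are assumptions made for illustration and are not taken from the paper: only vertical cuts are tried, shape is taken to be the perimeter of each rectangular half, and the two halves are combined by taking the larger of their costs. The name `best_vertical_cut` and the weight argument `lam` are hypothetical.

```python
def best_vertical_cut(load, lam):
    """Return the column index c (1 <= c < width) at which to cut the grid.

    Each half is scored as load + lam * perimeter, and the cut whose
    worse half is cheapest wins (an assumed way of combining the halves).
    With lam == 0 this is plain load balancing.
    """
    h, w = len(load), len(load[0])
    if w < 2:
        raise ValueError("need at least two columns to cut")
    col = [sum(load[r][c] for r in range(h)) for c in range(w)]
    total = sum(col)
    best_cost, best_c, left = None, None, 0
    for c in range(1, w):
        left += col[c - 1]                       # load left of the cut
        cost_left = left + lam * 2 * (h + c)     # perimeter of an h x c block
        cost_right = (total - left) + lam * 2 * (h + (w - c))
        cost = max(cost_left, cost_right)
        if best_cost is None or cost < best_cost:
            best_cost, best_c = cost, c
    return best_c


grid = [[4, 1, 1, 1, 0, 0, 0, 0],
        [4, 0, 0, 0, 1, 1, 1, 1]]
cut_plain = best_vertical_cut(grid, 0)   # pure load balance
cut_shaped = best_vertical_cut(grid, 2)  # shape term pulls the cut inward
```

On this grid the plain cut isolates the heavy first column, while a positive weight moves the cut toward the middle because the long thin remainder has a large perimeter.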
Optimal Parallel Sorting in Multi-Level Storage
In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, 1994
Cited by 17 (0 self)
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n records on an n-processor hypercubic network. In that context, it is not known whether Sharesort is asymptotically optimal. Nonetheless, we find that Sharesort achieves optimal time bounds for parallel sorting in multi-level storage, under a variety of models that have been defined in the literature.
Optimal Routing of Parentheses on the Hypercube
In Proceedings of the Symposium on Parallel Architectures and Algorithms, 1994
Cited by 14 (6 self)
We consider a new class of routing requests or partial permutations for which we give optimal online routing algorithms on the hypercube and shuffle-exchange network. For well-formed words of parentheses our algorithm establishes communication between all matching pairs in logarithmic time. It can be applied to the membership problem for Dyck languages and a number of problems for algebraic expressions.
Least common ancestor networks
Proceedings of the 7th International Parallel Processing Symposium (IPPS, 1993
Routing on butterfly networks with random faults
In Proc. of the 36th IEEE Symp. on Foundations of Computer Science (FOCS, 1995
Optimal Self-Routing of Linear-Complement Permutations in Hypercubes
In Proceedings of the 5th Distributed Memory Conference, 1990
Cited by 7 (0 self)
In this paper we describe an algorithm to route the class of linear-complement permutations on Hypercube SIMD computers. The class of linear-complement permutations is extremely useful in devising storage schemes for parallel array access. The proposed algorithm is self-routing and minimal, that is, the path established by the algorithm between each pair of source and destination processors is a minimal path determined using only the destination processor address. Furthermore, the algorithm requires only the optimal number of routing steps to realize any linear-complement permutation. The best known previous routing algorithms for the Hypercubes are for the class of bit-permute-complement permutations, a subset of the class of linear-complement permutations. Those algorithms are either non-optimal or not self-routing. The algorithm presented is self-routing, optimal, and it routes a larger class of permutations. Also, this algorithm can route the class of linear-complement permutations in mult...
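As a reminder of what this class contains, a linear-complement permutation sends address x to A·x ⊕ c, where A is a nonsingular bit matrix over GF(2) and c is a complement vector. The sketch below only evaluates that map; it is not the routing algorithm of the paper. The function name `lc_destination` and the encoding of each matrix row as an integer bitmask are choices made here for illustration.

```python
def lc_destination(rows, c, x):
    """Return A*x XOR c over GF(2).

    rows[i] is row i of A as a bitmask; bit i of the result is the
    parity of (rows[i] & x), then the whole word is XORed with c.
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


# A 3-bit example: bit0 = x0 ^ x1, bit1 = x1, bit2 = x2, complement 0b101.
rows = [0b011, 0b010, 0b100]
perm = [lc_destination(rows, 0b101, x) for x in range(8)]

# Bit reversal is the special case where A is a permutation matrix,
# i.e. a bit-permute-complement permutation with c = 0.
bit_reversal = [0b100, 0b010, 0b001]
```

Because A is nonsingular in both examples, every destination in 0..7 is hit exactly once.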
Divide-and-Conquer Algorithms on the Hypercube
Theoretical Computer Science, 1993
Cited by 5 (3 self)
We show how to implement divide-and-conquer algorithms without undue overhead on a wide class of networks. We give an optimal generic divide-and-conquer implementation on hypercubes for the class of divide-and-conquer algorithms for which the total size of the subproblems on any level of the recursion does not exceed the parent problem size. For this implementation, appropriately sized subcubes have to be allocated to the subproblems generated by the divide-steps. We take care that these allocation steps do not cause any unbalanced distribution of work, and that, asymptotically, they do not increase the running time. Variants of our generic algorithm also work for the butterfly network and, by a general simulation, for the class of hypercubic networks, including the shuffle-exchange and the cube-connected-cycles network. Our results can also be applied to optimally solve various types of routing problems. Keywords: Theory of Parallel and Distributed Computation, Algorithms and Data St...
Latin Cubes and Parallel Array Access
1994
Cited by 3 (0 self)
The problem of efficiently storing a d-dimensional array into multiple memory modules of a shared memory machine is an important problem in parallel processing. In this paper, we consider the problem for three-dimensional arrays. More specifically, given an array A of size n × n × n and a shared memory machine with n memory modules, we show how to store A so that no two elements within any row, any column, any diagonal of a face of A and main subarrays of A are stored in the same memory module. Our scheme thus achieves no memory conflicts when the processors of the shared memory machine simultaneously access elements within a row, column, subarray, etc. We also show how to store A efficiently, if diagonals of A are also required to be accessed conflict-free in addition to rows and columns. We finally show storage schemes that allow diagonal faces to be accessed conflict-free along with rows and columns. All of our schemes use Latin cubes. Our schemes allow fast address...
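The simplest Latin cube gives a feel for the idea: store element (i, j, k) in module (i + j + k) mod n. This is a textbook construction used here as an assumption for illustration, not the scheme of the paper; it only guarantees conflict-free access along the three axis directions, and says nothing about the diagonals or subarrays the abstract also covers. The names `module` and `axis_lines_conflict_free` are hypothetical.

```python
def module(i, j, k, n):
    """Memory module for element (i, j, k) under the cyclic Latin cube."""
    return (i + j + k) % n


def axis_lines_conflict_free(n):
    """Check that every axis-parallel line of the n x n x n array
    touches each of the n modules exactly once."""
    for a in range(n):
        for b in range(n):
            lines = (
                [module(t, a, b, n) for t in range(n)],  # vary first index
                [module(a, t, b, n) for t in range(n)],  # vary second index
                [module(a, b, t, n) for t in range(n)],  # vary third index
            )
            if any(len(set(line)) != n for line in lines):
                return False
    return True
```

Fixing two indices and varying the third shifts the module number by one at each step, so the n elements of any such line land in n different modules.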