Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers
Journal of Computer and System Sciences, 1996
Cited by 68 (10 self)
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts $n$ records on an $n$-processor hypercube, shuffle-exchange, or cube-connected cycles in $O(\log n (\log\log n)^2)$ time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was Batcher's bitonic sort, which runs in $O(\log^2 n)$ time.
Supported by an NSERC postdoctoral fellowship, and DARPA contracts N00014-87-K-825 and N00014-89-J-1988.
1 Introduction
Given $n$ records distributed uniformly over the $n$ processors of some fixed interconnection network, the sorting problem is to route the record with the $i$th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree,...
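The abstract measures Sharesort against Batcher's bitonic sort, which needs $O(\log^2 n)$ time on the hypercube. A minimal Python sketch of that baseline (not of Sharesort itself, which is considerably more involved) simulates $n = 2^d$ processors holding one key each; in every round each processor compare-exchanges only with its neighbour across a single cube dimension, so the sort takes $d(d+1)/2$ rounds. The function name and the round counter are illustrative assumptions.

```python
def bitonic_sort_hypercube(keys):
    """Sort 2**d keys, one per simulated hypercube node, with Batcher's bitonic sort.

    Returns (sorted_keys, rounds); each round is one compare-exchange step in
    which every node talks only to its neighbour across one cube dimension.
    """
    n = len(keys)
    d = n.bit_length() - 1
    if n == 0 or n != 1 << d:
        raise ValueError("number of keys must be a power of two")
    a = list(keys)
    rounds = 0
    for stage in range(1, d + 1):             # merge bitonic runs of length 2**stage
        for dim in range(stage - 1, -1, -1):  # dimensions stage-1, ..., 0
            bit = 1 << dim
            for i in range(n):
                j = i ^ bit                   # hypercube neighbour across `dim`
                if i < j:
                    ascending = (i & (1 << stage)) == 0
                    if (a[i] > a[j]) == ascending:
                        a[i], a[j] = a[j], a[i]
            rounds += 1
    return a, rounds


sorted_keys, rounds = bitonic_sort_hypercube([7, 3, 6, 0, 5, 2, 4, 1])
print(sorted_keys, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7] 6
```

With $d = \log_2 n$ the round count grows as $\Theta(\log^2 n)$, which is the bound Sharesort improves on.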
Work-preserving emulations of fixed-connection networks
21st ACM Symp. on Theory of Computing, 1989
Cited by 45 (16 self)
In this paper, we study the problem of emulating $T_G$ steps of an $N_G$-node guest network, $G$, on an $N_H$-node host network, $H$. We call an emulation work-preserving if the time required by the host, $T_H$, is $O(T_G N_G / N_H)$, because then both the guest and host networks perform the same total work (i.e., processor-time product), $\Theta(T_G N_G)$, to within a constant factor. We say that an emulation occurs in real time if $T_H = O(T_G)$, because then the host emulates the guest with constant slowdown. In addition to describing several work-preserving and real-time emulations, we also provide a general model in which lower bounds can be proved. Some of the more interesting and diverse consequences of this work include: (1) a proof that a linear array can emulate a (much larger) butterfly in a work-preserving fashion, but that a butterfly cannot emulate an expander (of any size) in a work-preserving fashion, (2) a proof that a butterfly can emulate a shuffle-exchange network in a real-time work-preserving fashion, and vice versa, (3) a proof that a butterfly can emulate a mesh (or an array of higher, but fixed, dimension) in a real-time work-preserving fashion, even though any $O(1)$-to-1 embedding of an $N$-node mesh in an $N$-node butterfly has dilation $\Omega(\log N)$, and
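Both definitions reduce to two ratios. The hypothetical helper below computes the slowdown $T_H / T_G$ and the work ratio $T_H N_H / (T_G N_G)$ for a single emulation instance. Work-preserving and real-time are asymptotic properties of a family of emulations, so one instance can only illustrate them, not establish them; the numbers in the example are made up.

```python
from fractions import Fraction


def emulation_profile(t_guest, n_guest, t_host, n_host):
    """Return (slowdown, work_ratio) for emulating t_guest steps of an
    n_guest-node guest on an n_host-node host in t_host steps.

    A family of emulations is work-preserving when work_ratio stays O(1),
    and real-time when slowdown stays O(1).
    """
    slowdown = Fraction(t_host, t_guest)
    work_ratio = Fraction(t_host * n_host, t_guest * n_guest)
    return slowdown, work_ratio


# Hypothetical instance: a 1024-node guest run for 100 steps on a 32-node host.
slowdown, work = emulation_profile(t_guest=100, n_guest=1024, t_host=3200, n_host=32)
print(slowdown, work)  # 32 1
```

Here the host does the same total work as the guest (ratio 1), but each guest step costs 32 host steps: work-preserving in spirit, yet not real-time.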
Products of Networks With Logarithmic Diameter and Fixed Degree
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 20 (8 self)
This paper first presents some general properties of product networks pertinent to parallel architectures and then focuses on three case studies: products of complete binary trees, shuffle-exchange networks, and de Bruijn networks. It is shown that all of these are powerful architectures for parallel computation, as evidenced by their ability to efficiently emulate numerous other architectures. In particular, $r$-dimensional grids and $r$-dimensional meshes of trees can be embedded efficiently in products of these graphs, i.e., either as a subgraph or with small constant dilation and congestion. In addition, the shuffle-exchange network can be embedded in an $r$-dimensional product of shuffle-exchange networks with dilation cost $2r$ and congestion cost 2. Similarly, the de Bruijn network can be embedded in an $r$-dimensional product of de Bruijn networks with dilation cost $r$ and congestion cost 4. Moreover, it is well known that shuffle-exchange and de Bruijn graphs can emulate the hypercu...
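A product network here is a Cartesian product: $(a, b)$ is adjacent to $(a', b)$ when $a \sim a'$ in the first factor, and to $(a, b')$ when $b \sim b'$ in the second. A small Python sketch, assuming the undirected binary de Bruijn graph with self-loops dropped, builds a two-dimensional product and checks by BFS that diameters add while degrees stay bounded. It does not implement the paper's embeddings, and the helper names are hypothetical.

```python
from collections import deque
from itertools import product


def de_bruijn(k):
    """Undirected binary de Bruijn graph on 2**k nodes; self-loops are dropped."""
    mask = (1 << k) - 1
    adj = {v: set() for v in range(1 << k)}
    for v in adj:
        for b in (0, 1):
            u = ((v << 1) & mask) | b  # shift left, append bit b
            if u != v:
                adj[v].add(u)
                adj[u].add(v)          # undirected: also the shift-right edge
    return adj


def cartesian_product(g, h):
    """(a, b) ~ (a', b) if a ~ a' in g, and (a, b) ~ (a, b') if b ~ b' in h."""
    return {(a, b): {(a2, b) for a2 in g[a]} | {(a, b2) for b2 in h[b]}
            for a, b in product(g, h)}


def diameter(adj):
    """Largest BFS distance over all sources (graph assumed connected)."""
    best = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


db = de_bruijn(3)
db2 = cartesian_product(db, db)
print(len(db2), diameter(db), diameter(db2), max(len(n) for n in db2.values()))  # 64 3 6 8
```

An $r$-dimensional product of $2^k$-node de Bruijn graphs therefore has $2^{rk}$ nodes, diameter $rk$ (logarithmic in the node count), and degree at most $4r$, which is fixed for fixed $r$.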
Area-efficient architectures for the Viterbi algorithm, Part I: Theory
IEEE Trans. Communications, 1993
Cited by 16 (1 self)
In the previous paper, we established the theoretical foundations of a new class of area-efficient architectures for the Viterbi algorithm. In this paper, we will show area-efficient architectures for practical codes to illustrate the design procedures and demonstrate the favorable area-time trade-off results. Three examples, drawn from convolutional codes, matched-spectral-null (MSN) trellis codes, and Ungerboeck codes, will be presented. We will also discuss the application of our area-efficient techniques to codes with a very large number of states, codes with time-varying trellises, and a programmable Viterbi decoder.
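The architectures above are hardware organizations of the add-compare-select (ACS) recursion at the heart of the Viterbi algorithm. For orientation only, here is a plain software decoder for an assumed rate-1/2, 4-state convolutional code with octal generators (7, 5), a common textbook example rather than one of the paper's codes. It shows the ACS step on every trellis branch but does not model the state-to-processor mappings the paper studies; all names are hypothetical.

```python
G0, G1 = 0b111, 0b101  # assumed example code: generators (7, 5) in octal


def parity(x):
    """XOR of the bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits):
    """Rate-1/2, constraint-length-3 encoder; state holds the two previous bits."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state          # shift register: b_t, b_{t-1}, b_{t-2}
        out += [parity(reg & G0), parity(reg & G1)]
        state = reg >> 1                # new state: b_t, b_{t-1}
    return out


def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding over the 4-state trellis (Hamming metric)."""
    INF = float("inf")
    metric = [0, INF, INF, INF]         # encoder starts in state 0
    paths = [[], [], [], []]
    for t in range(n_bits):
        r = received[2 * t:2 * t + 2]
        new_metric, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << 2) | s
                branch = (parity(reg & G0) != r[0]) + (parity(reg & G1) != r[1])
                nxt, cand = reg >> 1, metric[s] + branch   # add
                if cand < new_metric[nxt]:                 # compare, select
                    new_metric[nxt], new_paths[nxt] = cand, paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(4), key=lambda s: metric[s])          # best surviving path
    return paths[best]


msg = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # two trailing zeros flush the encoder
noisy = conv_encode(msg)
noisy[7] ^= 1                          # inject one channel error
print(viterbi_decode(noisy, len(msg)) == msg)  # True
```

The inner double loop performs one ACS per branch, so a code with $S$ states needs $2S$ ACS operations per decoded bit; how those operations are shared among fewer hardware units is exactly the area-time trade-off discussed in the abstract.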
Fibonacci cubes—a class of self-similar graphs
Fibonacci Quart., 1993
Cited by 9 (0 self)
The Fibonacci cube [6] is a new class of graphs inspired by the famous Fibonacci numbers. Because of the rich properties of the Fibonacci numbers [1], the graph also exhibits interesting properties. For a graph with $N$ nodes, it is known [6] that the diameter, the edge connectivity, and the node connectivity of the Fibonacci cube are of order $O(\log N)$, which are similar to
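The size and diameter claims are easy to check numerically. The sketch below assumes the common definition in which the nodes are $n$-bit strings with no two adjacent 1s (Zeckendorf-style labels) and edges join strings at Hamming distance 1; indexing by string length $n$ is an assumption and may be offset from the order used in [6]. Node counts come out as consecutive Fibonacci numbers while the diameter is exactly $n$, i.e. $O(\log N)$.

```python
from collections import deque


def fibonacci_cube(n):
    """n-bit strings with no two adjacent 1s; edges join strings differing in one bit."""
    nodes = [v for v in range(1 << n) if (v & (v >> 1)) == 0]
    node_set = set(nodes)
    return {v: [v ^ (1 << i) for i in range(n) if (v ^ (1 << i)) in node_set]
            for v in nodes}


def eccentricity(adj, s):
    """Largest BFS distance from s."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)


for n in range(1, 8):
    g = fibonacci_cube(n)
    print(n, len(g), diameter(g))  # node counts 2, 3, 5, 8, 13, 21, 34; diameter n
```

Because the node count grows like $\varphi^n$ with $\varphi \approx 1.618$ while the diameter is $n$, the diameter is logarithmic in $N$, matching the order stated in the abstract.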
A class of scalable optical interconnection networks through discrete broadcast-select multidomain WDM
Proc. IEEE INFOCOM, pp. 392–399, 1994
Improved Compressions of Cube-Connected Cycles Networks
Cited by 5 (0 self)
We present a new technique for embedding large cube-connected cycles networks (CCC) into smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. Using the new embedding strategy, we show that the CCC of dimension $l$ can be embedded into the CCC of dimension $k$ with dilation 1 and optimum load for any $k, l \in \mathbb{N}$, $k \ge 8$, such that $\frac{5}{3} + c_k < \frac{l}{k} \le 2$, where $c_k = \frac{4k+3}{3 \cdot 2^{2k/3}}$, thus improving known results. Our embedding technique also leads to improved dilation-1 embeddings in the case $\frac{3}{2} < \frac{l}{k} \le \frac{5}{3} + c_k$.
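To make the size condition concrete, a sketch with hypothetical helper names builds the CCC of dimension $d$ (node $(i, x)$ joined to $(i \pm 1 \bmod d, x)$ and $(i, x \oplus 2^i)$), computes the optimum load $\lceil l\,2^l / (k\,2^k) \rceil$, and tests the range $\frac{5}{3} + c_k < \frac{l}{k} \le 2$ stated above. It does not construct the paper's embedding.

```python
def ccc(d):
    """Cube-connected cycles of dimension d (3-regular for d >= 3):
    nodes (i, x) with 0 <= i < d and 0 <= x < 2**d."""
    adj = {}
    for x in range(1 << d):
        for i in range(d):
            adj[(i, x)] = {((i + 1) % d, x),   # cycle edges
                           ((i - 1) % d, x),
                           (i, x ^ (1 << i))}  # hypercube edge in dimension i
    return adj


def optimum_load(l, k):
    """ceil(|CCC_l| / |CCC_k|) with |CCC_d| = d * 2**d."""
    return -(-(l << l) // (k << k))


def c_k(k):
    return (4 * k + 3) / (3 * 2 ** (2 * k / 3))


def in_dilation1_optimum_range(l, k):
    """Size condition from the abstract: k >= 8 and 5/3 + c_k < l/k <= 2."""
    return k >= 8 and 5 / 3 + c_k(k) < l / k <= 2


print(len(ccc(3)), optimum_load(16, 8), round(5 / 3 + c_k(8), 3))  # 24 512 1.956
print(in_dilation1_optimum_range(16, 8), in_dilation1_optimum_range(15, 8))  # True False
```

For $k = 8$ the lower threshold is roughly $1.956$, so $l = 16$ falls in the dilation-1, optimum-load range, while $l = 15$ ($l/k = 1.875$) lands in the second range $\frac{3}{2} < \frac{l}{k} \le \frac{5}{3} + c_k$.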
Embedding complete binary trees into butterfly networks
IEEE Trans. Comput., 1991
Compressing Cube-Connected Cycles and Butterfly Networks
Proc. 2nd IEEE Symposium on Parallel and Distributed Processing, 1990
Cited by 4 (4 self)
We consider the simulation of large cube-connected cycles (CCC) and large butterfly networks (BFN) on smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCCs and BFNs can be embedded into smaller networks of the same type with (a) dilation 2 and optimum load, (b) dilation 1 and optimum load in most cases, and (c) dilation 1 and nearly optimum load in all cases. Our results show that large CCCs and BFNs can be simulated very efficiently on smaller ones. Additionally, we implemented our algorithm for compressing CCCs and ran several experiments on a Transputer network, which showed that our technique also behaves very well from a practical point of view.
A preliminary version of these results appears in: Proc. 2nd IEEE Symposium on Parallel and Distributed Processing (1990), pp. 858–865.
This work was supported by grant Mo 285/4-1 from the German Re...
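Dilation and load, the two quantities these compression results trade off, can be measured for any mapping of guest nodes to host nodes. A small sketch with hypothetical function names illustrates them on a toy example (folding an 8-node cycle onto a 4-node cycle), not on the paper's CCC or butterfly construction.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def load_and_dilation(guest, host, phi):
    """Load: most guest nodes mapped to one host node.
    Dilation: longest host distance between images of adjacent guest nodes."""
    load = max(Counter(phi[v] for v in guest).values())
    dist_from, dilation = {}, 0
    for u in guest:
        a = phi[u]
        if a not in dist_from:
            dist_from[a] = bfs_distances(host, a)
        for v in guest[u]:
            dilation = max(dilation, dist_from[a][phi[v]])
    return load, dilation


def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


print(load_and_dilation(cycle(8), cycle(4), {i: i % 4 for i in range(8)}))        # (2, 1)
print(load_and_dilation(cycle(8), cycle(4), {i: (2 * i) % 4 for i in range(8)}))  # (4, 2)
```

The first mapping achieves dilation 1 and the optimum load $\lceil 8/4 \rceil = 2$; the second shows how a careless mapping worsens both measures at once.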
Load Balanced Tree Embeddings
1991
Cited by 3 (1 self)
When an $n$-processor architecture $T$ is embedded into an $m$-processor architecture $H$ with $n > m$ and every processor of $H$ simulates at least $\lfloor n/m \rfloor$ and at most $\lceil n/m \rceil$ processors of $T$, the embedding has a balanced processor load. We present efficient embeddings with a balanced load for the case when both architectures are complete binary trees. We show that $T$ can be embedded into $H$ with a dilation of 1 and a congestion of at most $\min\{\lceil n/m \rceil, 2\log n\}$. We also consider embeddings that achieve a balanced l/i load, i.e., every processor of $H$ simulates at most $\lceil \frac{n+1}{2m} \rceil$ leaves and at most $\lceil \frac{n-1}{2m} \rceil$ interior processors of $T$. We present an embedding that achieves a balanced l/i load, a dilation of $2\lceil \log\log m \rceil + 1$, and a congestion of $O(\log n)$. We show that every embedding strategy achieving a balanced l/i load must have a dilation of at least 3. We also consider the embedding problem when every edge of $T$ has a weight associated with it.
Keywords: Graph embeddings, binary tree ne...
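The two load notions can be checked mechanically. The sketch below assumes heap indexing (nodes $1, \dots, n$ with $n = 2^h - 1$, leaves $v > \lfloor n/2 \rfloor$) and uses hypothetical helper names; it ignores dilation and congestion, which are the substance of the paper. Dealing interior nodes and leaves to hosts round-robin yields both kinds of balance, whereas assigning consecutive blocks of nodes yields a balanced load but not a balanced l/i load.

```python
def ceil_div(a, b):
    return -(-a // b)


def tree_load_profile(n, assignment, m):
    """Check the load conditions for a complete binary tree T with n = 2**h - 1
    heap-indexed nodes 1..n placed on hosts 0..m-1 of H.
    Returns (balanced_load, balanced_li_load)."""
    total, leaves, interior = [0] * m, [0] * m, [0] * m
    for v in range(1, n + 1):
        host = assignment[v]
        total[host] += 1
        if v > n // 2:                 # heap indexing: nodes n//2+1 .. n are leaves
            leaves[host] += 1
        else:
            interior[host] += 1
    balanced = all(n // m <= c <= ceil_div(n, m) for c in total)
    balanced_li = (all(c <= ceil_div(n + 1, 2 * m) for c in leaves) and
                   all(c <= ceil_div(n - 1, 2 * m) for c in interior))
    return balanced, balanced_li


def round_robin(n, m):
    """Hypothetical placement: deal interior nodes and leaves to hosts separately."""
    half = n // 2
    placement = {v: (v - 1) % m for v in range(1, half + 1)}
    placement.update({v: (v - half - 1) % m for v in range(half + 1, n + 1)})
    return placement


n, m = 15, 4
print(tree_load_profile(n, round_robin(n, m), m))                           # (True, True)
print(tree_load_profile(n, {v: (v - 1) // 4 for v in range(1, n + 1)}, m))  # (True, False)
```

In the block placement, host 0 receives nodes 1 to 4, all interior, exceeding the bound $\lceil 14/8 \rceil = 2$ even though every host carries 3 or 4 nodes in total.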