Results 1-10 of 47
Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers
Journal of Computer and System Sciences, 1996
Cited by 67 (10 self)
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts n records on an n-processor hypercube, shuffle-exchange, or cube-connected cycles in O(log n (log log n)^2) time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was Batcher's bitonic sort, which runs in O(log^2 n) time. Supported by an NSERC postdoctoral fellowship, and DARPA contracts N00014-87-K-825 and N00014-89-J-1988. 1 Introduction. Given n records distributed uniformly over the n processors of some fixed interconnection network, the sorting problem is to route the record with the ith largest associated key to processor i, 0 ≤ i < n. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in O(log^2 n) time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree, ...
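For orientation, the baseline that this abstract compares against can be sketched as follows. This is a minimal sequential simulation of Batcher's bitonic sort, not Sharesort, assuming one key per processor, n a power of two, and ascending output by processor index. The function name `bitonic_sort` and the step counter are illustrative choices, not taken from the paper. Each pass of the inner loop pairs processor i with processor i XOR j, which is a neighbor along one hypercube dimension.

```python
def bitonic_sort(keys):
    """Simulate Batcher's bitonic sort with one key per processor.

    Returns the sorted list and the number of parallel
    compare-exchange steps (one step per hypercube dimension used).
    Assumes len(keys) is a power of two.
    """
    n = len(keys)
    if n == 0 or n & (n - 1):
        raise ValueError("number of keys must be a power of two")
    a = list(keys)
    steps = 0
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance to the partner processor
        while j >= 1:
            # All pairs (i, i ^ j) are disjoint, so on a hypercube this
            # loop would run as one parallel compare-exchange step.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            steps += 1
            j //= 2
        k *= 2
    return a, steps


if __name__ == "__main__":
    result, steps = bitonic_sort([5, 2, 7, 1, 8, 3, 6, 4])
    print(result, steps)
```

For n = 2^d keys the simulation performs d(d+1)/2 compare-exchange steps, which is where the O(log^2 n) bound quoted above comes from.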
The Complexity of Computation on the Parallel Random Access Machine
1993
Cited by 32 (4 self)
PRAMs also approximate the situation where communication to and from shared memory is much more expensive than local operations, for example, where each processor is located on a separate chip and access to shared memory is through a combining network. Not surprisingly, abstract PRAMs can be much more powerful than restricted instruction set PRAMs. THEOREM 21.16 Any function of n variables can be computed by an abstract EROW PRAM in O(log n) steps using n / log_2 n processors and n / (2 log_2 n) shared memory cells. PROOF Each processor begins by reading log_2 n input values and combining them into one large value. The information known by the processors is combined in a binary-tree-like fashion. In each round, the remaining processors are grouped into pairs. In each pair, one processor communicates the information it knows about the input to the other processor and then leaves the computation. After ⌈log_2 n⌉ rounds, one processor knows all n input values. Then this processor computes th...
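The pairing argument in this proof excerpt can be illustrated with a small simulation. This is only a sketch under stated assumptions: it tracks which inputs each processor knows and ignores the shared-memory cells and the EROW access rules. The name `combine_rounds` and the use of tuples for "one large value" are illustrative, not from the source.

```python
import math


def combine_rounds(inputs):
    """Simulate binary-tree-like combining of input knowledge.

    Each simulated processor first reads about log2(n) consecutive
    inputs. In every round the remaining processors are paired; one
    member of each pair hands over what it knows and drops out.
    Returns what the last processor knows and the number of rounds.
    """
    n = len(inputs)
    if n < 2:
        raise ValueError("need at least two inputs")
    block = math.ceil(math.log2(n))  # inputs read per processor
    known = [tuple(inputs[s:s + block]) for s in range(0, n, block)]
    rounds = 0
    while len(known) > 1:
        merged = []
        for p in range(0, len(known), 2):
            if p + 1 < len(known):
                # Processor p+1 communicates its knowledge to p and leaves.
                merged.append(known[p] + known[p + 1])
            else:
                # An unpaired processor simply survives the round.
                merged.append(known[p])
        known = merged
        rounds += 1
    return known[0], rounds


if __name__ == "__main__":
    final, rounds = combine_rounds(list(range(16)))
    print(final, rounds)
```

With n = 16 the sketch starts from 4 processors and finishes in 2 rounds, within the ⌈log_2 n⌉ rounds stated in the proof.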
Basic Operations on the OTIS-Mesh Optoelectronic Computer
IEEE Transactions on Parallel and Distributed Systems, 1999
Cited by 31 (5 self)
In this paper we develop algorithms for some basic operations (broadcast, window broadcast, prefix sum, data sum, rank, shift, data accumulation, consecutive sum, adjacent sum, concentrate, distribute, generalize, sorting, random access read and write) on the OTIS-Mesh [1] model. These operations are useful in the development of efficient algorithms for numerous applications [2].
Parallel Algorithmic Techniques for Combinatorial Computation
Ann. Rev. Comput. Sci., 1988
Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.
Image Processing On The OTIS-Mesh Optoelectronic Computer
IEEE Transactions on Parallel and Distributed Systems, 2000
Cited by 22 (2 self)
We develop algorithms for histogramming, histogram modification, Hough transform, and image shrinking and expanding on an OTIS-Mesh optoelectronic computer. Our algorithm for the Hough transform is based upon a mesh algorithm for the Hough transform which is also developed in this paper. This new mesh algorithm improves upon the mesh Hough transform algorithms of [4] and [14].
Optimal Routing of Parentheses on the Hypercube
In Proceedings of the Symposium on Parallel Architectures and Algorithms, 1994
Cited by 14 (6 self)
We consider a new class of routing requests or partial permutations for which we give optimal online routing algorithms on the hypercube and shuffle-exchange network. For well-formed words of parentheses our algorithm establishes communication between all matching pairs in logarithmic time. It can be applied to the membership problem for Dyck languages and a number of problems for algebraic expressions.
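To make the routing request concrete, the sketch below computes the matching pairs of a well-formed word of parentheses, that is, the (source, destination) positions between which communication has to be established. It is a plain sequential stack scan, not the paper's logarithmic-time hypercube algorithm. The function name `matching_pairs` is hypothetical.

```python
def matching_pairs(word):
    """Return the matching (open, close) index pairs of a word.

    Raises ValueError if the word is not a well-formed word of
    parentheses, so the function doubles as a membership test for
    the Dyck language over one pair of brackets.
    """
    stack = []
    pairs = []
    for pos, ch in enumerate(word):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        else:
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


if __name__ == "__main__":
    print(matching_pairs("(()())"))
```

Each returned pair is one routing request. The pairs are either nested or disjoint, which is the structure that distinguishes this class of partial permutations.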
Scalable Data Parallel Implementations of Object Recognition using Geometric Hashing
1994
Cited by 11 (7 self)
Object recognition involves identifying known objects in a given scene. It plays a key role in image understanding. Geometric hashing has been proposed as a technique for model-based object recognition in occluded scenes. However, parallel techniques are needed to realize real-time vision systems employing geometric hashing. In this paper, we present scalable parallel algorithms for object recognition using geometric hashing. We define a realistic abstract model of CM-5 in which explicit cost is associated with data routing and synchronization. We develop a load-balancing technique that results in scalable processor-time optimal algorithms for performing a probe on this model. Given a model of CM-5 with P PNs and a set S of feature points in a scene, a probe of the recognition phase can be performed in O(|V(S)| / P) time, where V(S) is the set of votes cast by feature points in S. This algorithm is scalable in the range 1 ≤ P ≤ |V(S)|^(1/3). On a mesh processor array of size √P...
Scattering and Gathering Messages in Networks of Processors
1993
Cited by 10 (2 self)
The operations of scattering and gathering in a network of processors involve one processor of the network, call it P_0, communicating with all other processors. In scattering, P_0 sends (possibly) distinct messages to all other processors; in gathering, the other processors send (possibly) distinct messages to P_0. We consider networks that are trees of processors; we present algorithms for scattering messages from and gathering messages to the processor that resides at the root of the tree. The algorithms are:
- quite general, in that the messages transmitted can differ arbitrarily in length;
- quite strong, in that they send messages along noncolliding paths, hence do not require any buffering or queuing mechanisms in the processors;
- quite efficient: the algorithms for scattering in general trees are optimal, the algorithm for gathering in a path is optimal, and the algorithms for gathering in general trees are nearly optimal.
Our algorithms can easily be converte...
Interconnection networks using shuffles
IEEE Computers, 1981
Cited by 9 (0 self)
techniques allow several processors within a multiprocessing
Optimal Self-Routing of Linear-Complement Permutations in Hypercubes
In Proceedings of the 5th Distributed Memory Conference, 1990
Cited by 7 (0 self)
In this paper we describe an algorithm to route the class of linear-complement permutations on Hypercube SIMD computers. The class of linear-complement permutations is extremely useful in devising storage schemes for parallel array access. The proposed algorithm is self-routing and minimal, that is, the path established by the algorithm between each pair of source and destination processors is a minimal path determined using only the destination processor address. Furthermore, the algorithm requires only the optimal number of routing steps to realize any linear-complement permutation. The best known previous routing algorithms for the Hypercubes are for the class of bit-permute-complement permutations, a subset of the class of linear-complement permutations. Those algorithms are either nonoptimal or not self-routing. The algorithm presented is self-routing, optimal, and it routes a larger class of permutations. Also, this algorithm can route the class of linear-complement permutations in mult...
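The abstract does not define the permutation class, so the sketch below assumes the usual definition: a linear-complement permutation sends the processor address x (a bit vector) to A·x XOR c, where A is a nonsingular binary matrix and c is a fixed complement vector, with arithmetic over GF(2). It only computes destination addresses and does not implement the paper's routing algorithm. The names `lc_destination`, `rows`, and `c` are illustrative.

```python
def lc_destination(rows, c, x):
    """Map source address x to its destination A*x XOR c over GF(2).

    rows[r] is row r of the binary matrix A stored as a bitmask, so
    destination bit r is the parity of (rows[r] AND x). The caller
    is assumed to supply a nonsingular A.
    """
    y = 0
    for r, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << r
    return y ^ c


if __name__ == "__main__":
    # Hypothetical 3-dimensional example: y0 = x0 + x1, y1 = x1, y2 = x2.
    rows = [0b011, 0b010, 0b100]
    c = 0b101
    perm = [lc_destination(rows, c, x) for x in range(8)]
    print(perm)
```

When every row of A has exactly one set bit and the rows are distinct, A is a permutation matrix and the mapping is a bit-permute-complement permutation, the subset mentioned in the abstract.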