Results 1–10 of 10
Optimal Doubly Logarithmic Parallel Algorithms Based On Finding All Nearest Smaller Values
, 1993
Abstract
Cited by 37 (7 self)
The all nearest smaller values problem is defined as follows. Let A = (a_1, a_2, ..., a_n) be n elements drawn from a totally ordered domain. For each a_i, 1 ≤ i ≤ n, find the two nearest elements in A that are smaller than a_i (if such exist): the left nearest smaller element a_j (with j < i) and the right nearest smaller element a_k (with k > i). We give an O(log log n) time optimal parallel algorithm for the problem on a CRCW PRAM. We apply this algorithm to achieve optimal O(log log n) time parallel algorithms for four problems: (i) triangulating a monotone polygon, (ii) preprocessing for answering range minimum queries in constant time, (iii) reconstructing a binary tree from its inorder and either preorder or postorder numberings, and (iv) matching a legal sequence of parentheses. We also show that any optimal CRCW PRAM algorithm for the triangulation problem requires Ω(log log n) time. Dept. of Computing, King's College London, The Strand, London WC2R 2LS, England. ...
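For reference, the sequential version of the ANSV problem admits a simple linear-time, stack-based solution. The sketch below (not the paper's parallel algorithm, and assuming distinct elements) illustrates what the problem asks for:

```python
def all_nearest_smaller_values(a):
    """Linear-time sequential ANSV via a stack, assuming distinct values.
    Returns, for each index i, the index of the nearest smaller element
    to the left and to the right (None where no such element exists)."""
    n = len(a)
    left = [None] * n
    right = [None] * n
    stack = []  # indices whose right nearest smaller value is still unknown
    for i in range(n):
        # a[i] is the right nearest smaller value of every larger element on the stack.
        while stack and a[stack[-1]] >= a[i]:
            right[stack.pop()] = i
        if stack:
            left[i] = stack[-1]
        stack.append(i)
    return left, right
```

Each index is pushed and popped at most once, giving O(n) total work; the paper's contribution is matching this work bound in O(log log n) parallel time on a CRCW PRAM.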
Parallel Priority Queues
, 1991
Abstract
Cited by 15 (1 self)
This paper introduces the Parallel Priority Queue (PPQ) abstract data type. A PPQ stores a set of integer-valued items and provides operations such as insertion of n new items or deletion of the n smallest ones. Algorithms for realizing PPQ operations on an n-processor CREW PRAM are based on two new data structures, the n-Bandwidth-Heap (n-H) and the n-Bandwidth-Leftist-Heap (n-L), which are obtained as extensions of the well-known sequential binary heap and leftist heap, respectively. Using these structures, it is shown that insertion of n new items into a PPQ of m elements can be performed in parallel time O(h + log n), where h = log(m/n), while deletion of the n smallest items can be performed in time O(h + log log n). Keywords: data structures, parallel algorithms, analysis of algorithms, heaps, PRAM model. This work has been partly supported by the Ministero della Pubblica Istruzione of Italy and by the C.N.R. project "Sistemi Informatici e Calcolo Parallelo". Istituto di Ela...
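The PPQ interface itself (batched insertion and batched delete-min) can be sketched sequentially with an ordinary binary heap; the class name and method names below are illustrative, not from the paper, which realizes the operations in parallel with the bandwidth-heap structures:

```python
import heapq

class PPQSketch:
    """Sequential sketch of the PPQ abstract data type: insert a batch of
    items, or delete the n smallest. Uses Python's binary heap; the paper's
    n-Bandwidth-Heap supports the same interface on an n-processor CREW PRAM."""

    def __init__(self):
        self._heap = []

    def insert_batch(self, items):
        # Insert a batch of new items (O(log m) per item here).
        for x in items:
            heapq.heappush(self._heap, x)

    def delete_min_batch(self, n):
        # Remove and return the n smallest items in increasing order.
        n = min(n, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(n)]
```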
Structural Parallel Algorithmics
, 1991
Abstract
Cited by 11 (4 self)
The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledge base on non-numerical parallel algorithms can be characterized in a structural way. Each structure relates a few problems and techniques to one another, from the basic to the more involved. The second half of the paper provides a bird's-eye view of such structures for: (1) list, tree, and graph parallel algorithms; (2) very fast deterministic parallel algorithms; and (3) very fast randomized parallel algorithms. 1 Introduction. Parallelism is a concern that is missing from "traditional" algorithmic design. Unfortunately, it turns out that most efficient serial algorithms become rather inefficient parallel algorithms. The experience is that the design of parallel algorithms requires new paradigms and techniques, offering an exciting intellectual challenge. We note that it had...
Thinking in parallel: Some basic data-parallel algorithms and techniques
 In use as class notes since
, 1993
Abstract
Cited by 7 (1 self)
Copyright 1992–2009, Uzi Vishkin. These class notes reflect the theoretical part in the Parallel...
Communication Efficient BSP Algorithm for All Nearest Smaller Values Problem
, 2001
Abstract
Cited by 3 (2 self)
We present a BSP (Bulk Synchronous Parallel) algorithm for solving the All Nearest Smaller Values Problem (ANSVP), a fundamental problem in both graph theory and computational geometry. Our algorithm achieves optimal sequential computation time and uses only three communication supersteps. In the worst case, each communication phase takes no more than an (n/p + p)-relation, where p is the number of processors. In addition, our average-case analysis shows that, on random inputs, the expected communication requirements for all three steps are bounded above by a p-relation, which is independent of the problem size n. Experiments have been carried out on an SGI Origin 2000 with 32 R10000 processors and a Sun Enterprise 4000 multiprocessing server supporting 8 UltraSPARC processors, using the MPI libraries. The results clearly demonstrate the communication efficiency and load balancing of the computation.
Fast integer merging on the EREW PRAM
 In Proc. 19th Intl. Coll. on Automata, Languages, and Programming
, 1992
Abstract
Cited by 1 (0 self)
Abstract. We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of n bits each can be merged in O(log log n) time. More generally, we describe an algorithm to merge two sorted sequences of n integers drawn from the set {0, ..., m − 1} in O(log log n + log m) time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of Ω(log min{n, m}) on the time needed to merge two sorted sequences of length n each with elements in the set {0, ..., m − 1}, implying that our merging algorithm is as fast as possible for m = (log n)^Ω(1). If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes Θ(log n), even for m = 2. Stable merging is thus harder than non-stable merging. 1
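The output convention of the merging problem (merged sequence plus the rank of every input element) can be illustrated with the ordinary sequential merge; this sketch is stable in the paper's sense, with ties resolved in favor of the first sequence:

```python
def merge_with_ranks(a, b):
    """Merge two sorted sequences and report each input element's rank
    (position) in the merged output. Sequential O(n) sketch; the paper
    achieves this on the EREW PRAM in O(log log n + log m) time."""
    merged, rank_a, rank_b = [], [], []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take from a on ties, so ranks within each sequence are increasing.
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            rank_a.append(len(merged))
            merged.append(a[i])
            i += 1
        else:
            rank_b.append(len(merged))
            merged.append(b[j])
            j += 1
    return merged, rank_a, rank_b
```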
Communication-Efficient Bulk Synchronous Parallel Algorithms
, 2001
Abstract
Communication has been pointed out to be the major bottleneck for the performance of parallel algorithms. Theoretical parallel models such as the PRAM have long been questioned due to the fact that theoretical algorithmic efficiency does not provide a satisfactory performance prediction when algorithms are implemented on commercially available parallel machines. This is mainly because these models do not provide a reasonable scheme for measuring the communication overhead. Recently several practical parallel models aiming at achieving portability and scalability of parallel algorithms have been widely discussed. Among them, the Bulk Synchronous Parallel (BSP) model has received much attention as a bridging model for parallel computation, as it generally better addresses practical concerns like communication and synchronization. The BSP model has been used in a number of application areas, primarily in scientific computing. Yet, very little work has been done on problems generally considered to be irregularly structured, which usually result in highly data-dependent communication patterns and make it difficult to achieve communication efficiency. Typical examples are fundamental problems in graph theory and computational geometry, which are important as a vast number of interesting problems in many fields are defined in terms of them. Thus practical and communication-efficient parallel algorithms for solving these problems are important. In this dissertation, we present scalable parallel algorithms for some fundamental problems in graph theory and computational geometry. In addition to the time complexity analysis, we also present some techniques for worst-case and average-case communication complexity analyses. Experimental studies have been performed on two differ...
Average-Case Communication-Optimal Parallel Parenthesis Matching
Abstract
We provide the first non-trivial lower bound, where p is the number of processors and n is the data size, on the average-case communication volume required to solve the parenthesis matching problem, and we present a parallel algorithm that takes linear (optimal) computation time and optimal expected message volume.
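The underlying problem, also item (iv) in the ANSV paper above, is to pair each opening parenthesis of a legal sequence with its closing mate. The sequential stack solution is a one-pass linear-time sketch (the papers here study its parallel and communication complexity):

```python
def match_parentheses(s):
    """For a legal parenthesis sequence s, return match[i] = index of the
    parenthesis paired with position i. Sequential O(n) stack sketch."""
    match = [None] * len(s)
    stack = []  # indices of currently unmatched '('
    for i, c in enumerate(s):
        if c == '(':
            stack.append(i)
        else:
            j = stack.pop()  # most recent unmatched '(' is the mate of this ')'
            match[i] = j
            match[j] = i
    return match
```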
Exploiting Parallelism in Simulations
Abstract
1. Introduction. The simulation of a discrete event system is traditionally regarded as the process of generating an operation path that represents the system state as a function of time. This normally entails the use of a global clock and an event list. In the last few years, much effort has been devoted to the task of splitting the simulation process into a number of subprocesses and executing the latter in parallel on different processors [1, 2, 3, 4, 5]. For example, when simulating a queueing network, the idea might be to allocate each processor to a node, or a group of nodes, and let it handle the corresponding events, taking care of possible interactions with other processors. At best, the degree of parallelism obtained by such an approach will be equal to the number of nodes, and in general may be much smaller [4, 5]. We propose new methods that do not limit the degree of parallelism in this way. The concepts of "time" and "event" are no longer present explicitly, and the necessity for the event list disappears. In section 3, we consider the problem of simulating a long run of a first in, first out (FIFO) G/G/1 queue [6], using P processors. A simple algorithm is presented for computing the arrival and departure times of the first N jobs in time proportional to N/P...
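What makes a FIFO G/G/1 queue amenable to this treatment is that departure times satisfy the recurrence D[k] = max(A[k], D[k-1]) + S[k], where A[k] is the k-th arrival time and S[k] the k-th service time, so no event list is needed. A sequential sketch of that recurrence (the paper's contribution is evaluating this kind of computation in parallel over P processors):

```python
def fifo_gg1_times(interarrival, service):
    """Arrival and departure times of a FIFO G/G/1 queue from the
    recurrence D[k] = max(A[k], D[k-1]) + S[k]. Sequential sketch;
    the max-plus structure is what parallel methods exploit."""
    arrivals, departures = [], []
    t = 0.0  # current arrival time A[k]
    d = 0.0  # previous departure time D[k-1]
    for a, s in zip(interarrival, service):
        t += a
        # Service starts when the job arrives or the server frees up.
        d = max(t, d) + s
        arrivals.append(t)
        departures.append(d)
    return arrivals, departures
```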