Results 1–10 of 31
Provably efficient scheduling for languages with fine-grained parallelism
 In Proc. Symposium on Parallel Algorithms and Architectures
, 1995
Abstract

Cited by 85 (24 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more additional space compared with a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
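The factor-of-p space blowup mentioned above can be made concrete with a toy simulation (an illustrative sketch only, not the paper's scheduler; all names are hypothetical): with p workers drawing from a shared ready pool, breadth-first (FIFO) scheduling keeps far more tasks live than depth-first (LIFO) scheduling of the same fork tree.

```python
from collections import deque

def max_live_tasks(levels, p, lifo):
    """Simulate p workers on a complete binary fork tree of the given depth.

    Each round, up to p ready tasks run; a non-leaf task forks two children.
    Returns the peak size of the ready pool, a rough proxy for space use.
    """
    pool = deque([(0,)])          # tasks identified by their path from the root
    peak = 1
    while pool:
        batch = [pool.pop() if lifo else pool.popleft()
                 for _ in range(min(p, len(pool)))]
        for task in batch:
            if len(task) < levels:            # non-leaf: fork two children
                pool.append(task + (0,))
                pool.append(task + (1,))
        peak = max(peak, len(pool))
    return peak
```

Running `max_live_tasks(10, 4, lifo=True)` yields a small peak close to sequential depth-first order, while `lifo=False` lets the ready pool grow toward the full 2^10 frontier.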
Space-Efficient Scheduling of Parallelism with Synchronization Variables
Abstract

Cited by 27 (9 self)
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as Id. The main result is an online scheduling algorithm which, given a computation with w work (total operations), σ synchronizations, d depth (critical path) and s1 sequential space, will run in O(w/p + σ log(pd)/p + d log(pd)) time and s1 + O(pd log(pd)) space, on a p-processor CRCW PRAM with a fetch-and-add primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is non-preemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with left-to-right synchronization edges, the scheduling algorithm can be implemented in O(w/p + d log p) time and s1 + O(pd log p) space. These are the first nontrivial space bounds described for such languages.
On Parallel Hashing and Integer Sorting
, 1991
Abstract

Cited by 25 (8 self)
The problem of sorting n integers from a restricted range [1..m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n^polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speedup. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables a drastic reduction of space requirements at the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non-standard mo...
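For comparison with the O(n log log m) bound above, a plain LSD radix sort (a baseline sketch, not the paper's hash-based algorithm; names are hypothetical) sorts n integers from [1..m] in O((n + b) · log_b m) time for digit base b:

```python
def radix_sort(keys, base=256):
    """LSD radix sort for non-negative integers.

    Runs in O((n + base) * ceil(log_base(max_key))) time and O(n + base)
    space; hash-based integer sorting aims to beat this for huge ranges.
    """
    if not keys:
        return []
    out = list(keys)
    divisor = 1
    biggest = max(keys)
    while divisor <= biggest:
        buckets = [[] for _ in range(base)]
        for k in out:                       # stable distribution pass
            buckets[(k // divisor) % base].append(k)
        out = [k for b in buckets for k in b]
        divisor *= base
    return out
```

For example, `radix_sort([5, 1000003, 2, 7])` returns the keys in ascending order after three passes with base 256.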
Peer-to-Peer Systems for Prefix Search
 In Proceedings of the Symposium on Principles of Distributed Computing
, 2003
Abstract

Cited by 22 (1 self)
This paper presents a general methodology for building message-passing peer-to-peer systems capable of performing prefix search for arbitrary user-defined names. Our methodology allows us to achieve even load distribution, high fault-tolerance, and low-congestion concurrent query execution. This is the first known peer-to-peer system for prefix search with such properties. The essence of this methodology is a plug-and-play paradigm for designing a peer-to-peer system as a modular composition of arbitrary concurrent data structures.
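The paper's contribution is the distributed, load-balanced realization; the query it supports is ordinary prefix search, which a sequential trie sketch (hypothetical names, not the paper's data structure) makes concrete:

```python
def build_trie(names):
    """Build a nested-dict trie; the key None marks the end of a name."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[None] = True
    return root

def prefix_search(root, prefix):
    """Return every stored name that starts with the given prefix."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found = []
    def walk(n, suffix):
        if None in n:
            found.append(prefix + suffix)
        for ch, child in n.items():
            if ch is not None:
                walk(child, suffix + ch)
    walk(node, "")
    return found
```

In a peer-to-peer setting this structure would be partitioned across peers so that load and congestion stay balanced; the sketch above captures only the query semantics.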
Fully Dynamic Search Trees for an Extension of the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Abstract

Cited by 21 (5 self)
We present parallel algorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e., better use of the bandwidth of routers and reduction of the overhead involved in communication. The BSP* model is introduced by Bäumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: we use Valiant's notion of c-optimality. Intuitively, c-optimal algorithms tend to a speedup of p/c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1 − o(1) for wide ranges of BSP* parameters, where the ranges beco...
Pipelining with Futures
, 1997
Abstract

Cited by 8 (5 self)
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 2-3 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality.
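A future-based formulation replaces the hand-maintained pipeline with demand-driven synchronization: a consumer blocks only when it actually touches a producer's result. A minimal sketch with Python's concurrent.futures (hypothetical stage functions; in Multilisp futures are implicit rather than explicit objects):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def pipeline(items):
    """Two asynchronous stages linked by futures.

    A stage-2 task blocks on its stage-1 future only when that item's
    result is needed, so the stages overlap without explicit barriers.
    """
    with ThreadPoolExecutor(max_workers=2) as stage1, \
         ThreadPoolExecutor(max_workers=2) as stage2:
        futs1 = [stage1.submit(square, x) for x in items]
        futs2 = [stage2.submit(lambda f=f: f.result() + 1) for f in futs1]
        return [f.result() for f in futs2]
```

Here `pipeline([1, 2, 3])` evaluates to `[2, 5, 10]`; no code outside the stage functions encodes the pipeline schedule, which is the contrast with the highly synchronous PVW-style pipelines described above.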
Multisearch Techniques: Parallel Data Structures on Mesh-Connected Computers
 Journal of Parallel and Distributed Computing
, 1994
Abstract

Cited by 8 (2 self)
The multisearch problem is defined as follows. Given a data structure D modeled as a graph with n constant-degree nodes, perform O(n) searches on D. Let r be the length of the longest search path associated with a search process, and assume that the paths are determined "online". That is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in O(√n + r·√n/log n) time on a √n × √n mesh-connected computer. For many data structures, the search path traversed when answering one search query has length r = O(log n). For these cases, our algorithm processes O(n) such queries in asymptotically optimal Θ(√n) time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel online tree traversals, as well as hierarchical representations of polyhedra and their myriad applications (line-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull).
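The batch style of multisearch, where every query advances one step per round rather than queries being answered one at a time, can be sketched sequentially (binary search tree as nested tuples; all names hypothetical, and the mesh routing is elided):

```python
def multisearch(root, queries):
    """Advance every query one level per round, superstep style.

    A node is a (key, left, right) tuple; returns {query: found?}.
    """
    frontier = [(root, q) for q in queries]
    results = {}
    while frontier:
        nxt = []
        for node, q in frontier:          # one synchronous round
            if node is None:
                results[q] = False        # fell off the tree: not present
            elif q == node[0]:
                results[q] = True
            elif q < node[0]:
                nxt.append((node[1], q))  # descend left
            else:
                nxt.append((node[2], q))  # descend right
        frontier = nxt
    return results
```

On a mesh, each round would additionally route the surviving queries to the processors holding the next nodes; that routing cost is what a bound of the form O(√n + r·√n/log n) accounts for.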
Thinking in parallel: Some basic data-parallel algorithms and techniques
 In use as class notes since
, 1993
Abstract

Cited by 7 (1 self)
Copyright 1992–2009, Uzi Vishkin. These class notes reflect the theoretical part in the Parallel
Blocking in Parallel Multisearch Problems (Extended Abstract)
, 1998
Abstract

Cited by 5 (4 self)
Wolfgang Dittrich (Bosch Telecom GmbH, UCON/ERS, Gerberstraße 33, 71522 Backnang, Germany, Wolfgang.Dittrich@bk.bosch.de); David Hutchinson (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, hutchins@scs.carleton.ca); Anil Maheshwari (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, maheshwa@scs.carleton.ca). Abstract: External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, for which blockwise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel ...
Parallel Computational Geometry: An approach using randomization
 In Handbook of Computational Geometry, edited by J.R. Sack and
, 1998
Abstract

Cited by 5 (0 self)
We describe very general methods for designing efficient parallel algorithms for problems in computational geometry. Although our main focus is the PRAM, we provide strong evidence that these techniques yield equally efficient algorithms in more concrete computing models like butterfly networks. The algorithms exploit random sampling and randomized techniques that result in very general strategies for solving a wide class of fundamental problems from computational geometry, like convex hulls, Voronoi diagrams, triangulation, point-location and arrangements. Our description emphasizes the algorithmic techniques rather than a detailed treatment of the individual problems.