Results 1–10 of 27
Provably efficient scheduling for languages with fine-grained parallelism
 In Proc. Symposium on Parallel Algorithms and Architectures
, 1995
Abstract

Cited by 82 (25 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more extra space compared to a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
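The space issue the abstract raises can be seen in a toy simulation (an illustrative sketch, not the paper's scheduling algorithm): p workers execute a complete binary fork tree, and the peak number of live tasks depends sharply on whether ready tasks are picked in depth-first (stack) or breadth-first (queue) order. The function name and tree shape are invented for illustration.

```python
from collections import deque

def peak_live_tasks(depth, p, policy):
    """Simulate p workers running a complete binary fork tree of the
    given depth. 'dfs' pops ready tasks stack-wise (depth-first
    priority); 'bfs' pops queue-wise (breadth-first). Returns the peak
    number of live (spawned but unexecuted) tasks, a rough proxy for
    the space a schedule needs."""
    pool = deque([0])                      # tasks tracked by tree level
    peak = 1
    while pool:
        batch = [pool.pop() if policy == 'dfs' else pool.popleft()
                 for _ in range(min(p, len(pool)))]
        for level in batch:
            if level + 1 < depth:          # interior task forks two children
                pool.append(level + 1)
                pool.append(level + 1)
        peak = max(peak, len(pool))
    return peak

# depth-first priority keeps the live set small (roughly proportional
# to p*depth); breadth-first lets it grow toward the tree's width
print(peak_live_tasks(16, 4, 'dfs'), peak_live_tasks(16, 4, 'bfs'))
```

The gap between the two policies is exactly the factor-of-p-or-more blow-up the abstract warns about.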
Space-Efficient Scheduling of Parallelism with Synchronization Variables
Abstract

Cited by 28 (10 self)
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as Id. The main result is an online scheduling algorithm which, given a computation with w work (total operations), σ synchronizations, d depth (critical path) and s1 sequential space, will run in O(w/p + σ·log(p·d)/p + d·log(p·d)) time and s1 + O(p·d·log(p·d)) space, on a p-processor CRCW PRAM with a fetch-and-add primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is non-preemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with left-to-right synchronization edges, the scheduling algorithm can be implemented in O(w/p + d·log p) time and s1 + O(p·d·log p) space. These are the first nontrivial space bounds described for such languages.
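The synchronization variables discussed here can be sketched as a write-once cell that blocks readers until a value arrives; a future is then just a spawned thread paired with such a cell. A minimal single-machine sketch (the class and helper names are invented; this is not the paper's scheduler):

```python
import threading

class SyncVar:
    """A write-once synchronization variable: readers block until some
    thread writes the value -- the core primitive behind futures."""
    def __init__(self):
        self._cond = threading.Condition()
        self._set = False
        self._value = None

    def put(self, value):
        with self._cond:
            if self._set:
                raise ValueError("synchronization variable already written")
            self._value, self._set = value, True
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._set:       # suspend until the producer writes
                self._cond.wait()
            return self._value

def future(fn, *args):
    """Spawn fn in a new thread; return a SyncVar that will hold its result."""
    sv = SyncVar()
    threading.Thread(target=lambda: sv.put(fn(*args)), daemon=True).start()
    return sv

f = future(sum, range(10))
print(f.get())   # 45
```

A thread that calls `get` before the producer's `put` suspends, which is precisely the event on which the paper's non-preemptive scheduler is allowed to move a thread.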
On Parallel Hashing and Integer Sorting
, 1991
Abstract

Cited by 25 (9 self)
The problem of sorting n integers from a restricted range [1..m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n^polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speedup. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables a drastic reduction of space requirements at the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non-standard mo...
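The way a restricted key range circumvents the comparison-sorting lower bound can be illustrated with a plain LSD radix sort in base n, which takes about log_n(m) stable passes of O(n) work each. This is only a sequential illustration of the range/time trade-off, not the paper's randomized O(n log log m) algorithm:

```python
def counting_pass(a, key, base):
    """One stable bucket pass (counting sort) on the given digit."""
    buckets = [[] for _ in range(base)]
    for x in a:
        buckets[key(x)].append(x)
    return [x for b in buckets for x in b]

def radix_sort(a, m):
    """Sort non-negative integers below m by LSD radix sort in base
    ~n: roughly log_n(m) stable passes, i.e. o(n log n) total work
    whenever m is polynomial in n."""
    base = max(2, len(a))
    digit = 1
    while digit < m:
        a = counting_pass(a, lambda x: (x // digit) % base, base)
        digit *= base
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], 1000))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The larger m is relative to n, the more passes are needed, which is why the paper's hashing-based approach is required to reach O(n log log m) for superpolynomial m.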
Fully Dynamic Search Trees for an Extension of the BSP Model
 In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Abstract

Cited by 19 (4 self)
We present parallel algorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e. better use of the bandwidth of routers and a reduction of the overhead involved in communication. The BSP* model was introduced by Bäumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: we use Valiant's notion of c-optimality. Intuitively, c-optimal algorithms tend to a speed-up of p/c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1 − o(1) for wide ranges of BSP* parameters, where the ranges beco...
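The blockwise style that BSP* rewards can be shown in miniature: instead of inserting k items one at a time (k separate searches), sort the batch and perform a single merge that sweeps the structure once. A hedged sketch on a plain sorted list, standing in for the paper's 2-3 tree and its BSP* cost model (the function name is invented):

```python
def batch_insert(sorted_list, items):
    """Insert a batch of items into a sorted list by sorting the batch
    and merging once -- the batched, blockwise access pattern that
    BSP*-oriented data structures favour over repeated single inserts."""
    items = sorted(items)
    out, i, j = [], 0, 0
    while i < len(sorted_list) and j < len(items):
        if sorted_list[i] <= items[j]:
            out.append(sorted_list[i]); i += 1
        else:
            out.append(items[j]); j += 1
    out.extend(sorted_list[i:])            # flush whichever side remains
    out.extend(items[j:])
    return out

print(batch_insert([1, 4, 9], [3, 7, 0]))   # [0, 1, 3, 4, 7, 9]
```

On a distributed tree the same idea lets whole blocks of keys travel together, amortizing the per-message overhead the BSP* parameters charge for.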
PeertoPeer Systems for Prefix Search
 In Proceedings of the Symposium on Principles of Distributed Computing
, 2003
Abstract

Cited by 19 (1 self)
This paper presents a general methodology for building message-passing peer-to-peer systems capable of performing prefix search for arbitrary user-defined names. Our methodology makes it possible to achieve even load distribution, high fault-tolerance, and low-congestion concurrent query execution. This is the first known peer-to-peer system for prefix search with such properties. The essence of this methodology is a plug-and-play paradigm for designing a peer-to-peer system as a modular composition of arbitrary concurrent data structures.
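Prefix search over user-defined names is, at its core, a trie query; a minimal single-machine trie shows the operation the paper distributes across peers. The class and the end-of-name marker convention are invented for illustration, and nothing here reflects the paper's actual P2P construction:

```python
class Trie:
    """A minimal trie supporting prefix search over arbitrary names."""
    def __init__(self):
        self.root = {}

    def insert(self, name):
        node = self.root
        for ch in name:
            node = node.setdefault(ch, {})
        node['$'] = True               # end-of-name marker

    def prefix_search(self, prefix):
        """Return all inserted names starting with the given prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out = []
        def walk(n, acc):
            for ch, child in n.items():
                if ch == '$':
                    out.append(prefix + acc)
                else:
                    walk(child, acc + ch)
        walk(node, '')
        return sorted(out)

t = Trie()
for name in ['chord', 'chore', 'pastry', 'chores']:
    t.insert(name)
print(t.prefix_search('chor'))   # ['chord', 'chore', 'chores']
```

The P2P challenge the paper addresses is mapping such a trie's nodes onto peers so that load stays even and concurrent queries avoid congestion.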
Multisearch Techniques: Parallel Data Structures on MeshConnected Computers
 Journal of Parallel and Distributed Computing
, 1994
Abstract

Cited by 7 (2 self)
The multisearch problem is defined as follows. Given a data structure D modeled as a graph with n constant-degree nodes, perform O(n) searches on D. Let r be the length of the longest search path associated with a search process, and assume that the paths are determined "online"; that is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in O(√n + r·√n/log n) time on a √n × √n mesh-connected computer. For many data structures, the search path traversed when answering one search query has length r = O(log n). For these cases, our algorithm processes O(n) such queries in asymptotically optimal Θ(√n) time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel online tree traversals, as well as hierarchical representations of polyhedra and their myriad applications (line-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull).
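The "online, arbitrarily overlapping paths" aspect of multisearch can be mimicked on one machine by advancing every query one edge per synchronous round through a balanced BST. This sketch illustrates only the lockstep traversal pattern, not the mesh routing or its time bounds; all names are invented:

```python
def build_bst(sorted_keys):
    """Build a balanced BST as nested (key, left, right) tuples."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (sorted_keys[mid],
            build_bst(sorted_keys[:mid]),
            build_bst(sorted_keys[mid + 1:]))

def multisearch(root, queries):
    """Advance all queries in lockstep, one tree edge per round."""
    state = {q: root for q in queries}     # query -> current node
    found = {}
    while state:
        next_state = {}
        for q, node in state.items():      # one synchronous round
            key, left, right = node
            if q == key:
                found[q] = True
            else:
                child = left if q < key else right
                if child is None:
                    found[q] = False       # fell off the tree: absent
                else:
                    next_state[q] = child
        state = next_state
    return found

root = build_bst(list(range(0, 100, 2)))
res = multisearch(root, [10, 11, 42])
print(res)
```

Because every query moves exactly one edge per round, the number of rounds is the longest path length r, matching the r-dependent term in the paper's mesh bound.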
Thinking in parallel: Some basic data-parallel algorithms and techniques
 In use as class notes since
, 1993
Abstract

Cited by 7 (1 self)
Copyright 1992-2009, Uzi Vishkin. These class notes reflect the theoretical part in the Parallel
Pipelining with Futures
, 1997
Abstract

Cited by 7 (5 self)
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 2-3 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality.
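The contrast the abstract draws can be made concrete with futures: each pipeline stage is spawned as a task that waits only on the futures it actually consumes, so stages overlap without the globally synchronous rounds classic PRAM pipelining requires. A small sketch using Python's concurrent.futures (the stage functions parse/enrich/render are invented placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def parse(x):   return int(x)
def enrich(x):  return x * x
def render(x):  return f"value={x}"

with ThreadPoolExecutor(max_workers=9) as pool:
    items = ["1", "2", "3"]
    # each stage depends only on its own input future, not on a
    # global barrier between stages
    stage1 = [pool.submit(parse, s) for s in items]
    stage2 = [pool.submit(lambda f=f: enrich(f.result())) for f in stage1]
    stage3 = [pool.submit(lambda f=f: render(f.result())) for f in stage2]
    print([f.result() for f in stage3])   # ['value=1', 'value=4', 'value=9']
```

The data dependencies alone drive the schedule, which is what lets a futures-based pipeline run on asynchronous machines and be reordered for memory or locality.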
Blocking in Parallel Multisearch Problems (Extended Abstract)
, 1998
Abstract

Cited by 5 (4 self)
Wolfgang Dittrich (Bosch Telecom GmbH, UCON/ERS, Gerberstraße 33, 71522 Backnang, Germany, Wolfgang.Dittrich@bk.bosch.de), David Hutchinson (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, hutchins@scs.carleton.ca), Anil Maheshwari (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, maheshwa@scs.carleton.ca). Abstract: External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Blockwise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, for which blockwise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel ...
Explicit implementation of a parallel dictionary
 Sonderforschungsbereich 124, VLSI-Entwurfsmethoden und Parallelität, Univ. Saarbrücken
, 1995
Abstract

Cited by 5 (3 self)
This report gives a complete implementation of a parallel dictionary based on 2-3 trees, using the parallel PRAM programming language Fork. The implementations include procedures for creating and manipulating dictionary items by individual processors, searching in dictionaries, and parallel procedures for inserting and deleting collections of items. Procedures implementing parallel dictionary constructors and destructors are also provided. The implementations use the special Fork constructs fork, for forming smaller groups of synchronously executing processors, and farm, designating an asynchronous context where strict statement-level synchrony need not be enforced by the compiler. Both constructs add to the expressiveness and efficiency of the language. The parallel dictionary is a classical example of the use of the pipelining technique. Based on the concrete implementations given, a new language construct for implementing pipelined algorithms is proposed. Initial measurements performed using a simulator for the SB-PRAM being built at the University of Saarbrücken show that the parallel incremental insertion and deletion operations are expensive compared to the batch operations of dictionary construction and destruction. An efficient parallel dictionary should therefore use the incremental operations only when the existing dictionary is large compared to the number of items to be inserted or deleted. More careful experiments are needed to determine when to switch. The parallel dictionary is part of the PAD library of PRAM algorithms and data structures.