Provably efficient scheduling for languages with fine-grained parallelism
- In Proc. Symposium on Parallel Algorithms and Architectures
, 1995
Cited by 68 (22 self)
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or larger more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any
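As a rough illustration of the factor-of-p space blowup described above, the sketch below simulates independent tasks that each hold some scratch memory while running. It is not taken from the paper: the function name `peak_space` and the round-based scheduling model are assumptions made purely for illustration.

```python
def peak_space(num_tasks, temp, p):
    """Peak scratch memory when `num_tasks` independent tasks, each holding
    `temp` units while it runs, are executed p at a time in rounds.
    (Hypothetical model, for illustration only.)"""
    peak = 0
    remaining = num_tasks
    while remaining > 0:
        running = min(p, remaining)        # tasks scheduled in this round
        peak = max(peak, running * temp)   # all of them hold memory at once
        remaining -= running
    return peak
```

With `p = 1` the peak is one task's scratch space; with `p` processors it is `p` times that, which is the blowup a space-efficient schedule tries to bound.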
Parallel Algorithms for Higher-Dimensional Convex Hulls
Cited by 42 (14 self)
We give fast randomized and deterministic parallel methods for constructing convex hulls in $\mathbb{R}^d$, for any fixed $d$. Our methods are for the weakest shared-memory model, the EREW PRAM, and have optimal work bounds (with high probability for the randomized methods). In particular, we show that the convex hull of $n$ points in $\mathbb{R}^d$ can be constructed in $O(\log n)$ time using $O(n \log n + n^{\lfloor d/2 \rfloor})$ work, with high probability. We also show that it can be constructed deterministically in $O(\log^2 n)$ time using $O(n \log n)$ work for $d = 3$ and in $O(\log n)$ time using $O(n^{\lfloor d/2 \rfloor} \log^{c(\lceil d/2 \rceil - \lfloor d/2 \rfloor)} n)$ work for $d \ge 4$, where $c > 0$ is a constant, which is optimal for even $d \ge 4$. We also show how to make our 3-dimensional methods output-sensitive with only a small increase in running time. These methods can be applied to other problems as well. A variation of the convex hull algorithm for even dimensions deterministically constructs a $(1/r)$-cutting of $n$ hyperplanes in $\mathbb{R}^d$ in $O(\log n)$ time using optimal $O(n r^{d-1})$ work; when $r = n$, we obtain their arrangement and a point location data structure for it. With appropriate modifications, our deterministic 3-dimensional convex hull algorithm can be used to compute, in the same resource bounds, the intersection of $n$ balls of equal radius in $\mathbb{R}^3$. This leads to a sequential algorithm for computing the diameter of a point set in $\mathbb{R}^3$ with running time $O(n \log^3 n)$, which is arguably simpler than an algorithm with the same running time by Brönnimann et al.
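To make the randomized work bound $O(n \log n + n^{\lfloor d/2 \rfloor})$ concrete, it can be instantiated for a few small dimensions. This is plain arithmetic on the bound stated in the abstract, not an additional result:

```latex
\begin{align*}
d = 3:&\quad \lfloor d/2 \rfloor = 1, \quad O(n \log n + n^{1}) = O(n \log n) \\
d = 4:&\quad \lfloor d/2 \rfloor = 2, \quad O(n \log n + n^{2}) = O(n^{2}) \\
d = 5:&\quad \lfloor d/2 \rfloor = 2, \quad O(n \log n + n^{2}) = O(n^{2})
\end{align*}
```

The $n \log n$ term dominates only for $d \le 3$; from $d = 4$ onward the $n^{\lfloor d/2 \rfloor}$ term determines the work.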
Space-Efficient Scheduling of Parallelism with Synchronization Variables
Cited by 28 (10 self)
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as Id. The main result is an online scheduling algorithm which, given a computation with $w$ work (total operations), $\sigma$ synchronizations, $d$ depth (critical path) and $s_1$ sequential space, will run in $O(w/p + \sigma \log(pd)/p + d \log(pd))$ time and $s_1 + O(pd \log(pd))$ space, on a $p$-processor CRCW PRAM with a fetch-and-add primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is non-preemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with left-to-right synchronization edges, the scheduling algorithm can be implemented in $O(w/p + d \log p)$ time and $s_1 + O(pd \log p)$ space. These are the first nontrivial space bounds described for such languages.
On Parallel Hashing and Integer Sorting
, 1991
Cited by 24 (9 self)
The problem of sorting n integers from a restricted range [1..m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n^polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speedup. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and nonstandard mo...
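The abstract does not spell out how the hash table is built, so the sketch below swaps in a well-known sequential scheme with the same lookup guarantee: two-level (FKS-style) static perfect hashing, with randomized construction and O(1) worst-case search. It is not the paper's parallel algorithm. The names `build`, `contains`, and `_make_hash`, the modulus `P`, and the `4 * n` retry threshold are illustrative assumptions, and the keys are assumed to be distinct non-negative integers below `P`.

```python
import random

P = (1 << 61) - 1  # assumed prime modulus, larger than any key used here

def _make_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m with random a, b."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build(keys, seed=0):
    """Build a two-level static table over distinct integer keys."""
    rng = random.Random(seed)
    n = max(1, len(keys))
    # Level 1: retry until the squared bucket sizes sum to O(n).
    while True:
        h1 = _make_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: a bucket of size s gets a collision-free table of size s*s.
    second = []
    for b in buckets:
        size = len(b) ** 2
        while True:
            h2 = _make_hash(size, rng) if size else None
            slots = [None] * size
            ok = True
            for k in b:
                i = h2(k)
                if slots[i] is not None:   # collision: redraw h2
                    ok = False
                    break
                slots[i] = k
            if ok:
                break
        second.append((h2, slots))
    return h1, second

def contains(table, key):
    """Two hash evaluations and one comparison, independent of n."""
    h1, second = table
    h2, slots = second[h1(key)]
    return bool(slots) and slots[h2(key)] == key
```

Construction is randomized and may retry, but every lookup touches exactly one first-level bucket and one second-level slot.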
Fully Dynamic Search Trees for an Extension of the BSP Model
- In Proc. 8th ACM Symp. on Parallel Algorithms and Architectures
, 1996
Cited by 19 (4 self)
We present parallel algorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards blockwise communication, i.e. better use of the bandwidth of routers and reduction of the overhead involved in communication. The BSP* model was introduced by Bäumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: we use Valiant's notion of c-optimality. Intuitively, c-optimal algorithms approach a speedup of p/c as the input size grows (p denotes the number of processors), with communication time asymptotically smaller than computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1 - o(1) for wide ranges of BSP* parameters, where the ranges beco...
Peer-to-Peer Systems for Prefix Search
- In Proceedings of the Symposium on Principles of Distributed Computing
, 2003
Cited by 17 (1 self)
This paper presents a general methodology for building message-passing peer-to-peer systems capable of performing prefix search for arbitrary user-defined names. Our methodology achieves even load distribution, high fault-tolerance, and low-congestion concurrent query execution. This is the first known peer-to-peer system for prefix search with such properties. The essence of this methodology is a plug-and-play paradigm for designing a peer-to-peer system as a modular composition of arbitrary concurrent data structures.
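For readers unfamiliar with the query itself, the following single-machine sketch shows what a prefix search returns. It illustrates only the query semantics, not the peer-to-peer design, load balancing, or fault tolerance discussed in the paper; the function name `prefix_search` and the sorted-list representation are assumptions.

```python
import bisect

def prefix_search(sorted_names, prefix):
    """Return all names in `sorted_names` (sorted ascending) that start
    with `prefix`. Matching names form one contiguous run in sorted order."""
    lo = bisect.bisect_left(sorted_names, prefix)  # first name >= prefix
    matches = []
    for name in sorted_names[lo:]:
        if not name.startswith(prefix):
            break                                  # run of matches has ended
        matches.append(name)
    return matches
```

For example, over the sorted names `["al", "alpha", "alps", "beta"]`, the prefix `"alp"` selects the contiguous run `"alpha", "alps"`.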
Multisearch Techniques: Parallel Data Structures on Mesh-Connected Computers
- Journal of Parallel and Distributed Computing
, 1994
Cited by 7 (2 self)
The {\em multisearch problem} is defined as follows. Given a data structure $D$ modeled as a graph with $n$ constant-degree nodes, perform $O(n)$ searches on $D$. Let $r$ be the length of the longest search path associated with a search process, and assume that the paths are determined ``on-line''. That is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in $O(\sqrt{n} + r \frac{\sqrt{n}}{\log n})$ time on a $\sqrt{n} \times \sqrt{n}$ mesh-connected computer. For many data structures, the search path traversed when answering one search query has length $r=O(\log n)$. For these cases, our algorithm processes $O(n)$ such queries in asymptotically optimal $\Theta(\sqrt{n})$ time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel online tree traversals, as well as hierarchical representations of polyhedra and their many applications (lines-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull).
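The problem statement can be made concrete with a sequential reference sketch: many searches advance in lockstep, one step per round, through a shared search structure. Here the structure is an implicit balanced search tree over a sorted array, so the longest path is $r = O(\log n)$. This models only the multisearch semantics, not the mesh algorithm or its running time; the function name `multisearch` and the array representation are assumptions.

```python
def multisearch(sorted_keys, queries):
    """Advance every query one step per round through an implicit balanced
    search tree over `sorted_keys`. Returns (answers, rounds), where
    answers[i] is the index of the first key >= queries[i]."""
    # state[i] = [lo, hi): the subtree that query i is currently visiting
    state = [[0, len(sorted_keys)] for _ in queries]
    rounds = 0
    while any(lo < hi for lo, hi in state):
        rounds += 1
        for i, q in enumerate(queries):      # one step of each active search
            lo, hi = state[i]
            if lo < hi:
                mid = (lo + hi) // 2
                if sorted_keys[mid] < q:
                    state[i][0] = mid + 1    # descend into the right subtree
                else:
                    state[i][1] = mid        # descend into the left subtree
    return [lo for lo, _ in state], rounds
```

The number of rounds equals the longest search path among the queries, which is the quantity $r$ in the bound above.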
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques
- College Park, MD
, 1993
Cited by 6 (1 self)
PRAM-On-Chip Explicit Multi-Threading (XMT) platform is provided through the XMT home page www.umiacs.umd.edu/users/vishkin/XMT and the class home page. Comments are welcome: please write to me using my last name at umd.edu
Pipelining with Futures
, 1997
Cited by 6 (4 self)
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 2-3 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality.
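The title suggests futures as an alternative to hand-built pipelines. As a minimal sketch of that general idea (not the paper's algorithms), the example below builds a stream whose tail is a future, so a consumer can start on the first element while later elements are still being produced. The names `produce`, `consume`, and `pipelined_sum_of_squares`, and the use of Python's `concurrent.futures` thread pool, are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def produce(pool, i, n):
    """Return a cell (value, future_of_rest) for the stream of squares
    i*i, ..., (n-1)*(n-1); None marks the end of the stream."""
    if i == n:
        return None
    # The rest of the stream is computed asynchronously; this call does
    # not wait for it, so the producer never blocks a worker thread.
    return (i * i, pool.submit(produce, pool, i + 1, n))

def consume(cell):
    """Sum the stream, blocking only when the next cell is not ready yet."""
    total = 0
    while cell is not None:
        value, rest = cell
        total += value
        cell = rest.result()   # synchronize on the future tail
    return total

def pipelined_sum_of_squares(n):
    with ThreadPoolExecutor(max_workers=2) as pool:
        return consume(produce(pool, 0, n))
```

The synchronization lives entirely in `rest.result()`: the programmer writes ordinary recursive code and the runtime decides when each stage runs, rather than maintaining an explicit, synchronous pipeline.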
Blocking in Parallel Multisearch Problems (Extended Abstract)
, 1998
Cited by 5 (4 self)
Wolfgang Dittrich (Bosch Telecom GmbH, UC-ON/ERS, Gerberstraße 33, 71522 Backnang, Germany; Wolfgang.Dittrich@bk.bosch.de), David Hutchinson (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6; hutchins@scs.carleton.ca), Anil Maheshwari (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6; maheshwa@scs.carleton.ca)
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Block-wise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, for which block-wise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel ...

