Results 1 - 10
of
68
Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a C ..."
Abstract
-
Cited by 430 (34 self)
- Add to MetaCart
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Scheduling Multithreaded Computations by Work Stealing
"... This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computa ..."
Abstract
-
Cited by 316 (32 self)
- Add to MetaCart
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
Programming languages for distributed computing systems
- ACM Computing Surveys
, 1989
"... When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less sa ..."
Abstract
-
Cited by 211 (15 self)
- Add to MetaCart
When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages.
Thread scheduling for multiprogrammed multiprocessors
- In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), Puerto Vallarta
, 1998
"... We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its performance under multiprogramming. We model multiprogramming with two scheduling levels: our scheduler runs at user-level and schedules threads onto a fixed collection of processes, while below, the opera ..."
Abstract
-
Cited by 130 (5 self)
- Add to MetaCart
We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its performance under multiprogramming. We model multiprogramming with two scheduling levels: our scheduler runs at user-level and schedules threads onto a fixed collection of processes, while below, the operating system kernel schedules processes onto a fixed collection of processors. We consider the kernel to be an adversary, and our goal is to schedule threads onto processes such that we make efficient use of whatever processor resources are provided by the kernel. Our thread scheduler is a non-blocking implementation of the work-stealing algorithm. For any multithreaded computation with work ¢¤ £ and critical-path length ¢¦ ¥ , and for any number § of processes, our scheduler executes the computation in expected time ¨�©�¢�£���§¤����¢�¥�§���§¤�� � , where §� � is the average number of processors allocated to the computation by the kernel. This time bound is optimal to within a constant factor, and achieves linear speedup whenever § is small relative to the parallelism 1
Paradigms for process interaction in distributed programs
- ACM Computing Surveys
, 1991
"... Distributed computations are concurrent programs in which processes communicate by message passing. Such programs typically execute on network architectures such as networks of workstations ordistributed memory parallel machines (i. e, multicomputers such ashypercubes). Several paradigms—examples or ..."
Abstract
-
Cited by 108 (0 self)
- Add to MetaCart
Distributed computations are concurrent programs in which processes communicate by message passing. Such programs typically execute on network architectures such as networks of workstations ordistributed memory parallel machines (i. e, multicomputers such ashypercubes). Several paradigms—examples or models—for process interaction
ATLAS: An Infrastructure for Global Computing
, 1996
"... In this paper, we present a proposed system architecture for global computing that we call Atlas, and we describe an early prototype that implements several of the mechanisms and policies that comprise the proposed architecture. Atlas is designed to execute parallel multithreaded programs on the net ..."
Abstract
-
Cited by 94 (0 self)
- Add to MetaCart
In this paper, we present a proposed system architecture for global computing that we call Atlas, and we describe an early prototype that implements several of the mechanisms and policies that comprise the proposed architecture. Atlas is designed to execute parallel multithreaded programs on the networked computing resources of the world. The Atlas system is a marriage of existing technologies from Java and Cilk together with some new technologies needed to extend the system into the global domain. 1 Introduction The goal of the Atlas system is to exploit the networked resources of the world as a giant distributed computer, and to develop an infrastructure that exploits idle resources both within and among institutions. The Atlas architecture strives for several desirable properties for a global computing infrastructure. Scalability: It must scale to millions of nodes. Heterogeneity: It must include a variety of platforms and operating systems, all of which must interoperate. Faul...
Scalable Load Balancing Techniques for Parallel Computers
, 1994
"... In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics : the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not po ..."
Abstract
-
Cited by 89 (16 self)
- Add to MetaCart
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics : the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors. Our goal here is to determine the most scalable load balancing schemes for different architectures such as hypercube, mesh and network of workstations. For each of these architectures, we establish lower bounds on the scalability of any possible load balancing scheme. We present the scalability analysis of a number of load balancing schemes that have not been analyzed before. This gives us valuable insights into their relative performance for different problem and architectural characteristi...
The Power of Two Random Choices: A Survey of Techniques and Results
- in Handbook of Randomized Computing
, 2000
"... ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately ..."
Abstract
-
Cited by 79 (2 self)
- Add to MetaCart
ITo motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n= log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n= log d + (1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
Network Based Concurrent Computing on the PVM System
- Journal of Concurrency: Practice and Experience
, 1991
"... Concurrent computing environments based on loosely coupled networks have proven effective as resources for multiprocessing. Experiences with and enhancements to PVM (Parallel Virtual Machine) are described in this paper. PVM is a software system that allows the utilization of a heterogeneous networ ..."
Abstract
-
Cited by 57 (5 self)
- Add to MetaCart
Concurrent computing environments based on loosely coupled networks have proven effective as resources for multiprocessing. Experiences with and enhancements to PVM (Parallel Virtual Machine) are described in this paper. PVM is a software system that allows the utilization of a heterogeneous network of parallel and serial computers as a single computational resource. This report also describes an interactive graphical interface to PVM, and porting and performance results from production applications. 1. Introduction Concurrent computing environments based on networks of computers can be an effective, viable, and economically attractive complement to hardware multiprocessors. A case in point is the number field sieve project of Lenstra and Manasse [1], whose most recent milestone is the factoring of the ninth Fermat number (148 digits) using over 1,000 computers worldwide. Although such large scale use of network-based concurrent computing may be rare, there exist many examples of thi...
On the Efficiency of Parallel Backtracking
, 1992
"... It is known that isolated executions of parallel backtrack search exhibit speedup anomalies. In this paper we present analytical models and experimental results on the average case behavior of parallel backtracking. We consider two types of backtrack search algorithms: (i) simple backtracking (wh ..."
Abstract
-
Cited by 45 (6 self)
- Add to MetaCart
It is known that isolated executions of parallel backtrack search exhibit speedup anomalies. In this paper we present analytical models and experimental results on the average case behavior of parallel backtracking. We consider two types of backtrack search algorithms: (i) simple backtracking (which does not use any heuristic information) ; (ii) heuristic backtracking (which uses heuristics to order and prune search). We present analytical models to compare the average number of nodes visited in sequential and parallel search for each case. For simple backtracking, we show that the average speedup obtained is (i) linear when distribution of solutions is uniform and (ii) superlinear when distribution of solutions is non-uniform. For heuristic backtracking, the average speedup obtained is at least linear (i.e., either linear or superlinear), and the speedup obtained on a subset of instances (called difficult instances) is superlinear. We also present experimental results over ...

