Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
Cited by 430 (34 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
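As a rough illustration of how work and critical-path length can model performance, the sketch below computes two standard estimates of P-processor running time: the lower bound max(T_1/P, T_inf) and the greedy-scheduling upper bound T_1/P + T_inf. The function names are hypothetical, and the abstract does not give the exact model the authors fit, so this is an assumption and not the paper's formula.

```python
def runtime_bounds(work, span, processors):
    """Return (lower, upper) estimates of running time on `processors` processors.

    work  -- T_1, total time of a serial execution
    span  -- T_inf, the critical-path length
    lower -- max(T_1 / P, T_inf): no schedule can beat either term
    upper -- T_1 / P + T_inf: the classic greedy-scheduling bound
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    per_processor = work / processors
    return max(per_processor, span), per_processor + span


def parallelism(work, span):
    """Average parallelism T_1 / T_inf: roughly how many processors can be kept busy."""
    return work / span


lower, upper = runtime_bounds(work=1000, span=10, processors=4)
print(lower, upper, parallelism(1000, 10))
```

When the parallelism is much larger than P, the two estimates are close, which is why reducing work and critical-path length is a reasonable proxy for reducing running time.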
Scheduling Multithreaded Computations by Work Stealing
Cited by 316 (32 self)
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
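To make the idea of work stealing concrete, here is a toy, single-threaded simulation: each simulated processor keeps a deque of tasks, works on its newest task, and, when idle, tries to steal the oldest task from a randomly chosen victim. The task model (binary divide-and-conquer over a range) and all names are assumptions chosen for illustration. This is a sketch of the general technique, not the scheduler analyzed in the paper.

```python
import random
from collections import deque


def simulate_work_stealing(n_leaves, n_workers, seed=0):
    """Simulate randomized work stealing on a binary divide-and-conquer task tree.

    A task is a range (lo, hi); a range of size 1 is a leaf.
    Every worker handles at most one task (or one steal attempt) per step.
    Returns (steps, leaves_done_per_worker, steal_attempts).
    """
    if n_leaves < 1 or n_workers < 1:
        raise ValueError("need at least one leaf and one worker")
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    deques[0].append((0, n_leaves))          # the root task starts on worker 0
    done = [0] * n_workers
    steps = steal_attempts = 0
    while sum(done) < n_leaves:
        steps += 1
        for w in range(n_workers):
            if deques[w]:
                lo, hi = deques[w].pop()     # own work: take the newest task
                if hi - lo == 1:
                    done[w] += 1             # leaf: one unit of real work
                else:
                    mid = (lo + hi) // 2
                    deques[w].append((mid, hi))
                    deques[w].append((lo, mid))
            elif n_workers > 1:
                victim = rng.choice([v for v in range(n_workers) if v != w])
                steal_attempts += 1
                if deques[victim]:
                    # steal the oldest task from the other end of the deque
                    deques[w].append(deques[victim].popleft())
    return steps, done, steal_attempts


steps, done, attempts = simulate_work_stealing(n_leaves=64, n_workers=4)
print(steps, done, attempts)
```

Stealing the oldest task tends to hand the thief a large subtree, so idle workers acquire substantial work with few steals.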
Detecting Data Races in Cilk Programs that Use Locks
, 1998
Cited by 75 (10 self)
When two parallel threads holding no locks in common access the same memory location and at least one of the threads modifies the location, a “data race” occurs, which is usually a bug. This paper describes the algorithms and strategies used by a debugging tool, called the Nondeterminator-2, which checks for data races in programs coded in the Cilk multithreaded language. Like its predecessor, the Nondeterminator, which checks for simple “determinacy” races, the Nondeterminator-2 is a debugging tool, not a verifier, since it checks for data races only in the computation generated by a serial execution of the program on a given input. We give an algorithm, ALL-SETS, that determines whether the computation generated by a serial execution of a Cilk program on a given input contains a race. For a program that runs serially in time T, accesses V shared memory locations, uses a total of n locks, and holds at most k ≤ n locks simultaneously, ALL-SETS runs in O(n^k T α(V, V)) time and O(n^k V) space, where α is Tarjan’s functional inverse of Ackermann’s function. Since ALL-SETS may be too inefficient in the worst case, we propose a much more efficient algorithm which can be used to detect races in programs that obey the “umbrella” locking discipline, a programming methodology that is more flexible than similar disciplines proposed in the literature. We present an algorithm, BRELLY, which detects violations of the umbrella discipline in O(kT α(V, V)) time using O(kV) space. We also prove that any “abelian” Cilk program, one whose critical sections commute, produces a determinate final state if it is deadlock free and if it generates any computation which is data-race free. Thus, the Nondeterminator-2’s two algorithms can verify the determinacy of a deadlock-free abelian program running on a given input.
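The data-race definition in this abstract can be written down directly as a brute-force check over a recorded list of memory accesses. The sketch below is neither ALL-SETS nor BRELLY. It compares every pair of accesses, and the `Access` record and the `parallel` relation are hypothetical inputs assumed for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: str           # identifier of the accessing thread
    location: str         # shared memory location touched
    is_write: bool        # True if the access modifies the location
    locks: frozenset      # locks held at the time of the access


def find_data_races(accesses, parallel):
    """Return pairs of accesses that satisfy the data-race definition.

    `parallel` is a set of frozenset({t1, t2}) pairs naming threads that
    may run in parallel (assumed to be supplied by the caller).
    """
    races = []
    for a, b in combinations(accesses, 2):
        same_location = a.location == b.location
        one_writes = a.is_write or b.is_write
        in_parallel = frozenset((a.thread, b.thread)) in parallel
        no_common_lock = not (a.locks & b.locks)
        if same_location and one_writes and in_parallel and no_common_lock:
            races.append((a, b))
    return races


accesses = [
    Access("t1", "x", True, frozenset({"A"})),
    Access("t2", "x", False, frozenset({"B"})),   # races with the write above
    Access("t1", "y", True, frozenset({"A"})),
    Access("t2", "y", True, frozenset({"A"})),    # protected by common lock A
]
races = find_data_races(accesses, parallel={frozenset({"t1", "t2"})})
print(len(races))
```

The pairwise comparison is quadratic in the number of accesses, which is the kind of cost that purpose-built detection algorithms are designed to avoid.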
Scheduling Large-Scale Parallel Computations on Networks of Workstations
- In Proceedings of the Third International Symposium on High Performance Distributed Computing
, 1994
Cited by 44 (6 self)
Workstation networks are an underutilized yet valuable resource for solving large-scale parallel problems. In this paper, we present "idle-initiated" techniques for efficiently scheduling large-scale parallel computations on workstation networks. By "idle-initiated," we mean that idle computers actively search out work to do rather than wait for work to be assigned. The idle-initiated scheduler operates at both the macro and the micro levels. On the macro level, a computer without work joins an ongoing parallel computation as a participant. On the micro level, a participant without work "steals" work from some other participant of the same computation. We have implemented these scheduling techniques in Phish, a portable system for running dynamic parallel applications on a network of workstations. 1 Introduction Even with the annual exponential improvements in microprocessor speed, a large body of problems cannot be solved in a reasonable time on a single computer. One method of reduc...
The Cilk System for Parallel Multithreaded Computing
, 1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Programming Parallel Applications in Cilk
- SINEWS: SIAM News
, 1997
Cited by 8 (0 self)
Cilk (pronounced "silk") is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or message-passing programming systems. A Cilk programmer need not worry about protocols and load balancing, which are handled by Cilk's provably efficient runtime system. Many regular and irregular Cilk applications run nearly as fast on one processor as comparable C programs, and they scale well to many processors. 1 Background and goals Cilk is an algorithmic multithreaded language. The philosophy behind Cilk is that a programmer should concentrate on structuring his program to expose parallelism and exploit locality, leaving the runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Cilk's runtime system takes care of details like load balancing and communication protocols. Unlike other multithreaded languages, however, Cilk is algor...
Adaptive Work Stealing with Parallelism Feedback
Cited by 7 (0 self)
We present an adaptive work-stealing thread scheduler, A-STEAL, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. A-STEAL provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adapt its execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-STEAL using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T_1 and critical-path length T_∞. On a machine with P processors, A-STEAL completes the job in expected O(T_1/P̃ + T_∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T_∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but
Using Cilk to Write Multiprocessor Chess Programs
- The Journal of the International Computer Chess Association
, 2001
Cited by 3 (0 self)
This paper overviews the Cilk language, illustrating how Cilk supports the programming of parallel game-tree search and other chess mechanisms.
Cilk-4.0 (Beta 1) Reference Manual
This document describes Cilk-4.0, a language for multithreaded parallel programming based on ANSI C. Cilk is designed for computations with dynamic, highly asynchronous parallelism, which can be difficult to write in data-parallel or message-passing style. Divide-and-conquer algorithms and tree search are examples of computations that are particularly well suited to Cilk. (See, for example, our world-class

