Results 1 - 10 of 23
Cilk: An Efficient Multithreaded Runtime System
- Journal of Parallel and Distributed Computing
, 1995
Cited by 430 (34 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk ...
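To make "work" ($T_1$) and "critical-path length" ($T_\infty$) concrete, here is a minimal sketch under an assumed unit-cost model: every call of a doubly recursive `fib(n)` counts as one unit, and its two recursive calls are treated as spawned in parallel and joined before returning. The function names `fib_work`, `fib_span`, and `predicted_time` are hypothetical, and the prediction uses the simple form $T_1/P + T_\infty$ with both constants set to 1, which is an assumption rather than the model fitted in the paper.

```c
#include <assert.h>

/* Assumed cost model: each invocation of fib(n) is one unit of work, and
 * its two recursive calls may run in parallel (spawn ... sync). */

/* Work T_1(n) = 1 + T_1(n-1) + T_1(n-2), with T_1(0) = T_1(1) = 1. */
static long long fib_work(int n) {
    assert(n >= 0 && n <= 80);            /* keep the result within long long */
    long long prev = 1, cur = 1;          /* T_1(0), T_1(1) */
    if (n < 2) return 1;
    for (int k = 2; k <= n; k++) {
        long long next = 1 + cur + prev;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Critical-path length T_inf(n) = 1 + max(T_inf(n-1), T_inf(n-2)):
 * only the longer of the two parallel branches adds to the path. */
static long long fib_span(int n) {
    assert(n >= 0);
    long long prev = 1, cur = 1;          /* T_inf(0), T_inf(1) */
    if (n < 2) return 1;
    for (int k = 2; k <= n; k++) {
        long long next = 1 + (cur > prev ? cur : prev);
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Simplified performance model T_P ~ T_1/P + T_inf (constants assumed 1). */
static double predicted_time(long long work, long long span, int procs) {
    assert(procs > 0);
    return (double)work / procs + (double)span;
}
```

For `n = 10` this model gives 177 units of work and a critical path of 10, so the parallelism $T_1/T_\infty$ is about 17.7; beyond roughly that many processors, the $T_\infty$ term dominates and extra processors help little.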
The Implementation of the Cilk-5 Multithreaded Language
- In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation
, 1998
Cited by 248 (20 self)
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design...
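The work-first principle puts the scheduling cost on the thief rather than on the busy worker. A minimal single-threaded sketch of the per-worker deque this implies: the owner pushes and pops frames at the tail on every spawn and return, like a C call stack, while a thief removes the oldest frame from the head. The type and function names are hypothetical, frames are stood in for by `int`s, and the owner/thief synchronization that Cilk-5 uses (its THE protocol) is deliberately omitted, so this version is not safe to share across threads as written.

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAP 64

/* Hypothetical per-worker deque; ints stand in for pointers to frames. */
typedef struct {
    int frames[DEQUE_CAP];
    size_t head;   /* oldest frame: the one a thief would steal */
    size_t tail;   /* one past the owner's youngest frame */
} deque_t;

static void deque_init(deque_t *d) { d->head = d->tail = 0; }

/* Owner, on spawn: the fast path is a single store and an increment. */
static void push_tail(deque_t *d, int frame) {
    assert(d->tail < DEQUE_CAP);
    d->frames[d->tail++] = frame;
}

/* Owner, on return from a child: returns 0 if the parent was stolen. */
static int pop_tail(deque_t *d, int *frame) {
    if (d->tail == d->head) return 0;
    *frame = d->frames[--d->tail];
    return 1;
}

/* Thief: the slow path, taking the oldest (shallowest) frame. */
static int steal_head(deque_t *d, int *frame) {
    if (d->head == d->tail) return 0;
    *frame = d->frames[d->head++];
    return 1;
}
```

The design point is that `push_tail` and `pop_tail` cost only a few loads and stores, so per-spawn overhead stays small on the work term, while the more expensive `steal_head` runs only when some processor has run out of work.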
Adaptive and Reliable Parallel Computing on Networks of Workstations
, 1996
Cited by 60 (1 self)
In this paper, we present the design of Cilk-NOW, a runtime system that adaptively and reliably executes functional Cilk programs in parallel on a network of UNIX workstations. Cilk (pronounced “silk”) is a parallel multithreaded extension of the C language, and all Cilk runtime systems employ a provably efficient thread-scheduling algorithm. Cilk-NOW is such a runtime system, and in addition, Cilk-NOW automatically delivers adaptive and reliable execution for a functional subset of Cilk programs. By adaptive execution, we mean that each Cilk program dynamically utilizes a changing set of otherwise-idle workstations. By reliable execution, we mean that the Cilk-NOW system as a whole and each executing Cilk program are able to tolerate machine and network faults. Cilk-NOW provides these features while programs remain fault oblivious, meaning that Cilk programmers need not code for fault tolerance. Throughout this paper, we focus on end-to-end design decisions, and we show how these decisions allow the design to exploit high-level algorithmic properties of the Cilk programming model in order to simplify and streamline the implementation.
Dag-consistent distributed shared memory
- In Proceedings of the 10th International Parallel Processing Symposium (IPPS)
, 1996
Cited by 57 (13 self)
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk multithreaded runtime system running on a Connection Machine CM5. Our implementation includes a dag-consistent distributed cactus stack for storage allocation. We provide empirical evidence of the flexibility and efficiency of dag consistency for applications that include blocked matrix multiplication, Strassen’s matrix multiplication algorithm, and a Barnes-Hut code. Although Cilk schedules the executions of these programs dynamically, their performance is competitive with statically scheduled implementations in the literature. We also prove that the number $F_P$ of page faults incurred by a user program running on $P$ processors can be related to the number $F_1$ of page faults incurred when running serially by the formula $F_P \le F_1 + 2Cs$, where $C$ is the cache size and $s$ is the number of thread migrations executed by Cilk’s scheduler.
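A worked instance of the page-fault bound $F_P \le F_1 + 2Cs$, with made-up numbers chosen only to show the arithmetic (they are not measurements from the paper): a cache of $C = 64$ pages, $s = 100$ thread migrations, and $F_1 = 10\,000$ serial page faults.

```latex
F_P \le F_1 + 2Cs
    = 10\,000 + 2 \cdot 64 \cdot 100
    = 10\,000 + 12\,800
    = 22\,800
```

Read in aggregate, the bound allows at most $2C$ extra page faults per migration, so the overhead over the serial run grows with the number of steals $s$ rather than directly with the processor count $P$.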
The Cilk System for Parallel Multithreaded Computing
, 1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Compiler Technology for Portable Checkpoints
, 1998
Cited by 17 (3 self)
We have implemented a prototype compiler called porch that transforms C programs into C programs supporting portable checkpoints. Portable checkpoints capture the state of a computation in a machine-independent format that allows the transfer of computations across binary-incompatible machines. We introduce source-to-source compilation techniques for generating code to save and recover from such portable checkpoints automatically. These techniques instrument a program with code that maps the state of a computation into a machine-independent representation and vice versa. In particular, the following problems are addressed: (1) providing stack environment portability, (2) enabling conversion of complex data types, and (3) rendering pointers portable. Experimental results show that the overhead of checkpointing is reasonably small, even if data representation conversion is required for portability.
1 Introduction
This paper presents a source-to-source compiler technique that transforms s...
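Two of the listed problems, converting data representations and rendering pointers portable, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not porch's generated code: it fixes big-endian byte order as the hypothetical machine-independent format for 32-bit integers, and it saves a pointer into a known array as an element index (with `UINT32_MAX` standing for `NULL`). All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Data representation: store a 32-bit value in a fixed big-endian byte
 * order, independent of the host's native endianness. */
static void put_u32_be(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

static uint32_t get_u32_be(const uint8_t in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Pointer portability: a raw address means nothing on another machine,
 * so save a pointer into a known array as an index relative to its base. */
static uint32_t ptr_to_index(const double *base, const double *p) {
    if (p == NULL) return UINT32_MAX;      /* sentinel for NULL */
    assert(p >= base);
    return (uint32_t)(p - base);
}

/* On recovery, rebuild the pointer against the new machine's base address. */
static double *index_to_ptr(double *base, uint32_t idx) {
    return idx == UINT32_MAX ? NULL : base + idx;
}
```

A real checkpointer must also handle pointers into the stack and heap, floating-point formats, and struct layout; the sketch shows only the core idea of translating to and from a machine-independent form.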
Portable High-Performance Programs
, 1999
Cited by 16 (0 self)
Programming Parallel Applications in Cilk
- SINEWS: SIAM News
, 1997
Cited by 8 (0 self)
Cilk (pronounced "silk") is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or message-passing programming systems. A Cilk programmer need not worry about protocols and load balancing, which are handled by Cilk's provably efficient runtime system. Many regular and irregular Cilk applications run nearly as fast on one processor as comparable C programs, and they scale well to many processors.
1 Background and goals
Cilk is an algorithmic multithreaded language. The philosophy behind Cilk is that a programmer should concentrate on structuring his program to expose parallelism and exploit locality, leaving the runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Cilk's runtime system takes care of details like load balancing and communication protocols. Unlike other multithreaded languages, however, Cilk is algor...
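One way to see why a Cilk program can run on one processor nearly as fast as comparable C: removing the Cilk-5 keywords `cilk`, `spawn`, and `sync` from a procedure leaves an ordinary C function, its serial "C elision". The sketch below does this with empty macros so that a Cilk-5-style `fib` compiles with a plain C compiler; the macro trick is an illustration chosen here, not how the Cilk toolchain works, and under a real Cilk compiler the spawned calls could run in parallel.

```c
#include <assert.h>

/* Illustration only: define the Cilk-5 keywords away to get the C elision. */
#define cilk
#define spawn
#define sync

cilk int fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    else {
        int x, y;
        x = spawn fib(n - 1);   /* child may run in parallel with the parent */
        y = spawn fib(n - 2);
        sync;                   /* wait for both children before using x, y */
        return x + y;
    }
}
```

After preprocessing, `sync;` is just an empty statement, so the serial program does exactly the same arithmetic as hand-written C. Note that an empty `sync` macro would clash with the POSIX `sync()` declaration if `<unistd.h>` were included after it.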
Dynamic processor allocation for adaptively parallel jobs
- Massachusetts Institute of Technology
, 2004
Cited by 4 (0 self)
This thesis addresses the problem of scheduling multiple, concurrent, adaptively parallel jobs on a multiprogrammed shared-memory multiprocessor. Adaptively parallel jobs are jobs for which the number of processors that can be used without waste varies during execution. We focus on the specific case of parallel jobs that are scheduled using a randomized work-stealing algorithm, as is used in the Cilk multithreaded language. We begin by developing a theoretical model for two-level scheduling systems, or those in which the operating system allocates processors to jobs, and the jobs schedule their threads on the processors. To analyze the performance of a job scheduling algorithm, we model the operating system as an adversary. We show that a greedy scheduler achieves an execution time that is within a factor of 2 of optimal under these conditions. Guided by our model, we present a randomized work-stealing algorithm for adaptively parallel jobs, algorithm WSAP, which takes a unique approach to estimating the processor desire of a job. We show that attempts to directly measure a ...
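For orientation, the classic factor-of-2 argument for a greedy scheduler with a fixed number of processors $P$ is sketched below; the thesis proves its result in a harder setting where an adversarial operating system varies the allocation, so this is only the simpler, standard special case. Here $T_1$ is the work, $T_\infty$ the critical-path length, and $T_P^{*}$ the optimal $P$-processor time. A step is complete if all $P$ processors are busy (there are at most $T_1/P$ such steps) and incomplete otherwise; an incomplete step executes every ready task and so shortens the remaining critical path by one (at most $T_\infty$ such steps).

```latex
\begin{align*}
T_P &\le \frac{T_1}{P} + T_\infty
    && \text{complete steps} + \text{incomplete steps} \\
    &\le 2\max\!\left(\frac{T_1}{P},\, T_\infty\right) \\
    &\le 2\,T_P^{*}
    && \text{since } T_P^{*} \ge T_1/P \text{ and } T_P^{*} \ge T_\infty
\end{align*}
```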

