Results 1 - 10 of 13
Cilk: An Efficient Multithreaded Runtime System
Journal of Parallel and Distributed Computing, 1995
Cited by 430 (34 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Scheduling Large-Scale Parallel Computations on Networks of Workstations
In Proceedings of the Third International Symposium on High Performance Distributed Computing, 1994
Cited by 44 (6 self)
Workstation networks are an underutilized yet valuable resource for solving large-scale parallel problems. In this paper, we present "idle-initiated" techniques for efficiently scheduling large-scale parallel computations on workstation networks. By "idle-initiated," we mean that idle computers actively search out work to do rather than wait for work to be assigned. The idle-initiated scheduler operates at both the macro and the micro levels. On the macro level, a computer without work joins an ongoing parallel computation as a participant. On the micro level, a participant without work "steals" work from some other participant of the same computation. We have implemented these scheduling techniques in Phish, a portable system for running dynamic parallel applications on a network of workstations. 1 Introduction Even with the annual exponential improvements in microprocessor speed, a large body of problems cannot be solved in a reasonable time on a single computer. One method of reduc...
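The micro-level behaviour described in this abstract, where a participant without work steals from another participant, can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the Phish implementation: the function name `run_work_stealing`, the per-participant deques, the random choice of victim, and the choice of which end of the deque to steal from are all hypothetical and are not taken from the abstract.

```python
import random
from collections import deque


def run_work_stealing(tasks, num_workers, seed=0):
    """Simulate idle-initiated stealing among num_workers participants.

    tasks is a list of zero-argument callables. All work starts on
    participant 0 (an assumption made only for illustration).
    Returns (results per participant, number of steals).
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(tasks)
    done = [[] for _ in range(num_workers)]
    steals = 0

    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                # Idle participant: look for a victim that still has work.
                victims = [v for v in range(num_workers)
                           if v != w and queues[v]]
                if victims:
                    victim = rng.choice(victims)
                    # Assumed policy: take the oldest task from the victim.
                    queues[w].append(queues[victim].popleft())
                    steals += 1
            if queues[w]:
                # Run the newest local task (just stolen or already owned).
                done[w].append(queues[w].pop()())
    return done, steals
```

Calling `run_work_stealing([lambda i=i: i * i for i in range(8)], 3)` spreads the eight tasks over three simulated participants. A stolen task is run immediately, so every pass through the loop completes at least one task and the simulation always terminates.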
The Cilk System for Parallel Multithreaded Computing
1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Massively Parallel Chess
In Proceedings of the Third DIMACS Parallel Implementation Challenge, Rutgers, 1994
Cited by 27 (6 self)
Computer chess provides a good testbed for understanding dynamic MIMD-style computations. To investigate the programming issues, we engineered a parallel chess program called *Socrates, which, running on the NCSA's 512-processor CM-5, tied for third in the 1994 ACM International Computer Chess Championship. *Socrates uses the Jamboree algorithm to search game trees in parallel and uses the Cilk 1.0 language and run-time system to express and to schedule the computation. In order to obtain good performance for chess, we use several mechanisms not directly provided by Cilk, such as aborting computations and directly accessing the active message layer to implement a global transposition table distributed across the processors. We found that we can use the critical path C and the total work W to predict the performance of our chess programs. Empirically, *Socrates runs in time T ≈ 0.95C + 1.09W/P on P processors. For best-ordered uniform trees of height h and degree d the average available pa...
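The empirical model quoted in this abstract, read as T ≈ 0.95C + 1.09W/P, can be evaluated directly. The sketch below is only an illustration: the function name `predicted_time` is hypothetical, and C and W are assumed to be in the same time units, which the abstract does not specify.

```python
def predicted_time(work, critical_path, processors):
    """Evaluate the model T ~ 0.95*C + 1.09*W/P from the abstract.

    work          -- total work W (assumed same time unit as C)
    critical_path -- critical-path length C
    processors    -- number of processors P (must be at least 1)
    """
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return 0.95 * critical_path + 1.09 * work / processors
```

With made-up values W = 1000 and C = 10, the model gives 1099.5 on one processor and about 11.63 on 512 processors. At 512 processors the critical-path term (9.5) dominates the work term (about 2.13).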
Runtime Support For Portable Distributed Data Structures
1995
Cited by 20 (0 self)
Multipol is a library of distributed data structures designed for irregular applications, including those with asynchronous communication patterns. In this paper, we describe the Multipol runtime layer, which provides an efficient and portable abstraction underlying the data structures. It contains a thread system to express computations with varying degrees of parallelism and to support multiple threads per processor for hiding communication latency. To simplify programming in a multithreaded environment, Multipol threads are small, finite-length computations that are executed atomically. Rather than enforcing a single scheduling policy on threads, users may write their own schedulers or choose one of the schedulers provided by Multipol. The system is designed for distributed memory architectures and performs communication optimizations such as message aggregation to improve efficiency on machines with high communication startup overhead. The runtime system currently runs on the Think...
Using Cilk to Write Multiprocessor Chess Programs
The Journal of the International Computer Chess Association, 2001
Cited by 3 (0 self)
This paper gives an overview of the Cilk language, illustrating how Cilk supports the programming of parallel game-tree search and other chess mechanisms.
Macro-Level Scheduling in the Cilk Network of Workstations Environment
1996
Cited by 1 (1 self)
The term "macro-level scheduling" refers to finding and recruiting idle workstations and allocating them to various adaptively parallel applications. In this thesis, I have designed and implemented a macro-level scheduler for the Cilk Network of Workstations environment. Cilk-NOW provides the "micro-level scheduling" needed to allow programs to be executed adaptively in parallel on an unreliable network of workstations. This macro-level scheduler is designed to be hassle-free and easy to use and customize. It can tolerate network faults, and it can recover from workstation failures. Idleness can be defined in a highly flexible way, in order to minimize the chances of bothering a workstation's owner, but without losing valuable computation time. The security mechanism employed by the macroscheduler can be adapted to make running unauthorized code essentially as difficult as any given system's existing remote execution protocol makes it. This scheduler is also fair, in that it assigns jo...
Cilk 3.0 (Alpha 1) Reference Manual
1995
This document describes Cilk 3.0, an ANSI C-based language and runtime system for multithreaded programming. Cilk is designed for computations with dynamic, highly asynchronous, tree-like parallelism, which are typically difficult to write in data-parallel or message-passing style. Divide-and-conquer algorithms and backtracking search are examples of tree-like computations that are particularly well-suited to Cilk.
A Comparison of Different Message-Passing Paradigms for the Parallelization of Two Irregular Applications
1994
We present experimental results for parallelizing two breadth-first search-based applications on Thinking Machines' CM-5 by using two different message-passing paradigms, one based on send/receive and the other based on active messages. The parallelization of these applications requires fine-grained communication. Our results show that the active messages-based implementation gives significant improvement over the send/receive-based implementation. The improvements can largely be attributed to the lower latency of the active messages implementation. Keywords: Message-passing, Synchronous and asynchronous send/receive, Active messages, Wolff cluster algorithm, Lee's maze-routing algorithm, Overlapping communication with computation, Message coalescing 1. Introduction Two different message-passing paradigms are available on Thinking Machines' CM-5 for point-to-point communication. One is the classical message-passing paradigm using send/receive functions, and the other uses the acti...

