Adaptive and Reliable Parallel Computing on Networks of Workstations (1996)
BibTeX
@MISC{Blumofe96adaptiveand,
author = {Robert D. Blumofe and Philip A. Lisiecki},
title = {Adaptive and Reliable Parallel Computing on Networks of Workstations},
year = {1996}
}
Abstract
In this paper, we present the design of Cilk-NOW, a runtime system that adaptively and reliably executes functional Cilk programs in parallel on a network of UNIX workstations. Cilk (pronounced “silk”) is a parallel multithreaded extension of the C language, and all Cilk runtime systems employ a provably efficient thread-scheduling algorithm. Cilk-NOW is such a runtime system; in addition, it automatically delivers adaptive and reliable execution for a functional subset of Cilk programs. By adaptive execution, we mean that each Cilk program dynamically utilizes a changing set of otherwise-idle workstations. By reliable execution, we mean that the Cilk-NOW system as a whole and each executing Cilk program are able to tolerate machine and network faults. Cilk-NOW provides these features while programs remain fault oblivious, meaning that Cilk programmers need not code for fault tolerance. Throughout this paper, we focus on end-to-end design decisions, and we show how these decisions allow the design to exploit high-level algorithmic properties of the Cilk programming model in order to simplify and streamline the implementation.