Cilk: An Efficient Multithreaded Runtime System
Journal of Parallel and Distributed Computing, 1995
Cited by 430 (34 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk ...
The Cilk System for Parallel Multithreaded Computing
1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Runtime Support for Multicore Haskell
Cited by 18 (5 self)
Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describe just such an implementation, and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build. Our measurements are necessarily detailed and specific, but they are reproducible, and we believe that they offer some general insights.
Procs and Locks: A Portable Multiprocessing Platform for Standard ML of New Jersey
2000
Cited by 10 (1 self)
This paper describes the platform's design, implementation, and performance.
Threads Yield Continuations
Lisp and Symbolic Computation, 1997
Cited by 7 (0 self)
Just as a traditional continuation represents the rest of a computation from a given point in the computation, a subcontinuation represents the rest of a subcomputation from a given point in the subcomputation. Subcontinuations are more expressive than traditional continuations and have been shown to be useful for controlling tree-structured concurrency, yet they have previously been implemented only on uniprocessors. This article describes a concurrent implementation of one-shot subcontinuations. Like one-shot continuations, one-shot subcontinuations are first-class but may be invoked at most once, a restriction obeyed by nearly all programs that use continuations. The techniques used to implement one-shot subcontinuations may be applied directly to other one-shot continuation mechanisms and may be generalized to support multi-shot continuations as well. A novel feature of the implementation is that continuations are implemented in terms of threads. Because the implementation model ...
Lightweight Concurrency Primitives for GHC
Cited by 6 (2 self)
The Glasgow Haskell Compiler (GHC) has quite sophisticated support for concurrency in its runtime system, which is written in low-level C code. As GHC evolves, the runtime system becomes increasingly complex, error-prone, difficult to maintain, and difficult to extend with new concurrency features. This paper presents an alternative approach to implementing concurrency in GHC. Rather than hard-wiring all kinds of concurrency features, the runtime system is a thin substrate providing only a small set of concurrency primitives, and the remaining concurrency features are implemented in software libraries written in Haskell. This design improves the safety of concurrency support; it also provides more customizability of concurrency features, which can be developed as Haskell library packages and deployed modularly.
Virtual Topologies: A New Concurrency Abstraction for High-Level Parallel Languages
DIMACS Workshop on Interconnection Networks and Mapping and Scheduling Parallel Computations, 1994
Cited by 5 (0 self)
(Preliminary Report) James Philbin and Suresh Jagannathan (Computer Science Division, NEC Research Institute, 4 Independence Way, {philbin|suresh}@research.nj.nec.com); Rajiv Mirani (Department of Computer Science, Yale University, New Haven, CT, mirani@cs.yale.edu). Abstract. We present a new concurrency abstraction and implementation technique for high-level (symbolic) parallel languages that allows significant programmer control over load-balancing and mapping of fine-grained lightweight threads. Central to our proposal is the notion of a virtual topology. A virtual topology defines a relation over a collection of virtual processors, and a mapping of those processors to a set of physical processors; processor topologies configured as trees, graphs, butterflies, and meshes are some well-known examples. A virtual topology need not have any correlation with a physical one; it is intended to capture the interconnection stru...
A Portable Multiprocessor Interface for Standard ML of New Jersey
Carnegie Mellon University, 1992
Cited by 5 (0 self)
We have designed a portable interface between shared-memory multiprocessors and Standard ML of New Jersey. The interface is based on the conventional kernel thread model and provides facilities that can be used to implement user-level thread packages. The interface supports experimentation with different thread scheduling policies and synchronization constructs. It has been ported to three different multiprocessors and used to construct a general purpose, user-level thread package. In this paper, we discuss the interface and its implementation and performance, with emphasis on the Silicon Graphics 4D/380S multiprocessor. Footnotes: (1) Supported in part by a National Science Foundation Graduate Fellowship. (2) Supported in part by NSF grant CCR-9002786. This research was sponsored in part by the Defense Advanced Research Projects Agency, CSTO, under the title "The Fox Project: Advanced Development of Systems Software", ARPA Order No. 8313, issued by ESD/AVS under Contract No. F19628-91-C-0168. Th...
TS/Scheme: Distributed Data Structures in Lisp
1994
Cited by 3 (1 self)
We describe a parallel object-oriented dialect of Scheme called ts/scheme that provides a simple and expressive interface for building asynchronous parallel programs. The main component in ts/scheme's coordination framework is an abstraction that serves the role of a distributed data structure. Distributed data structures are an extension of conventional data structures insofar as many tasks may simultaneously access and update their contents according to a well-defined serialization protocol. The semantics of these structures also specifies that consumers that attempt to access an as-yet undefined element are to block until a producer provides a value. ts/scheme permits the construction of two basic kinds of distributed data structures: those accessed by content and those accessed by name. These structures can be further specialized and composed to yield a number of other synchronization abstractions. Our intention is to provide an efficient medium for expressing concurrency a...

