Results 1 - 10 of 41
Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
Cited by 430 (34 self)
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "critical-path length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and critical-path length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
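As a rough illustration of how work and critical-path length can model performance, the sketch below uses the simple estimate T_P ≈ T_1/P + c·T_∞, where T_1 is the work and T_∞ is the critical-path length. The function names and the constant `c_span` are assumptions made for this example; they are not values or interfaces taken from the paper.

```python
def predicted_time(work, span, processors, c_span=1.0):
    """Estimate running time on `processors` processors.

    work   -- total work T_1 (time on one processor)
    span   -- critical-path length T_inf
    c_span -- hypothetical fitting constant for the span term (assumption)
    """
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return work / processors + c_span * span


def predicted_speedup(work, span, processors, c_span=1.0):
    """Speedup relative to the one-processor work T_1."""
    return work / predicted_time(work, span, processors, c_span)


if __name__ == "__main__":
    work, span = 1000.0, 10.0
    for p in (1, 4, 16, 64):
        print(p, predicted_speedup(work, span, p))
```

Under this estimate the speedup can never exceed work/span (the average parallelism), which is why reducing the critical-path length matters as much as reducing the work.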
CRL: High-Performance All-Software Distributed Shared Memory
, 1995
Cited by 191 (11 self)
This paper introduces the C Region Library (CRL), a new all-software distributed shared memory (DSM) system. CRL requires no special compiler, hardware, or operating system support beyond the ability to send and receive messages. It provides a simple, portable shared address space programming model that is capable of delivering good performance on a wide range of multiprocessor and distributed system architectures. We have developed CRL implementations for two platforms: the CM-5, a commercial multicomputer, and the MIT Alewife machine, an experimental multiprocessor offering efficient support for both message passing and shared memory. We present results for up to 128 processors on the CM-5 and up to 32 processors on Alewife. In a set of controlled experiments, we demonstrate that CRL is the first all-software DSM system capable of delivering performance competitive with hardware DSMs. CRL achieves speedups within 30% of those provided by Alewife's native support for shared memory, eve...
Supporting Dynamic Data Structures on Distributed-Memory Machines
, 1995
Cited by 143 (8 self)
In this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy task creation. We intend to exploit this execution model using compiler analyses and automatic parallelization techniques. We have implemented a prototype system, which we call Olden, that runs on the Intel iPSC/860 and the Thinking Machines CM-5. We discuss our implementation and report on experiments with five benchmarks.
A Comparison of Architectural Support for Messaging in the TMC CM-5 and the Cray T3D
- In Proceedings of the International Symposium on Computer Architecture
, 1995
Cited by 55 (15 self)
Programming models based on messaging continue to be an important programming model for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines -- the TMC CM-5 and the Cray T3D -- by exploring the design and performance of several messaging implementations. The additional features in the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch. Experiments on the CM-5 show that requiring processor involvement for message reception can increase the communication overheads from 60% to 300% for moderate variations in computation grain size at the destination. In contrast, the T3D hardware for remote operations decouples message reception from processor activity, producing high-performance messaging independent of computation grain size or variability. In addition, hardware support for a shared address space in the T3D can be us...
ICC++ -- A C++ Dialect for High Performance Parallel Computing
- In Proceedings of the 2nd International Symposium on Object Technologies for Advanced Software
, 1996
Cited by 55 (10 self)
ICC++ is a new C++ concurrent dialect which allows sequential/parallel program versions to be maintained with single source, the construction of concurrent data abstractions, convenient expression of irregular and fine-grained concurrency, and supports high performance implementations. ICC++ provides annotations for potential concurrency, facilitating both sharing source with sequential programs and grain size tuning for efficient execution. ICC++ has a notion of object consistency which can be extended structurally and procedurally to implement larger data abstractions. Finally, ICC++ integrates arrays into the object system and hence the concurrency model. In short, ICC++ addresses concurrency and its relation to abstractions -- whether they are implemented by single objects, several objects, or object collections. The design of the language, its rationale, and current status are all described. Keywords: concurrent object-oriented programming, concurrent languages, parallel...
Lazy Threads: Implementing a Fast Parallel Call
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1996
Cited by 50 (3 self)
In this paper we describe lazy threads, a new approach for implementing multi-threaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls with the attendant efficient stack management and direct transfer of control and data, yet a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code generation strategy that allows us to represent potential parallel work with no overhead.
Obtaining Sequential Efficiency for Concurrent Object-Oriented Languages
- In Proceedings of the ACM Symposium on the Principles of Programming Languages
, 1995
Cited by 47 (15 self)
Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provide the programmer with a simple, uniform, and flexible model while exposing maximum concurrency. While such languages promise to greatly reduce the complexity of large-scale concurrent programming, the popularity of these languages has been hampered by efficiency which is often many orders of magnitude less than that of comparable sequential code. We present a sufficient set of techniques which enables the efficiency of fine-grained concurrent object-oriented languages to equal that of traditional sequential languages (like C) when the required data is available. These techniques are empirically validated by the application to a COOP implementation of the Livermore Loops.
The Concert System -- Compiler and Runtime Support for Efficient, Fine-Grained Concurrent Object-Oriented Programs
, 1993
Cited by 47 (12 self)
The introduction of concurrency complicates the already difficult task of large-scale programming. Concurrent object-oriented languages provide a mechanism, encapsulation, for managing the increased complexity of large-scale concurrent programs, thereby reducing the difficulty of large-scale concurrent programming. In particular, fine-grained object-oriented approaches provide modularity through encapsulation while exposing large degrees of concurrency. Though fine-grained concurrent object-oriented languages are attractive from a programming perspective, they have historically suffered from poor efficiency. The goal of the Concert project is to develop portable, efficient implementations of fine-grained concurrent object-oriented languages. Our approach incorporates careful program analysis and information management at every stage from the compiler to the runtime system. In this document, we outline the basic elements of the Concert approach. In particular, we discuss progr...
Efficient Java RMI for parallel programming
- ACM Transactions on Programming Languages and Systems (TOPLAS)
, 2001
Cited by 45 (12 self)
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymorphism. Sun’s RMI implementation achieves this kind of flexibility at the cost of a major runtime overhead. The goal of this article is to show that RMI can be implemented efficiently, while still supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We study a new approach for implementing RMI, using a compiler-based Java system called Manta. Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI efficiently, Manta exploits compile-time type information for generating specialized serializers. Also, it uses an efficient RMI protocol and fast low-level communication protocols. A difficult problem with this approach is how to support polymorphism and interoperability. One of the consequences of polymorphism is that an RMI implementation must be able to download remote classes into an application during runtime. Manta solves this problem by using a dynamic bytecode compiler, which is capable of compiling and linking bytecode into a running application. To allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard RMI protocol), in addition to its own protocol.
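The idea of generating specialized serializers from statically known type information can be sketched as follows. This is only an illustration in Python and is not Manta's implementation: the `Point` class, the `make_serializer` helper, and the fixed binary layout are all hypothetical names and choices made for this example.

```python
import struct
from dataclasses import dataclass, fields


@dataclass
class Point:
    # Hypothetical message type with statically declared field types.
    x: int
    y: int
    weight: float


# Assumed mapping from declared field types to struct format codes.
_FORMATS = {int: "q", float: "d"}


def make_serializer(cls):
    """Build a (serialize, deserialize) pair specialized for `cls`.

    The field layout is resolved once, up front, so each call only packs
    or unpacks values instead of inspecting the object's type again.
    """
    names = [f.name for f in fields(cls)]
    layout = "<" + "".join(_FORMATS[f.type] for f in fields(cls))
    packer = struct.Struct(layout)

    def serialize(obj):
        return packer.pack(*(getattr(obj, name) for name in names))

    def deserialize(data):
        return cls(*packer.unpack(data))

    return serialize, deserialize


serialize_point, deserialize_point = make_serializer(Point)
```

A call such as `deserialize_point(serialize_point(Point(3, -4, 2.5)))` round-trips the object through a fixed 24-byte layout; a generic serializer would instead discover the fields on every call.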
The Cilk System for Parallel Multithreaded Computing
, 1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.

