Application-Specific Protocols for User-Level Shared Memory
- In Proceedings of Supercomputing '94
, 1994
Cited by 84 (24 self)
Recent distributed shared memory (DSM) systems and proposed shared-memory machines have implemented some or all of their cache coherence protocols in software. One way to exploit the flexibility of this software is to tailor a coherence protocol to match an application's communication patterns and memory semantics. This paper presents evidence that this approach can lead to large performance improvements. It shows that application-specific protocols substantially improved the performance of three application programs---appbt, em3d, and barnes---over carefully tuned transparent shared memory implementations. The speed-ups were obtained on Blizzard, a fine-grained DSM system running on a 32-node Thinking Machines CM-5. 1 Introduction A shared address space is central to many parallel languages and models of parallel computation. It provides the global names for data that enable a proces- This work is supported in part by NSF PYI/NYI Awards MIP-8957278, CCR-9157366, and CCR-9357779,...
Dag-consistent distributed shared memory
- In Proceedings of the 10th International Parallel Processing Symposium (IPPS)
, 1996
Cited by 57 (13 self)
We introduce dag consistency, a relaxed consistency model for distributed shared memory that is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk multithreaded runtime system running on a Connection Machine CM5. Our implementation includes a dag-consistent distributed cactus stack for storage allocation. We provide empirical evidence of the flexibility and efficiency of dag consistency for applications that include blocked matrix multiplication, Strassen’s matrix multiplication algorithm, and a Barnes-Hut code. Although Cilk schedules the executions of these programs dynamically, their performances are competitive with statically scheduled implementations in the literature. We also prove that the number $F_P$ of page faults incurred by a user program running on $P$ processors can be related to the number $F_1$ of page faults running serially by the formula $F_P \le F_1 + 2Cs$, where $C$ is the cache size and $s$ is the number of thread migrations executed by Cilk’s scheduler.
Teapot: Language Support for Writing Memory . . .
Cited by 53 (6 self)
Recent shared-memory parallel computer systems offer the exciting possibility of customizing memory coherence protocols to fit an application's semantics and sharing patterns. Custom protocols have been used to achieve message-passing performance---while retaining the convenient programming model of a global address space---and to implement high-level language constructs. Unfortunately, coherence protocols written in a conventional language such as C are difficult to write, debug, understand, or modify. This paper describes Teapot, a small, domain-specific language for writing coherence protocols. Teapot uses continuations to help reduce the complexity of writing protocols. Simple static analysis in the Teapot compiler eliminates much of the overhead of continuations and results in protocols that run nearly as fast as hand-written C code. A Teapot specification can be compiled both to an executable coherence protocol and to input for a model checking system, which permits the specifica...
The Cilk System for Parallel Multithreaded Computing
, 1996
Cited by 39 (1 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Tempest: A Substrate for Portable Parallel Programs
- In COMPCON '95
, 1995
Cited by 36 (11 self)
This paper describes Tempest, a collection of mechanisms for communication and synchronization in parallel programs. With these mechanisms, authors of compilers, libraries, and application programs can exploit---across a wide range of hardware platforms---the best of shared memory, message passing, and hybrid combinations of the two. Because Tempest provides mechanisms, not policies, programmers can tailor communication to a program's sharing pattern and semantics, rather than restructuring the program to run with the limited communication options offered by existing parallel machines. And since the mechanisms are easily supported on different machines, Tempest provides a portable interface across platforms. This paper describes the Tempest mechanisms, briefly explains how they are used, outlines several implementations on both custom and stock hardware, and presents preliminary performance results that demonstrate the benefits of this approach. 1 Introduction Uniprocessor computers f...
Fine-Grain Distributed Shared Memory on Clusters of Workstations
, 1997
Cited by 30 (8 self)
Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems that control access to shared memory using the memory's page protection to implement shared memory coherence protocols. Unfortunately, false sharing and fragmentation problems force such systems to resort to weak consistency shared memory models that complicate the shared memory programming model.
Experience with a Language for Writing Coherence Protocols
, 1997
Cited by 24 (7 self)
In this paper we describe our experience with Teapot [7], a domain-specific language for addressing the cache coherence problem. The cache coherence problem arises when parallel and distributed computing systems make local replicas of shared data for reasons of scalability and performance.
Portable High-Performance Programs
, 1999
Cited by 16 (0 self)
right notice and this permission notice are preserved on all copies.
A Compiler-Directed Distributed Shared Memory System
- In Proceedings of the 9th International Conference on Supercomputing
, 1995
Cited by 9 (5 self)
With the advent of high-performance microprocessors and high-speed local area networks, networks of workstations (NOW) are now capable of delivering sustained computational performance comparable to that of a supercomputer. Distributed shared virtual memory (DSVM) offers a programming abstraction that hides the underlying networking details and thus greatly improves parallel programming productivity on a NOW environment. This paper describes the architecture and design of a novel DSVM system called Locust that can effectively take advantage of the data dependency information gathered at compile time to reduce performance overhead. Locust features a generation-based cache coherence scheme that greatly minimizes unnecessary coherence traffic. It supports object granularity cache coherence, thus eliminating the false sharing problem. In addition, Locust exploits data-dependency information to support producer-initiated data transfer and affinity scheduling. Finally, Locust provides a ...

