Scope Consistency : A Bridge between Release Consistency and Entry Consistency
In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996
Cited by 170 (12 self)
The large granularity of communication and coherence in shared virtual memory systems causes problems with false sharing and extra communication. Relaxed memory consistency models have been used to alleviate these problems, but at a cost in programming complexity. Release Consistency (RC) and Lazy Release Consistency (LRC) are accepted to offer a reasonable tradeoff between performance and programming complexity. Entry Consistency (EC) offers a more relaxed consistency model, but it requires explicit association of shared data objects with synchronization variables. The programming burden of providing such associations can be substantial. This paper proposes a new consistency model for shared virtual memory, called Scope Consistency (ScC), which offers most of the potential performance advantages of the EC model without requiring explicit bindings between data and synchronization variables. Instead, ScC dynamically detects the bindings implied by the programmer allowing a programming i...
Brazos: A Third Generation DSM System
In Proceedings of the 1st USENIX Windows NT Symposium, 1997
Cited by 80 (10 self)
Brazos is a third generation distributed shared memory (DSM) system designed for x86 machines running Microsoft Windows NT 4.0. Brazos is unique among existing systems in its use of selective multicast, a software-only implementation of scope consistency, and several adaptive runtime performance tuning mechanisms. The Brazos runtime system is multithreaded, allowing the overlap of computation with the long communication latencies typically associated with software DSM systems. Brazos also supports multithreaded user-code execution, allowing programs to take advantage of the local tightly-coupled shared memory available on multiprocessor PC servers, while transparently interacting with remote "virtual" shared memory. Brazos currently runs on a cluster of Compaq Proliant 1500 multiprocessor servers connected by a 100 Mbps FastEthernet. This paper describes the Brazos design and implementation, and compares its performance running five scientific applications to the performance of Solaris...
The Architectural Design of Globe: A Wide-Area Distributed System
1997
Cited by 69 (8 self)
Developing large-scale wide-area applications requires an infrastructure that is presently lacking entirely. Currently, applications have to be built on top of raw communication services, such as TCP connections. All additional services, including those for naming, replication, migration, persistence, fault tolerance, and security, have to be implemented for each application anew. Not only is this a waste of effort, it also makes interoperability between different applications difficult or even impossible. We present a novel, object-based framework for developing wide-area distributed applications. The framework is based on the concept of a distributed shared object, which has the characteristic feature that its state can be physically distributed across multiple machines at the same time. All implementation aspects, including communication protocols, replication strategies, and distribution and migration of state, are part of an object and are hidden behind its interface. The curren...
The Mungi single-address-space operating system
Software: Practice and Experience, 1998
Cited by 65 (17 self)
Single-address-space operating systems (SASOS) are an attractive model for making the best use of the wide address space provided by the latest generations of microprocessors. SASOS remove the address space boundaries which make data sharing between processes difficult and expensive in traditional operating systems. They offer the potential of significant performance advantages for applications where sharing is important, such as object-oriented databases or persistent programming systems. We have built the Mungi system to demonstrate that a SASOS can offer these performance advantages without resorting to special hardware. Mungi is a very "pure" SASOS, featuring an unintrusive protection model based on sparse capabilities and a fast protected procedure call mechanism, and it uses shared memory as the exclusive inter-process communication mechanism, as well as for I/O. The simplicity of our model makes it easy to implement efficiently on conventional architectures.
Home-based Shared Virtual Memory
1998
Cited by 57 (4 self)
In this dissertation, I investigate how to improve the performance of shared virtual memory (SVM) by examining consistency models, protocols, hardware support and applications. The main conclusion of this research is that the performance of shared virtual memory can be significantly improved when performance-enhancing techniques from all these areas are combined. This dissertation proposes home-based lazy release consistency as a simple, effective, and scalable way to build shared virtual memory systems. In home-based protocols each shared page has a home to which all writes are propagated and from which all copies are derived. Two home-based protocols are described, implemented and evaluated on two hardware and software platforms: Automatic Update Release Consistency (AURC), which requires hardware support for fine-grained remote writes (automatic updates), and Home-based Lazy Release Consistency (HLRC), which is implemented exclusively in software. The dissertation investigates the ...
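The home-based idea in the abstract above (each shared page has a home to which all writes are propagated and from which all copies are derived) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the AURC or HLRC protocol from the dissertation: the names `HomePage` and `Node`, the byte-level write log, and the rule that every acquire invalidates the local copy are all hypothetical simplifications, and the sketch assumes a node releases its writes before its next acquire.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HomePage:
    """Master copy of one shared page, kept at its (hypothetical) home node."""
    data: bytearray

    def apply_writes(self, writes: Dict[int, int]) -> None:
        # Writes propagated by a remote writer are merged into the home copy.
        for offset, value in writes.items():
            self.data[offset] = value

    def fetch(self) -> bytearray:
        # Every other copy of the page is derived from the home copy.
        return bytearray(self.data)


@dataclass
class Node:
    """A node holding a cached copy of the page (toy model, assumption)."""
    home: HomePage
    copy: Optional[bytearray] = None
    pending: Dict[int, int] = field(default_factory=dict)

    def _ensure_copy(self) -> bytearray:
        if self.copy is None:
            self.copy = self.home.fetch()
        return self.copy

    def read(self, offset: int) -> int:
        return self._ensure_copy()[offset]

    def write(self, offset: int, value: int) -> None:
        self._ensure_copy()[offset] = value
        self.pending[offset] = value  # remembered until the next release

    def release(self) -> None:
        # At a release, the logged writes are propagated to the home.
        self.home.apply_writes(self.pending)
        self.pending.clear()

    def acquire(self) -> None:
        # Simplification: always invalidate, so the next access refetches.
        self.copy = None


home = HomePage(bytearray(4))
a, b = Node(home), Node(home)
stale_before = b.read(0)      # b caches the page
a.write(0, 7)
a.release()                   # a's write reaches the home
stale_after = b.read(0)       # b still sees its old cached copy
b.acquire()
fresh = b.read(0)             # refetched from the home
a.write(1, 1)
b.write(2, 2)
a.release()
b.release()                   # two writers to one page merge at the home
```

In this model two nodes can write disjoint bytes of the same page concurrently, and both sets of writes end up merged in the home copy once each node has released.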
Design of the Munin Distributed Shared Memory System
Cited by 52 (1 self)
Software distributed shared memory (DSM) is a software abstraction of shared memory on a distributed memory machine. The key problem in building an efficient DSM system is to reduce the amount of communication needed to keep the distributed memories consistent. The Munin DSM system incorporates a number of novel techniques for doing so, including the use of multiple consistency protocols and support for multiple concurrent writer protocols. Due to these and other features, Munin is able to achieve high performance on a variety of numerical applications. This paper contains a detailed description of the design and implementation of the Munin prototype, with special emphasis given to its novel write shared protocol. Furthermore, it describes a number of lessons that we learned from our experience with the prototype implementation that are relevant to the implementation of future DSMs.
Distributed Shared Memory: Where we are and where . . .
Cited by 47 (1 self)
It has been almost ten years since the birth of the first distributed shared memory (DSM) system, Ivy. While significant progress has been made in the area of improving the performance of DSM, and DSM has been the focus of several dozen PhD theses, its overall impact on "real" users and applications has been small. The goal of this paper is to present our position on what remains to be done before DSM will have a significant impact on real applications. More specifically, we reflect on what we believe have been the major advances in the area, what the important outstanding problems are, and what work needs to be done. Finally, we describe a modest step towards solving these problems, the Quarks DSM system.
MultiView and MilliPage -- fine-grain sharing in page-based DSMs
In Proceedings of the Third USENIX Symposium on Operating System Design and Implementation, 1999
Shared Virtual Memory: Progress and Challenges
Proceedings of the IEEE, 1999
Cited by 42 (4 self)
This paper is a survey of the first 12 years of research in SVM, placing the multi-track flow of ideas and results obtained so far in a comprehensive framework. The contributions indicated in Figure 1 are classified in four categories, each belonging primarily to one layer: relaxed consistency models, protocol laziness, architectural support, and applications and application-driven research. A section of the paper is devoted to each category. The last section discusses other important emerging issues related to SVM: the alternative of fine-grained software coherence, hybrid protocols that implement software shared memory across multiple hardware-coherent multiprocessors, and scalability. The paper summarizes comparative performance results from the literature, discusses their limitations, places existing protocols in a framework based on laziness, and identifies the lessons learned so far and some key outstanding questions.
Sensitivity of Parallel Applications to Large Differences in Bandwidth and Latency in Two-Layer Interconnects
In Fifth International Symposium on High-Performance Computer Architecture, 1999
Cited by 40 (11 self)
This paper studies application performance on systems with strongly non-uniform remote memory access. In current generation NUMAs the speed difference between the slowest and fastest link in an interconnect---the "NUMA gap"---is typically less than an order of magnitude, and many conventional parallel programs achieve good performance. We study how different NUMA gaps influence application performance, up to and including typical wide-area latencies and bandwidths. We find that for gaps larger than those of current generation NUMAs, performance suffers considerably (for applications that were designed for a uniform access interconnect). For many applications, however, performance can be greatly improved with comparatively simple changes: traffic over slow links can be reduced by making communication patterns hierarchical---like the interconnect. We find that in four out of our six applications the size of the gap can be increased by an order of magnitude or more without severel...
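The abstract's remark that traffic over slow links can be reduced by making communication patterns hierarchical can be illustrated with a small Python sketch. It is not taken from the paper: the functions `flat_reduce` and `hierarchical_reduce` are hypothetical, and the sketch assumes a sum reduction toward a root process in cluster 0, where only messages between clusters cross a slow link.

```python
from typing import List, Tuple


def flat_reduce(clusters: List[List[int]]) -> Tuple[int, int]:
    """Every process sends its value directly to the root in cluster 0.

    Returns (sum, number of messages that cross a slow inter-cluster link).
    """
    total, slow_messages = 0, 0
    for index, cluster in enumerate(clusters):
        for value in cluster:
            total += value
            if index != 0:
                slow_messages += 1  # one slow-link message per remote process
    return total, slow_messages


def hierarchical_reduce(clusters: List[List[int]]) -> Tuple[int, int]:
    """Each cluster first combines its values over fast local links.

    Only one partial result per remote cluster then crosses a slow link.
    """
    total, slow_messages = 0, 0
    for index, cluster in enumerate(clusters):
        partial = sum(cluster)  # local combine, no slow-link traffic
        total += partial
        if index != 0 and cluster:
            slow_messages += 1  # one slow-link message per remote cluster
    return total, slow_messages


# Three clusters of four processes each (illustrative values).
clusters = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
flat_result = flat_reduce(clusters)
hier_result = hierarchical_reduce(clusters)
```

Both variants compute the same sum; in this toy setup the flat pattern sends eight messages over slow links and the hierarchical pattern sends two, which mirrors the structure of the interconnect.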