Brazos: A Third Generation DSM System
- In Proceedings of the 1st USENIX Windows NT Symposium, 1997
Cited by 67 (11 self)
Brazos is a third generation distributed shared memory (DSM) system designed for x86 machines running Microsoft Windows NT 4.0. Brazos is unique among existing systems in its use of selective multicast, a software-only implementation of scope consistency, and several adaptive runtime performance tuning mechanisms. The Brazos runtime system is multithreaded, allowing the overlap of computation with the long communication latencies typically associated with software DSM systems. Brazos also supports multithreaded user-code execution, allowing programs to take advantage of the local tightly-coupled shared memory available on multiprocessor PC servers, while transparently interacting with remote "virtual" shared memory. Brazos currently runs on a cluster of Compaq Proliant 1500 multiprocessor servers connected by a 100 Mbps FastEthernet. This paper describes the Brazos design and implementation, and compares its performance running five scientific applications to the performance of Solaris...
Application Restructuring and Performance Portability on Shared Virtual Memory and Hardware-Coherent Multiprocessors
- In Proceedings of the 6th ACM Symposium on Principles and Practice of Parallel Programming, 1997
Cited by 58 (17 self)
The performance portability of parallel programs across a wide range of emerging coherent shared address space systems is not well understood. Programs that run well on efficient, hardware cache-coherent systems often do not perform well on less optimal or more commodity-based communication architectures. This paper studies this issue of performance portability, with the commodity communication architecture of interest being page-grained shared virtual memory. We begin with applications that perform well on moderate-scale hardware cache-coherent systems, and find that they do not do so well on SVM systems. Then, we examine whether and how the applications can be improved for SVM systems (through data structuring or algorithmic enhancements) and the nature and difficulty of the optimizations. Finally, we examine the impact of the successful optimizations on hardware-coherent platforms themselves, to see whether they are helpful, harmful or neutral on those platforms. We develop a sys...
Software DSM Protocols that Adapt between Single Writer and Multiple Writer
- 1997
Cited by 52 (6 self)
We present two software DSM protocols that dynamically adapt between a single writer (SW) and a multiple writer (MW) protocol based on the application's sharing patterns. The first protocol (WFS) adapts based on write-write false sharing; the second (WFS+WG) based on a combination of write-write false sharing and write granularity. The adaptation is automatic. No user or compiler information is needed. The choice between SW and MW is made on a per-page basis. We measured the performance of our adaptive protocols on an 8-node SPARC cluster connected by a 155 Mbps ATM network. We used 8 applications, covering a broad spectrum in terms of write-write false sharing and write granularity. We compare our adaptive protocols against the TreadMarks MW-only approach and the CVM SW-only approach. Adaptation to write-write false sharing proves to be the critical performance factor, while adaptation to write granularity plays only a secondary role in our environment and for the applications conside...
Home-based Shared Virtual Memory
- 1998
Cited by 51 (4 self)
In this dissertation, I investigate how to improve the performance of shared virtual memory (SVM) by examining consistency models, protocols, hardware support and applications. The main conclusion of this research is that the performance of shared virtual memory can be significantly improved when performance-enhancing techniques from all these areas are combined. This dissertation proposes home-based lazy release consistency as a simple, effective, and scalable way to build shared virtual memory systems. In home-based protocols each shared page has a home to which all writes are propagated and from which all copies are derived. Two home-based protocols are described, implemented and evaluated on two hardware and software platforms: Automatic Update Release Consistency (AURC), which requires hardware support for fine-grained remote writes (automatic updates), and Home-based Lazy Release Consistency (HLRC), which is implemented exclusively in software. The dissertation investigates the ...
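The home-based idea stated in this abstract (each shared page has a home to which all writes are propagated and from which all copies are derived) can be illustrated with a minimal single-process sketch. It is not the dissertation's implementation: the names `HomePage`, `make_diff`, and `demo` are hypothetical, and the use of a pre-write snapshot ("twin") compared byte by byte to produce a diff is an assumption made for illustration, since the abstract does not say how writes are detected or shipped.

```python
from dataclasses import dataclass


@dataclass
class HomePage:
    """Master copy of one shared page, held at its (hypothetical) home node."""
    data: bytearray
    version: int = 0

    def apply_diff(self, diff: dict[int, int]) -> None:
        # A diff maps byte offsets to new values, so only changed bytes travel.
        for offset, value in diff.items():
            self.data[offset] = value
        self.version += 1

    def fetch(self) -> bytes:
        # Every remote copy is derived from the home copy.
        return bytes(self.data)


def make_diff(twin: bytes, dirty: bytes) -> dict[int, int]:
    """Compare a pre-write snapshot (twin) with the modified local copy."""
    return {i: b for i, (a, b) in enumerate(zip(twin, dirty)) if a != b}


def demo() -> bytes:
    home = HomePage(bytearray(8))
    # Two writers start from the same home-derived copy and touch disjoint bytes.
    twin = home.fetch()
    copy_a = bytearray(twin)
    copy_a[0] = 1
    copy_b = bytearray(twin)
    copy_b[7] = 9
    # Each writer propagates only its own changes to the home.
    home.apply_diff(make_diff(twin, bytes(copy_a)))
    home.apply_diff(make_diff(twin, bytes(copy_b)))
    # A later reader fetches a copy that reflects both sets of writes.
    return home.fetch()
```

Because both writers send only the bytes they changed, the home copy merges their updates without either overwriting the other, which is the property that lets several nodes write to one page at the same time in this sketch.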
Relaxed Consistency and Coherence Granularity in DSM Systems: A Performance Evaluation
- In Proceedings of the 6th ACM Symposium on Principles and Practice of Parallel Programming, 1997
Cited by 43 (6 self)
During the past few years, two main approaches have been taken to improve the performance of software shared memory implementations: relaxing consistency models and providing fine-grained access control. Their performance tradeoffs, however, are not well understood. This paper studies these tradeoffs on a platform that provides access control in hardware but runs coherence protocols in software. We compare the performance of three protocols across four coherence granularities, using 12 applications on a 16-node cluster of workstations. Our results show that no single combination of protocol and granularity performs best for all the applications. The combination of a sequentially consistent (SC) protocol and fine granularity works well with 7 of the 12 applications. The combination of a multiple-writer, home-based lazy release consistency (HLRC) protocol and page granularity works well with 8 out of the 12 applications. For applications that suffer performance losses in moving to coarser granularity under sequential consistency, the performance can usually be regained quite effectively using relaxed protocols, particularly HLRC. We also find that the HLRC protocol performs substantially better than a single-writer lazy release consistent (SW-LRC) protocol at coarse granularity for many irregular applications. For our applications and platform, when we use the original versions of the applications ported directly from hardware-coherent shared memory, we find that the SC protocol with 256-byte granularity performs best on average. However, when the best versions of the applications are compared, the balance shifts in favor of HLRC at page granularity.
Fine-Grain Software Distributed Shared Memory on SMP Clusters
- 1997
Cited by 41 (4 self)
Commercial SMP nodes are an attractive building block for software distributed shared memory systems. The advantages of using SMP nodes include fast communication among processors within the same SMP node and potential gains from clustering where remote data fetched by one processor is used by other processors on the same node. The widespread availability of SMP servers with small numbers of processors has led several researchers to consider their use as building blocks for Shared Virtual Memory (SVM) systems. These systems exploit the SMP cache-coherence hardware to support fine-grain communication within a node, and use software to support communication across nodes at a coarser page-size granularity. Our goal is to explore the use of SMP nodes in the context of the Shasta system. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. Shasta implements this coherence by i...
Home-based SVM protocols for SMP clusters: Design and Performance
- 1998
Cited by 40 (11 self)
As small-scale shared memory multiprocessors proliferate in the market, it is very attractive to construct large-scale systems by connecting smaller multiprocessors together in software using efficient commodity network interfaces and networks. Using a shared virtual memory (SVM) layer for this purpose preserves the attractive shared memory programming abstraction across nodes. In this paper, we describe home-based SVM protocols that support symmetric multiprocessor (SMP) nodes, taking advantage of the intra-node hardware cache coherence and synchronization mechanisms. Our protocols take no special advantage of the network interface and network except as a fast communication link, and as such are very portable. We present the key design tradeoffs, discuss our choices, and describe key data structures that enable us to implement these choices quite simply.
Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems
- In Proceedings of the 26th International Symposium on Computer Architecture, 1999
Cited by 40 (7 self)
The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechan...
Shared Virtual Memory: Progress and Challenges
- Proceedings of the IEEE, 1999
Cited by 33 (4 self)
This paper is a survey of the first 12 years of research in SVM, placing the multi-track flow of ideas and results obtained so far in a comprehensive framework. The contributions indicated in Figure 1 are classified in four categories, each belonging primarily to one layer: relaxed consistency models, protocol laziness, architectural support, and applications and application-driven research. A section of the paper is devoted to each category. The last section discusses other important emerging issues related to SVM: the alternative of fine-grained software coherence, hybrid protocols that implement software shared memory across multiple hardware-coherent multiprocessors, and scalability. The paper summarizes comparative performance results from the literature, discusses their limitations, places existing protocols in a framework based on laziness, and identifies the lessons learned so far and some key outstanding questions.
MultiView and MilliPage -- fine-grain sharing in page-based DSMs
- In Proceedings of the Third USENIX Symposium on Operating System Design and Implementation, 1999

