Performance Evaluation of Two Home-Based Lazy Release Consistency Protocols for Shared Virtual Memory Systems
- In Proceedings of the Operating Systems Design and Implementation Symposium
, 1996
Cited by 146 (19 self)
This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the difficulty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The first, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We find that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to offload some of the protocol overhead of HLRC from the critical path followed by the compute processor. We find that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using five of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traffic, and memory use for each of the protocols.
Scope Consistency : A Bridge between Release Consistency and Entry Consistency
- In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1996
Cited by 135 (12 self)
The large granularity of communication and coherence in shared virtual memory systems causes problems with false sharing and extra communication. Relaxed memory consistency models have been used to alleviate these problems, but at a cost in programming complexity. Release Consistency (RC) and Lazy Release Consistency (LRC) are accepted to offer a reasonable tradeoff between performance and programming complexity. Entry Consistency (EC) offers a more relaxed consistency model, but it requires explicit association of shared data objects with synchronization variables. The programming burden of providing such associations can be substantial. This paper proposes a new consistency model for shared virtual memory, called Scope Consistency (ScC), which offers most of the potential performance advantages of the EC model without requiring explicit bindings between data and synchronization variables. Instead, ScC dynamically detects the bindings implied by the programmer allowing a programming i...
VMMC-2: Efficient Support for Reliable, Connection-Oriented Communication
- In Proceedings of Hot Interconnects
, 1997
Cited by 71 (18 self)
The basic virtual memory-mapped communication (VMMC) model provides protected, direct communication between the sender's and receiver's virtual address spaces, but it does not support high-level connection-oriented communication APIs well. This paper presents VMMC-2, an extension to the basic VMMC. We describe the design and implementation, and evaluate the performance, of three mechanisms in VMMC-2: (1) a user-managed TLB mechanism for address translation, which enables user libraries to dynamically manage the amount of pinned space and requires only driver support from many operating systems; (2) a transfer redirection mechanism, which avoids copying on the receiver's side; (3) a reliable communication protocol at the data link layer, which avoids copying on the sender's side. To validate our extensions, we implemented stream sockets on top of VMMC-2 running on a Myrinet network of Pentium PCs. This zero-copy sockets implementation provides a maximum bandwidth of over 84 Mbytes/s and a one-way latency of 20 µs.
Home-based Shared Virtual Memory
, 1998
Cited by 51 (4 self)
In this dissertation, I investigate how to improve the performance of shared virtual memory (SVM) by examining consistency models, protocols, hardware support and applications. The main conclusion of this research is that the performance of shared virtual memory can be significantly improved when performance-enhancing techniques from all these areas are combined. This dissertation proposes home-based lazy release consistency as a simple, effective, and scalable way to build shared virtual memory systems. In home-based protocols each shared page has a home to which all writes are propagated and from which all copies are derived. Two home-based protocols are described, implemented and evaluated on two hardware and software platforms: Automatic Update Release Consistency (AURC), which requires hardware support for fine-grained remote writes (automatic updates), and Home-based Lazy Release Consistency (HLRC), which is implemented exclusively in software. The dissertation investigates the ...
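Editorial illustration (not from the dissertation): the home-based idea in the abstract above, where every write is propagated to a page's home and every copy is fetched from it, can be sketched roughly as follows. The class names, the tiny page size, and the twin-and-diff scheme used to find changed bytes at a release are assumptions made for this sketch, not details taken from the listing.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8  # hypothetical tiny page; real pages are kilobytes


@dataclass
class HomeNode:
    """Holds the master copy of every page homed here (illustrative only)."""
    pages: dict = field(default_factory=dict)

    def fetch(self, page_id):
        # All copies are derived from the home copy.
        return bytearray(self.pages.setdefault(page_id, bytearray(PAGE_SIZE)))

    def apply_diff(self, page_id, diff):
        # All writes reach the home as (offset, byte) pairs.
        page = self.pages.setdefault(page_id, bytearray(PAGE_SIZE))
        for offset, value in diff:
            page[offset] = value


@dataclass
class Writer:
    """A node that caches pages and flushes its changes at a release."""
    home: HomeNode
    twins: dict = field(default_factory=dict)  # pristine copies taken before the first write
    local: dict = field(default_factory=dict)  # working copies

    def write(self, page_id, offset, value):
        if page_id not in self.local:
            self.local[page_id] = self.home.fetch(page_id)
        if page_id not in self.twins:
            self.twins[page_id] = bytes(self.local[page_id])
        self.local[page_id][offset] = value

    def release(self):
        # Assumption: compare each dirty page with its twin and send only changed bytes.
        for page_id, twin in self.twins.items():
            page = self.local[page_id]
            diff = [(i, b) for i, (a, b) in enumerate(zip(twin, page)) if a != b]
            self.home.apply_diff(page_id, diff)
        self.twins.clear()
        self.local.clear()  # drop local copies so later accesses refetch from the home
```

In this sketch, two writers that modify different bytes of the same page both end up reflected in the home copy after their releases, which is the multiple-writer behaviour that the diff step is meant to illustrate.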
Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems
- In Proceedings of the 26th International Symposium on Computer Architecture
, 1999
Cited by 40 (7 self)
The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity. This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechan...
Experiences with VI Communication for Database Storage
- In Proceedings of the 29th Annual International Symposium on Computer Architecture
, 2002
Cited by 33 (9 self)
This paper examines how VI-based interconnects can be used to improve I/O path performance between a database server and the storage subsystem. We design and implement a software layer, DSA, that is layered between the application and VI. DSA takes advantage of specific VI features and deals with many of its shortcomings. We provide and evaluate one kernel-level and two user-level implementations of DSA. These implementations trade transparency and generality for performance to different degrees and, unlike research prototypes, are designed to be suitable for real-world deployment. We present detailed measurements using a commercial database management system with both micro-benchmarks and industrial database workloads on a mid-size, 4 CPU, and a large, 32 CPU, database server. Our results show that VI-based interconnects and user-level communication can improve all aspects of the I/O path between the database system and the storage back-end. We also find that to make effective use of VI in I/O intensive environments we need to provide substantially more functionality than VI currently provides. Finally, new storage APIs that help minimize kernel involvement in the I/O path are needed to fully exploit the benefits of VI-based communication.
User-Space Communication: A Quantitative Study
, 1998
Cited by 31 (5 self)
Powerful commodity systems and networks offer a promising direction for high performance computing because they are inexpensive and they closely track technology progress. However, high raw-hardware performance is rarely delivered to the end user. Previous work has shown that the bottleneck in these architectures is the overheads imposed by the software communication layer. To reduce these overheads, researchers have proposed a number of user-space communication models. The common feature of these models is that applications have direct access to the network, bypassing the operating system in the common case and thus avoiding the cost of send/receive system calls. In this paper we examine five user-space communication layers that represent different points in the configuration space: Generic AM, BIP-0.92, FM-2.02, PM-1.2, and VMMC-2. Although these systems support different communication paradigms and employ a variety of different implementation tradeoffs, we are able to quantitatively...
Fast Cluster Failover Using Virtual Memory-Mapped Communication
- In Proc. 13th International Conference on Supercomputing
, 1999
Cited by 25 (3 self)
This paper proposes a novel way to use virtual memory mapped communication (VMMC) to reduce the failover time on clusters. With the VMMC model, applications' virtual address space can be efficiently mirrored on remote memory either automatically or via explicit messages. When a machine fails, its applications can restart from the most recent checkpoints on the failover node with minimal memory copying and disk I/O overhead. This method requires little change to applications' source code. We developed two fast failover protocols: deliberate update failover protocol (DU) and automatic update failover protocol (AU). The first can run on any system that supports VMMC, whereas the other requires special network interface support. We implemented these two protocols...
D-Stampede: Distributed Programming System for Ubiquitous Computing
- In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS)
, 2002
Cited by 22 (7 self)
We focus on an important problem in the space of ubiquitous computing, namely, programming support for the distributed heterogeneous computing elements that make up this environment. We address the interactive, dynamic, and stream-oriented nature of this application class and develop appropriate computational abstractions in the D-Stampede distributed programming system. The key features of D-Stampede include indexing data streams temporally, correlating different data streams temporally, performing automatic distributed garbage collection of unnecessary stream data, supporting high performance by exploiting hardware parallelism where available, supporting platform and language heterogeneity, and dealing with application level dynamism. We discuss the features of D-Stampede, the programming ease it affords, and its performance.
UTLB: A mechanism for address translation on network interfaces
- In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
, 1998
Cited by 20 (7 self)
An important aspect of a high-speed network system is the ability to transfer data directly between the network interface and application buffers. Such a direct data path requires the network interface to “know” the virtual-to-physical address translation of a user buffer, i.e., the physical memory location of the buffer. This paper presents an efficient address translation architecture, User-managed TLB (UTLB), which eliminates system calls and device interrupts from the common communication path. UTLB also supports application-specific policies to pin and unpin application memory. We report micro-benchmark results for an implementation on Myrinet PC clusters. A trace-driven analysis is used to compare the UTLB approach with the interrupt-based approach. It is also used to study the effects of UTLB cache size, associativity, and prefetching. Our results show that the UTLB approach delivers robust performance with relatively small translation cache sizes.

