Performance Evaluation of Two Home-Based Lazy Release Consistency Protocols for Shared Virtual Memory Systems
In Proceedings of the Operating Systems Design and Implementation Symposium, 1996
Cited by 146 (19 self)
This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the difficulty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The first, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We find that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to offload some of the protocol overhead of HLRC from the critical path followed by the compute processor. We find that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using five of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traffic, and memory use for each of the protocols.
Scope Consistency: A Bridge between Release Consistency and Entry Consistency
In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996
Cited by 135 (12 self)
The large granularity of communication and coherence in shared virtual memory systems causes problems with false sharing and extra communication. Relaxed memory consistency models have been used to alleviate these problems, but at a cost in programming complexity. Release Consistency (RC) and Lazy Release Consistency (LRC) are accepted to offer a reasonable tradeoff between performance and programming complexity. Entry Consistency (EC) offers a more relaxed consistency model, but it requires explicit association of shared data objects with synchronization variables. The programming burden of providing such associations can be substantial. This paper proposes a new consistency model for shared virtual memory, called Scope Consistency (ScC), which offers most of the potential performance advantages of the EC model without requiring explicit bindings between data and synchronization variables. Instead, ScC dynamically detects the bindings implied by the programmer allowing a programming i...
High-Performance Local Area Communication With Fast Sockets
In Proceedings of the USENIX Technical Conference, 1997
Cited by 62 (2 self)
Modern switched networks such as ATM and Myrinet enable low-latency, high-bandwidth communication. This performance has not been realized by current applications, because of the high processing overheads imposed by existing communications software. These overheads are usually not hidden with large packets; most network traffic is small. We have developed Fast Sockets, a local-area communication layer that utilizes a high-performance protocol and exports the Berkeley Sockets programming interface. Fast Sockets realizes round-trip transfer times of 60 microseconds and maximum transfer bandwidth of 33 MB/second between two UltraSPARC 1s connected by a Myrinet network. Fast Sockets obtains performance by collapsing protocol layers, using simple buffer management strategies, and utilizing knowledge of packet destinations for direct transfer into user buffers. Using receive posting, we make the Sockets API a single-copy communications layer and enable regular Sockets programs to exploit the performance of modern networks. Fast Sockets transparently reverts to standard TCP/IP protocols for wide-area communication.
Reducing Network Latency Using Subpages in a Global Memory Environment
In Proceedings of the Seventh Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), 1996
Cited by 29 (8 self)
New high-speed networks greatly encourage the use of network memory as a cache for virtual memory and file pages, thereby reducing the need for disk access. Because pages are the fundamental transfer and access units in remote memory systems, page size is a key performance factor. Recently, page sizes of modern processors have been increasing in order to provide more TLB coverage and amortize disk access costs. Unfortunately, for high-speed networks, small transfers are needed to provide low latency. This trend in page size is thus at odds with the use of network memory on high-speed networks. This paper studies the use of subpages as a means of reducing transfer size and latency in a remote-memory environment. Using trace-driven simulation, we show how and why subpages reduce latency and improve performance of programs using network memory. Our results show that memory-intensive applications execute up to 1.8 times faster when executing with 1K-byte subpages, when compared to the same...
Improving the Performance of Shared Virtual Memory on System Area Networks
1998
Cited by 10 (2 self)
As clusters of workstations, uniprocessor or symmetric multiprocessors (SMPs), become important platforms for parallel computing, there is increasing research interest in supporting the attractive, shared address space programming model across them in software. The reason is that it may provide successful low-cost, high-performance alternatives to both tightly-coupled, hardware-coherent distributed shared memory machines and to scalable servers. In both these cases, the clusters are formed with off-the-shelf, high-end PCs or workstations and system area networks that track technologies well. Given that a shared memory abstraction is an attractive programming model for this architecture, there has been a lot of research in fast communication on clusters connected with system area networks and in protocols for supporting software shared memory across them. However, the end performance of applications that were written for the more proven hardware-coherent shared memory is still not...
Remote repair of operating system state using backdoors
In Proceedings of the 1st IEEE International Conference on Autonomic Computing, 2004
Cited by 9 (1 self)
Backdoors is a novel architectural approach that enables remote monitoring and recovery/repair of the software state of a system without using its processors or relying on its OS resources. We have implemented a Backdoors prototype in the FreeBSD kernel using Myrinet NICs for remote access to the target machine. In a previous paper we have shown how Backdoors can be used for recovery of useful OS and application state from a failed system. In this paper, we describe how a Backdoors architecture can be used to detect and repair damage to the OS state of a computer system. We present two case studies of remote repair of an OS subject to resource depletion (fork bomb and memory hog) to the point where it cannot perform useful work and local repair is impossible. We show that our prototype detects OS resource exhaustion efficiently and it successfully recovers the affected machine.
TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance
2002
Cited by 8 (0 self)
TCP Server is a system architecture aiming to offload network processing from the host(s) running an Internet server. The TCP Server can be executed on a dedicated processor, node, or intelligent network interface using low-overhead, non-intrusive communication between it and the host(s) running the server application.
Using Embedded Network Processors to Implement Global Memory Management in a Workstation Cluster
Global Memory Management for Workstation Networks
1996
Cited by 4 (0 self)
Global Memory Management for Workstation Networks, by Michael Joseph Feeley. Chairperson of the Supervisory Committee: Professor Henry M. Levy, Department of Computer Science and Engineering. Advances in network and processor technology have greatly changed the communication and computational power of local-area workstation networks. However, operating systems still treat workstation networks as a collection of loosely-connected processors, where each workstation acts as an autonomous and independent agent. This operating system structure makes it difficult to exploit the characteristics of current networks, such as low-latency communication, huge primary memories, and high-speed processors, in order to improve the performance of network applications. This dissertation describes the design and implementation of global memory management in a workstation network. Our objective is to use a single, unified, but distributed memory management algorithm at the lowest level of the operating syste...
High Performance Communication Subsystem for Clustering Standard High-Volume Servers Using Gigabit Ethernet
2000
Cited by 2 (2 self)
This paper presents an efficient communication subsystem, DP-II, for clustering standard high-volume (SHV) servers using Gigabit Ethernet. DP-II employs several light-weight messaging mechanisms to achieve low-latency and high-bandwidth communication. Tests show an 18.32 µs single-trip latency and 72.8 MB/s bandwidth on a Gigabit Ethernet network connecting two Dell PowerEdge 6300 Quad Xeon SMP servers running Linux. To improve the programmability of the DP-II communication subsystem, the development of DP-II was based on a concise yet powerful abstract communication model, the Directed Point Model, which can be conveniently used to depict the inter-process communication pattern of a parallel task in the cluster environment. In addition, the API of DP-II preserves the syntax and semantics of traditional UNIX I/O operations, which makes it easy to use.

