SoftFLASH: Analyzing the Performance of Clustered Distributed Virtual Shared Memory
- In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, 1996
Cited by 76 (0 self)
One potentially attractive way to build large-scale shared-memory machines is to use small-scale to medium-scale shared-memory machines as clusters that are interconnected with an off-the-shelf network. To create a shared-memory programming environment across the clusters, it is possible to use a virtual shared-memory software layer. Because of the low latency and high bandwidth of the interconnect available within each cluster, there are clear advantages in making the clusters as large as possible. The critical question then becomes whether the latency and bandwidth of the top-level network and the software system are sufficient to support the communication demands generated by the clusters. To explore these questions, we have built an aggressive kernel implementation of a virtual shared-memory system using SGI multiprocessors and 100Mbyte/sec HIPPI interconnects. The system obtains speedups on 32 processors (four nodes, eight
Brazos: A Third Generation DSM System
- In Proceedings of the 1st USENIX Windows NT Symposium, 1997
Cited by 67 (11 self)
Brazos is a third generation distributed shared memory (DSM) system designed for x86 machines running Microsoft Windows NT 4.0. Brazos is unique among existing systems in its use of selective multicast, a software-only implementation of scope consistency, and several adaptive runtime performance tuning mechanisms. The Brazos runtime system is multithreaded, allowing the overlap of computation with the long communication latencies typically associated with software DSM systems. Brazos also supports multithreaded user-code execution, allowing programs to take advantage of the local tightly-coupled shared memory available on multiprocessor PC servers, while transparently interacting with remote "virtual" shared memory. Brazos currently runs on a cluster of Compaq Proliant 1500 multiprocessor servers connected by a 100 Mbps FastEthernet. This paper describes the Brazos design and implementation, and compares its performance running five scientific applications to the performance of Solaris...
Using Multicast and Multithreading to Reduce Communication in Software DSM Systems
- 1998
Cited by 18 (1 self)
This paper examines the performance benefits of employing multicast communication and application-level multithreading in the Brazos software distributed shared memory (DSM) system. Application-level multithreading in Brazos allows programs to transparently take advantage of available local multiprocessing. Brazos uses multicast communication to reduce the number of consistency-related messages, and employs two adaptive mechanisms that reduce the detrimental side effects of using multicast communication. We compare three software DSM systems running on identical hardware: (1) a single-threaded point-to-point system, (2) a multithreaded point-to-point system, and (3) Brazos, which incorporates both multithreading and multicast communication. For the six applications studied, multicast and multithreading improve speedup on eight processors by an average of 38%.
Implementing a Caching Service for Distributed CORBA Objects
- 2000
Cited by 17 (1 self)
This paper discusses the implementation of CASCADE, a distributed caching service for CORBA objects. Our caching service is fully CORBA compliant, and supports caching of active objects, which include both data and code. It is specifically designed to operate over the Internet by employing a dynamically built cache hierarchy. The service architecture is highly configurable with regard to a broad spectrum of application parameters. The main benefits of CASCADE are enhanced availability and service predictability, as well as easy dynamic code deployment and consistency maintenance.
1 Introduction
One of the main goals of modern middleware, and in particular of the CORBA standard [45], is to facilitate the design of interoperable, extensible and portable distributed systems. This is done by standardizing a programming language independent IDL, a large set of useful services, the General Inter-ORB Protocol (and its TCP/IP derivative IIOP), and bridges to other common middleware...
MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines
- In Proceedings of the Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, 1998
Cited by 13 (0 self)
Current distributed shared memory systems suffer from portability problems which hinder popularity. We present a distributed shared memory system as a distributed implementation of the Java Virtual Machine. The proposed system is unique in that it provides a user-friendly, flexible programming model based on pure Java. It is an object-based memory system which maintains the synchronization scope as the whole address space, like page-based systems. MultiJav demonstrates that it is possible to design an efficient, portable, distributed shared memory system for running parallel and distributed applications written in a standard language.
Keywords: distributed shared memory, object-based sharing, Java, memory consistency
1 Introduction
Numerous distributed shared memory systems have been designed to promote parallel computing on a cluster of workstations. Through continuous effort of performance optimization, systems of this kind have become a reasonable choice for applications of massive...
Hybrid-DSM: An Efficient Alternative to Pure Software DSM Systems on NUMA Architectures
- 2000
Cited by 11 (10 self)
Shared-memory-style programming is usually supported on loosely coupled architectures, such as clusters or networks of workstations, by pure software distributed shared memory (DSM) systems. With the Scalable Coherent Interface (SCI), which facilitates communication via hardware DSM, high performance PC clusters with NUMA (Non-Uniform Memory Access) characteristics can be built. By exploiting the remote memory access features of SCI and by utilizing the memory management techniques of software DSM systems, a global virtual memory abstraction on top of an SCI-based PC cluster can be provided (the SCI Virtual Memory or SCI-VM). This hybrid DSM approach is the basis for the efficient implementation of shared memory programming models, not only for SCI-based systems but also for loosely coupled NUMA architectures in general.
Removing the Overhead from Software-Based Shared Memory
- 2001
Cited by 10 (5 self)
The implementation presented in this paper, DSZOOM-WF, is a sequentially consistent, fine-grained distributed software-based shared memory. It demonstrates a protocol-handling overhead below a microsecond for all the actions involved in a remote load operation, compared with around ten microseconds for the fastest implementation to date. The all-software protocol is implemented assuming some basic low-level primitives in the cluster interconnect and an operating system bypass functionality, similar to the emerging InfiniBand standard. All interrupt- and/or poll-based asynchronous protocol processing is completely removed by running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead, but also makes use of a processor that otherwise would stall. The technique is applicable to both page-based and fine-grain software-based shared memory. DSZOOM-WF consistently demonstrates performance comparable to hardware-based distributed shared memory implementations.
Performance Evaluation of View-Oriented Parallel Programming
- In: Proc. of the 2005 International Conference on Parallel Processing (ICPP05), pp. 251-258, IEEE Computer Society, 2005
Cited by 10 (5 self)
This paper evaluates the performance of a novel View-Oriented Parallel Programming style for parallel programming on cluster computers. View-Oriented Parallel Programming is based on Distributed Shared Memory which is friendly and easy for programmers to use. It requires the programmer to divide shared data into views according to the memory access pattern of the parallel algorithm. One of the advantages of this programming style is that it offers the performance potential for the underlying Distributed Shared Memory system to optimize consistency maintenance. Also it allows the programmer to participate in performance optimization of a program through wise partitioning of the shared data into views. Experimental results demonstrate a significant performance gain of the programs based on the View-Oriented Parallel Programming style.
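The idea of dividing shared data into views can be illustrated with a minimal single-machine sketch. This is not the paper's implementation: the names `ViewStore`, `acquire_view`, and `release_view` are hypothetical, threads stand in for cluster nodes, and a per-view lock stands in for whatever consistency maintenance the underlying Distributed Shared Memory system would perform.

```python
import threading


class ViewStore:
    """Shared data partitioned into named views (hypothetical API).

    Each view has its own lock, so work on different views never contends.
    """

    def __init__(self, views):
        self._data = dict(views)
        self._locks = {name: threading.Lock() for name in self._data}

    def acquire_view(self, name):
        # Assumption: acquiring a view grants exclusive access to its data.
        self._locks[name].acquire()
        return self._data[name]

    def release_view(self, name):
        self._locks[name].release()

    def snapshot(self):
        # Read every view's counter under its own lock.
        result = {}
        for name in self._data:
            view = self.acquire_view(name)
            try:
                result[name] = view["value"]
            finally:
                self.release_view(name)
        return result


def worker(store, view_name, increments):
    # Touch shared data only between acquire_view and release_view.
    for _ in range(increments):
        view = store.acquire_view(view_name)
        try:
            view["value"] += 1
        finally:
            store.release_view(view_name)


def run_demo(num_views=2, workers_per_view=4, increments=1000):
    store = ViewStore({f"view{i}": {"value": 0} for i in range(num_views)})
    threads = [
        threading.Thread(target=worker, args=(store, f"view{i}", increments))
        for i in range(num_views)
        for _ in range(workers_per_view)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.snapshot()


if __name__ == "__main__":
    print(run_demo())
```

Partitioning by access pattern is the programmer's part of the optimization: workers that update `view0` never wait on workers that update `view1`.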
Empirical Evaluation of Distributed Mutual Exclusion Algorithms
- In International Parallel Processing Symposium, 1997
Cited by 8 (0 self)
In this paper, we evaluated various distributed mutual exclusion algorithms on the IBM SP2 machine and the Intel iPSC/860 system. The empirical results are compared in terms of such criteria as the number of message exchanges and the response time. Our results indicate that the Star algorithm [2] achieves the shortest response time in most cases among all the algorithms on a small to medium sized system, when processors request the critical section many times before involving any barrier synchronization. On the other hand, if every processor enters the critical section only once before encountering a barrier, the improved Ring algorithm [4] is found to outperform others under a heavy load; but the Star algorithm and the CSL algorithm [3] prevail when the request rate becomes light. The best solution to mutual exclusion in distributed memory systems is determined by how participating sites generate their mutual exclusion requests.
1 Introduction
Mutual exclusion is achieved by a m...
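To make the "number of message exchanges" criterion concrete, here is a minimal sketch that counts messages for a textbook centralized-coordinator scheme. It is an assumption chosen for illustration and is not the Star, Ring, or CSL algorithm evaluated in the paper; the names `Coordinator` and `simulate` are hypothetical.

```python
from collections import deque


class Coordinator:
    """Textbook central coordinator: REQUEST, GRANT, RELEASE per entry."""

    def __init__(self):
        self.holder = None       # site currently in the critical section
        self.waiting = deque()   # FIFO queue of blocked sites
        self.messages = 0        # total messages exchanged

    def request(self, site):
        self.messages += 1       # REQUEST: site -> coordinator
        if self.holder is None:
            self._grant(site)
        else:
            self.waiting.append(site)

    def release(self, site):
        if self.holder != site:
            raise ValueError("only the current holder may release")
        self.messages += 1       # RELEASE: site -> coordinator
        self.holder = None
        if self.waiting:
            self._grant(self.waiting.popleft())

    def _grant(self, site):
        self.messages += 1       # GRANT: coordinator -> site
        self.holder = site


def simulate(num_sites, entries_per_site):
    """All sites request at once in each round, then are served in turn."""
    coord = Coordinator()
    order = []
    for _ in range(entries_per_site):
        for site in range(num_sites):
            coord.request(site)
        while coord.holder is not None:
            order.append(coord.holder)
            coord.release(coord.holder)
    return coord.messages, order


if __name__ == "__main__":
    messages, order = simulate(num_sites=3, entries_per_site=2)
    print(messages, order)
```

In this scheme every critical-section entry costs exactly three messages regardless of load, which is the kind of per-entry figure such an empirical comparison would tabulate alongside response time.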
Efficient Runtime Support for Cluster-Based Distributed Shared Memory Multiprocessors
- 1997
Cited by 8 (6 self)
Distributed shared memory (DSM) systems provide a shared memory programming paradigm on top of a physically distributed network of computers. The DSM system removes the necessity for programmers to move data explicitly between processors. The principal challenge in the development of an efficient DSM system lies in reducing the amount of communication necessary to maintain coherence to an absolute minimum. This thesis presents Brazos, a DSM system for use in an environment of symmetric multiprocessor (SMP) personal computers that are networked together by industry-standard 100 Mbps FastEthernet. Brazos is distinguished by its use of application-level multithreading, selective multicast, adaptive runtime mechanisms, and a unique performance history mechanism. Through the detailed analysis of twelve scientific programs, we show that Brazos outperforms the current state-of-the-art software DSM system by an average of 83%, and outperforms a version of the same DSM system that has been altered to take advantage of SMP personal computers by an average of 32%. Our results indicate that networks of commodity personal computers using available PC networks and operating systems can perform comparably on a wide variety of scientific applications to more traditional networks of high-end engineering workstations.

