In-network cache coherence
In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006
Cited by 23 (4 self)
Abstract. We propose implementing cache coherence protocols within the network, demonstrating how an in-network implementation of the MSI directory-based protocol allows for in-transit optimizations of read and write delay. Our results show 15% and 24% savings on average in memory access latency for SPLASH-2 parallel benchmarks running on a 4x4 and a 16x16 multiprocessor, respectively.
PULC: ParaStation User-Level Communication. Design and Overview
In Parallel and Distributed Processing, 1998
Cited by 6 (1 self)
PULC is a user-level communication library for workstation clusters. PULC provides a multi-user, multi-programming communication library for user-level communication on top of high-speed communication hardware. In this paper, we describe the design of the communication subsystem, a first implementation on top of the ParaStation communication card, and benchmark results of this first implementation. PULC removes the operating system from the communication path and offers a multi-process environment with user-space communication. Additionally, we have moved some operating system functionality to the user level to provide higher efficiency and flexibility. Message demultiplexing, protocol processing, hardware interfacing, and mutual exclusion of critical sections are all implemented at user level. PULC offers the programmer multiple interfaces including TCP user-level sockets, MPI [CGH94], PVM [BDG+93], and Active Messages [CCHvE96]. Throughput and latency are close to the hardware ...
An Efficient Virtual Network Interface in the FUGU Scalable Workstation
1998
Cited by 2 (0 self)
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines scalable, fine-grain communication facilities for parallel applications with virtual memory and preemptive multiprogramming to support general-purpose workloads. A key challenge in a scalable workstation is the Virtual Network Interface (VNI) problem. The problem is that high performance communication for parallel programming depends on a tight coupling between the application and the network while multiprogramming and virtual memory effects disrupt such coupling. This thesis
Enhancing Distributed Systems with Low-Latency Networking
In Parallel and Distributed Computing and Networks, 1998
Cited by 1 (1 self)
Recently, several network technologies which support user-level communication between processes using a shared-memory interface have become available [4, 7]. These technologies offer very low latency, high bandwidth communication by eliminating the need for software protocol stacks. Whilst there has been much research on the use of such networks in the context of parallel computing [5, 6, 13], relatively little work has been done on their suitability for distributed applications. This paper describes the work undertaken to integrate the Scalable Coherent Interface (SCI) interconnect with the standard NFS server and a CORBA 2.0 compliant ORB over Linux. It is shown that impressive performance increases can be achieved without modification to either the operating system or the distributed application. Keywords: SCI, CORBA, NFS.
The Implementation of Cashmere
Cited by 1 (1 self)
Cashmere is a software distributed shared memory (SDSM) system designed for today’s high performance cluster architectures. These clusters typically consist of symmetric multiprocessors (SMPs) connected by a low-latency system area network. Cashmere introduces several novel techniques for delegating intra-node sharing to the hardware coherence mechanism available within the SMPs, and also for leveraging advanced network features such as remote memory access. The efficacy of the Cashmere design has been borne out through head-to-head comparisons with other well-known, mature SDSMs and with Cashmere variants that do not take advantage of the various hardware features. In this paper, we describe the implementation of the Cashmere SDSM. Our discussion is organized around the core components that comprise Cashmere. We discuss both component interactions and low-level implementation details. We hope this paper provides researchers with the background needed to
ClusterNet: An Object-Oriented Cluster Network
Abstract. Parallel processing is based on utilizing a group of processors to efficiently solve large problems faster than is possible on a single processor. To accomplish this, the processors must communicate and coordinate with each other through some type of network. However, the only function that most networks support is message routing. Consequently, functions that involve data from a group of processors must be implemented on top of message routing. We propose treating the network switch as a function unit that can receive data from a group of processors, execute operations, and return the result(s) to the appropriate processors. This paper describes how each of the architectural resources that are typically found in a network switch can be better utilized as a centralized function unit. A proof-of-concept prototype called ClusterNet 4EPP has been implemented to demonstrate the feasibility of this concept.
COMPUTING APPLICATIONS NETWORK TECHNOLOGIES
A broad and growing range of possibilities is available to designers of a cluster when choosing an interconnection technology. As the price of network hardware in a cluster can vary from almost free to several thousands of dollars
Enhancing Distributed Systems with Low-Latency Networking
1998
Recently, several network technologies which support user-level communication between processes using a shared-memory interface have become available [2, 5]. These technologies offer very low latency, high bandwidth communication by eliminating the need for software protocol stacks. Whilst there has been much research on the use of such networks in the context of parallel computing [3, 4, 11], relatively little work has been done on their suitability for distributed applications. This paper describes the work undertaken to integrate the Scalable Coherent Interface (SCI) interconnect with the standard NFS server and a CORBA 2.0 compliant ORB over Linux. It is shown that impressive performance increases can be achieved without modification to either the operating system or the distributed application. Keywords: SCI, CORBA, NFS. 1 Introduction. The computing environment at the Olivetti & Oracle Research Laboratory (ORL) is highly heterogeneous, supporting our work in multimedia an...

