The Ninja architecture for robust Internet-scale systems and services
- Computer Networks
, 2001
Market-based Proportional Resource Sharing for Clusters
, 1999
Cited by 52 (3 self)
Enabling technologies in high speed communication and global process scheduling have pushed clusters of computers into the mainstream as general-purpose high-performance computing systems. More generality, however, implies more sharing and this raises new questions in the area of cluster resource management. In particular, in systems where the aggregate demand for computing resources can exceed the aggregate supply, how to allocate resources amongst competing applications is an important problem. Traditional solutions to this problem have focused mainly on global optimization with respect to system-centric performance metrics, metrics which ignore higher level user intent. In this paper, we propose an alternative market-based approach based on the notion of a computational economy which optimizes for user value. Starting with fundamental requirements, we describe an abstract architecture for market-based cluster resource management based on the idea of proportional resource sharing of...
SOVIA: A User-level Sockets Layer over Virtual Interface Architecture
- In Cluster Computing
, 2001
Cited by 45 (0 self)
The Virtual Interface Architecture (VIA) is an industry standard user-level communication architecture for system area networks. The VIA provides a protected, directly accessible interface to network hardware, removing the operating system from the critical communication path. In this paper, we design and implement a user-level Sockets layer over VIA, named SOVIA (Sockets Over VIA). Our objective is to use the SOVIA layer to accelerate existing Sockets-based applications with reasonable effort and to provide application developers with a portable, high-performance communication library based on VIA. SOVIA realizes performance comparable to native VIA, showing a minimum latency of 10.5 µsec and a peak bandwidth of 814 Mbps on Giganet’s cLAN. We have verified functional compatibility with the existing Sockets API by porting FTP (File Transfer Protocol) and RPC (Remote Procedure Call) applications over the SOVIA layer. Compared to Giganet’s LANE driver, which emulates TCP/IP inside the kernel, SOVIA easily doubles the file transfer bandwidth in FTP and reduces the latency of calling an empty remote procedure by 77% in RPC applications.
User-Level Communication in Cluster-Based Servers
- In Proceedings of the 8th IEEE International Symposium on High-Performance Computer Architecture (HPCA 8)
, 2002
Cited by 29 (11 self)
Clusters of commodity computers are currently being used to provide the scalability required by several popular Internet services. In this paper we evaluate an efficient cluster-based WWW server, as a function of the characteristics of the intra-cluster communication architecture. More specifically, we evaluate the impact of processor overhead, network bandwidth, remote memory writes, and zero-copy data transfers on the performance of our server. Our experimental results with an 8-node cluster and four real WWW traces show that network bandwidth affects the performance of our server by only 6%. In contrast, user-level communication can improve performance by as much as 29%. Low processor overhead, remote memory writes, and zero-copy all make small contributions towards this overall gain. To be able to extrapolate from our experimental results, we use an analytical model to assess the performance of our server under different workload characteristics, different numbers of cluster nodes, and higher performance systems. Our modeling results show that higher gains (of up to 55%) can be accrued for workloads with large working sets and next-generation servers running on large clusters.
Software Distributed Shared Memory over Virtual Interface Architecture: Implementation and Performance
- In Proceedings of the 3rd Extreme Linux Workshop
, 2000
Cited by 22 (2 self)
In this paper, we describe an implementation of a software Distributed Shared Memory (DSM) over Virtual Interface Architecture (VIA) for a Linux-based cluster of PCs and evaluate its performance. VIA is a user-level memory-mapped communication model that provides zero-copy communication and low overhead by excluding the operating system kernel from the communication path. To the best of our knowledge, our implementation is the first software DSM protocol on VIA. The DSM protocol we have implemented on VIA is Home-based Lazy Release Consistency (HLRC), which previous studies have shown to exhibit good scalability by reducing the number of messages and the memory overhead compared to its homeless counterpart. The experimental results obtained on seven Splash-2 applications show that VIA can be successfully used to support software shared memory on clusters of PCs. The paper is accompanied by a source-code distribution of the software DSM protocol for Linux/VIA clusters.
Sockets Direct Protocol over InfiniBand in Clusters: Is it Beneficial?
, 2003
Cited by 22 (10 self)
InfiniBand has recently been standardized by the industry to design next-generation high-end clusters for both datacenter and high performance computing domains. Though InfiniBand has been able to support low latency and high bandwidth, traditional sockets-based applications have not been able to take advantage of this; this is mainly attributed to the multiple copies and kernel context switches associated with the traditional TCP/IP protocol stack. The Sockets Direct Protocol (SDP) has been proposed recently in order to enable sockets-based applications to take advantage of the enhanced features provided by the InfiniBand Architecture. In this
Impact of High Performance Sockets on Data Intensive Applications
- In the Proceedings of the IEEE International Conference on High Performance Distributed Computing (HPDC 2003)
, 2003
Efficient Collective Operations using Remote Memory Operations on VIA-Based Clusters
- In Proceedings of the International Parallel and Distributed Processing Symposium
, 2003
Cited by 12 (0 self)
High performance scientific applications require efficient and fast collective communication operations. Most collective communication operations have been built on top of point-to-point send/receive primitives. Modern user-level protocols such as VIA and the emerging InfiniBand architecture support remote DMA (RDMA) operations. These operations not only allow data to be moved between the nodes with low overhead but also allow the user to create and provide a logical shared memory address space across the nodes. This feature demonstrates potential for designing high performance and scalable collective operations. In this paper, we discuss the various design issues that may be the basis of an RDMA-supported collective communication library. As a proof of concept, we have designed and implemented the RDMA-based broadcast and the RDMA-based allreduce operations. For RDMA-based broadcast, we get a benefit of 14% compared to send/receive-based broadcast for a 4 KB data size on a 16-node cluster. We also introduce a new reduce algorithm called the Degree-k tree-based reduce algorithm. Combining the RDMA mechanism with the new reduce algorithm shows a benefit of 38% for 4-byte messages and 9% for 4 KB messages on a 16-node cluster for the allreduce operation. We also introduce analytical models for broadcast and allreduce to predict the performance of this design for large-scale clusters. These analytical models yield a performance benefit of about 35-40% for 4-byte messages and around 14% for 4 KB messages for 512- and 1024-node clusters for the allreduce operation.
The Data Mover: A Machine-independent Abstraction for Managing Customized Data Motion
, 1999
Cited by 11 (2 self)
This paper discusses an abstraction, called the Data Mover, for expressing machine-independent customized communication algorithms in a variety of block-structured applications. The Data Mover enables its user to express data motion using intuitive geometric operations that encapsulate the low-level details of the underlying communication. Communication patterns are expressed as collective operations, and are restricted to movement of rectangular array sections. We describe the Data Mover model of communication, and present performance for various applications. The Data Mover currently serves as useful middleware for application library designers, but defines a simple machine-independent interface suitable as a target for a compiler or compiler run time library.
Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA)
, 2000
Cited by 10 (3 self)
The Virtual Interface Architecture (VIA) specification has been developed to standardize user-level network interfaces that provide low latency, high bandwidth communications. Few hardware and software implementations of VIA exist. Since the VIA specification is flexible, different choices exist for implementing various components of VIA such as doorbells, address translation methods, and completion queues. Although previous studies have evaluated the overall performance of different VIA implementations, there has not been a comparative study on the performance of VIA components. In this paper, we evaluate and compare the performance of different implementations of essential VIA components. We discuss the pros and cons of each design approach and describe the required support for implementing each of them. As a user application, we use the NAS Parallel Benchmarks to study the effect of caching the address translation tables on the NIC and to study design issues involved in implementing completion queues. As a hardware platform we use the IBM Netfinity SP cluster running the NT 4.0 operating system and a Myrinet-connected cluster of PCs running the Linux operating system.

