Results 1 - 5 of 5
Performance Evaluation of RDMA over IP: A Case Study with the Ammasso Gigabit Ethernet NIC
- In Workshop on High Performance Interconnects for Distributed Computing (HPI-DC), in conjunction with HPDC-14, 2005
Cited by 6 (2 self)
Remote Direct Memory Access (RDMA) has been proposed to overcome the limitations of traditional send/receive based communication protocols. The immense potential of RDMA to improve communication performance while being extremely conservative on resource requirements has made RDMA the most sought-after feature in current and next-generation networks. Recently, there have been many active efforts to enable RDMA over IP, and the fabrication of RDMA-enabled Ethernet NICs has just begun. However, its performance has not been quantitatively evaluated in WAN environments; existing research has focused on LAN environments. In this paper, we evaluate the performance of RDMA over IP networks with the Ammasso Gigabit Ethernet NIC while emulating high-delay WANs and varying the load on the remote node. We observe that RDMA is beneficial, especially under heavy load conditions. More importantly, even with a high delay, RDMA can provide better communication progress and requires fewer CPU resources than traditional sockets over TCP/IP. Further, we show that RDMA can support high-performance intra-cluster communication, providing a unified communication interface for inter- and intra-cluster communication. To the best of our knowledge, this is the first quantitative study of RDMA over IP on a WAN setup.
Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications
Cited by 2 (0 self)
Distributed applications tend to have a complex design due to issues such as concurrency, synchronization and communication. Researchers in the past have proposed simpler abstractions to hide these complexities. However, many of the proposed techniques use messaging protocols which incur high overhead and are not very scalable. In our previous work [20], we proposed an efficient Distributed Data Sharing Substrate (DDSS) using the advanced features of high-speed networks to address the limitations of these messaging protocols. In this paper, we propose optimizations for DDSS in large-scale multi-core systems and comprehensively evaluate DDSS in terms of performance, scalability and associated overheads using several micro-benchmarks and applications such as Distributed STORM, R-Tree and B-Tree query processing, checkpointing applications and resource monitoring services. The proposed substrate is implemented over the OpenFabrics standard interface and we demonstrate its portability across multiple modern interconnects such as InfiniBand and iWARP-capable 10-Gigabit Ethernet networks (applicable for both LAN/WAN environments). Our micro-benchmark results not only show a very low latency in DDSS operations but also demonstrate the scalability of DDSS with an increasing number of processes. Application evaluations with R-Tree and B-Tree query processing and distributed STORM show an improvement of up to 56%, 45% and 44%, respectively, as compared to the traditional implementations, while evaluations with application checkpointing using DDSS demonstrate scalability with an increasing number of checkpointing applications. In addition, our evaluations using an additional core for DDSS services show considerable potential benefits from performing these services on dedicated cores.
Improving Per-Node Efficiency in the Datacenter with New OS Abstractions
Cited by 1 (0 self)
We believe datacenters can benefit from more focus on per-node efficiency, performance, and predictability, versus the more common focus so far on scalability to a large number of nodes. Improving per-node efficiency decreases costs and fault recovery because fewer nodes are required for the same amount of work. We believe that the use of complex, general-purpose operating systems is a key contributing factor to these inefficiencies. Traditional operating system abstractions are ill-suited for high performance and parallel applications, especially on large-scale SMP and many-core architectures. We propose four key ideas that help to overcome these limitations. These ideas are built on a philosophy of exposing as much information to applications as possible and giving them the tools necessary to take advantage of that information to run more efficiently. In short, high-performance applications need to be able to peer through layers of virtualization in the software stack to optimize their behavior. We explore abstractions based on these ideas and discuss how we build them in the context of a new operating system called Akaros.
Ohio State University (OSU-CISRC-8/07-TR53) Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications
Distributed applications tend to have a complex design due to issues such as concurrency, synchronization and communication. Researchers in the past have proposed simpler abstractions to hide these complexities. However, many of the proposed techniques use messaging protocols which incur high overhead and are not very scalable. To address these limitations, in our previous work [20], we proposed an efficient Distributed Data Sharing Substrate (DDSS) using the features of high-speed networks. In this paper, we propose several design optimizations for DDSS in multi-core systems, such as the combination of shared memory and message queues for inter-process communication, and a dedicated thread for communication progress and for onloading DDSS operations such as get and put. Our micro-benchmark results not only show a very low latency in DDSS operations but also demonstrate the scalability of DDSS with an increasing number of processes. Application evaluations with R-Tree and B-Tree query processing and distributed STORM show an improvement of up to 56%, 45% and 44%, respectively, as compared to the traditional implementations, while evaluations with application checkpointing using DDSS demonstrate scalability with an increasing number of checkpointing applications. Further, in our evaluations, we demonstrate the portability of DDSS across multiple modern interconnects such as InfiniBand and iWARP-capable 10-Gigabit Ethernet networks (applicable for both LAN/WAN environments).
High Performance Communication Support for . . .
- 2006
In the past decade several high-speed networks have been introduced, each superseding the others with respect to raw performance, communication features and capabilities. However, such aggressive initiative is accompanied by an increasing divergence in the communication interface or “language” used by each network. Accordingly, portability for applications across these various network languages has recently been a topic of extensive research. Programming models such as the Sockets Interface, Message Passing Interface (MPI), Shared memory models, etc., have been widely accepted as the primary means for achieving such portability. This dissertation investigates the different design choices for implementing one such programming model, i.e., Sockets, in various high-speed network environments (e.g., InfiniBand and 10-Gigabit Ethernet). Specifically, the dissertation targets three important sub-problems: (a) designing efficient sockets implementations to allow existing applications to be directly and transparently deployed on to clusters connected with high-speed networks; (b) analyzing the limitations of the sockets interface in various domains and extending it with features that applications need but are currently

