Results 1 -
5 of
5
Scalable, Fault-Tolerant Membership for MPI Tasks on HPC Systems
- ICS06
, 2006
"... Reliability is increasingly becoming a challenge for highperformance computing (HPC) systems with thousands of nodes, such as IBM’s Blue Gene/L. A shorter mean-time-to-failure can be addressed by adding fault tolerance to reconfigure working nodes to ensure that communication and computation can pro ..."
Abstract
-
Cited by 7 (6 self)
- Add to MetaCart
Reliability is increasingly becoming a challenge for highperformance computing (HPC) systems with thousands of nodes, such as IBM’s Blue Gene/L. A shorter mean-time-to-failure can be addressed by adding fault tolerance to reconfigure working nodes to ensure that communication and computation can progress. However, existing approaches fall short in providing scalability and small reconfiguration overhead within the fault-tolerant layer. This paper contributes a scalable approach to reconfigure the communication infrastructure after node failures. We propose a decentralized (peer-to-peer) protocol that maintains a consistent view of active nodes in the presence of faults. Our protocol shows response times in the order of hundreds of microseconds and singledigit milliseconds for reconfiguration using MPI over BlueGene/L and TCP over Gigabit, respectively. The protocol can be adapted to match the network topology to further increase performance. We also verify experimental results against a performance model, which demonstrates the scalability of the approach. Hence, the membership service is suitable for deployment in the communication layer of MPI runtime systems, and we have integrated an early version into LAM/MPI.
High Performance Distributed Lock Management Services using Networkbased Remote Atomic Operations
- In Proceedings of Int’l Symposium on Cluster Computing and the Grid (CCGrid
, 2007
"... Recently there has been a massive increase in computing requirements for parallel applications. These parallel applications and supporting cluster services often need to share system-wide resources. The coordination of these applications is typically managed by a distributed lock manager. The perfor ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Recently there has been a massive increase in computing requirements for parallel applications. These parallel applications and supporting cluster services often need to share system-wide resources. The coordination of these applications is typically managed by a distributed lock manager. The performance of the lock manager is extremely critical for application performance. Researchers have shown that the use of two sided communication protocols, like TCP/IP, (used by current generation lock managers) can have significant impact on the scalability of distributed lock managers. In addition, existing onesided communication based locking designs support locking in exclusive access mode only and can pose significant scalability limitations on applications that need shared and exclusive access modes like cooperative/file-system caching. Hence the utility of these existing designs in high performance scenarios can be limited. In this paper, we present a novel protocol, for distributed locking services, utilizing the advanced network level one-sided atomic operations provided by InfiniBand. Our approach augments existing approaches by eliminating the need for two sided communication protocols in the critical locking path. Further, we also demonstrate that our approach provides significantly higher performance in scenarios needing both shared and exclusive mode access to resources. Our experimental results show 39 % improvement in basic locking latencies over traditional send/receive based implementations. Further, we also observe a significant (upto 317 % for 16 nodes) improvement over existing RDMA based distributed queuing schemes for shared mode locking scenarios.
Scalable Hierarchical Locking for Distributed Systems
- Journal of Parallel Distributed Computing
, 2003
"... Middleware components are becoming increasingly important as applications share computational resources in distributed environments, such as high-end clusters with ever larger number of processors, computational grids and increasingly large server farms. One of the main challenges in such environmen ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Middleware components are becoming increasingly important as applications share computational resources in distributed environments, such as high-end clusters with ever larger number of processors, computational grids and increasingly large server farms. One of the main challenges in such environments is to achieve scalability of synchronization. In general, concurrency services arbitrate resource requests in distributed systems. But concurrency protocols currently lack scalability. Adding such guarantees enables resource sharing and computing with distributed objects in systems with a large number of nodes.
Distributed Queue-based Locking using Advanced Network Features
, 2005
"... A Distributed Lock Manager (DLM) provides advisory locking services to applications such as databases and file systems that run on distributed systems. Lock management at the server is implemented using First-In-First-Out (FIFO) queues. In this paper, we demonstrate a novel way of delegating the loc ..."
Abstract
- Add to MetaCart
A Distributed Lock Manager (DLM) provides advisory locking services to applications such as databases and file systems that run on distributed systems. Lock management at the server is implemented using First-In-First-Out (FIFO) queues. In this paper, we demonstrate a novel way of delegating the lock management to the participating lock-requesting nodes, using advanced network primitives such as Remote Direct Memory Access (RDMA) and Atomic operations. This nicely complements the original idea of DLM, where management of the lock space is distributed. Our implementation achieves better load balancing, reduction in server load and improved throughput over traditional designs.
Dissertation Committee: Approved by
"... In the past decade, rapid advances have taken place in the field of computer and network design enabling us to connect thousands of computers together to form high performance clusters. These clusters are used to solve computationally challenging scientific problems. The Message Passing Interface (M ..."
Abstract
- Add to MetaCart
In the past decade, rapid advances have taken place in the field of computer and network design enabling us to connect thousands of computers together to form high performance clusters. These clusters are used to solve computationally challenging scientific problems. The Message Passing Interface (MPI) is a popular model to write applications for these clusters. There are a vast array of scientific applications which use MPI on clusters. As the applications operate on larger and more complex data, the size of the compute clusters is scaling higher and higher. The scalability and the performance of the MPI library if very important for the end application performance. InfiniBand is a cluster interconnect which is based on open-standards and is gaining rapid acceptance. This dissertation explores the different transports provided by Infini-Band to determine the scalabilty and performance aspects of each. Further, new MPI designs have been proposed and implemented for transports that have never been used for MPI in the past. These designs have significantly decreased the resource consumption, increased the performance and increased the reliability of ultra-scale InfiniBand clusters. A

