FT-MPI, fault-tolerant metacomputing and generic name services: A case study (2006)
Venue: Lecture Notes in Computer Science 4192
Citations: 1 (0 self)
Citations
222 | CoCheck: Checkpointing and Process Migration for MPI
- Stellner
- 1996
Citation Context ...ature extensive geographical distribution across multiple Administrative Domains (ADs). This raises the issue of fault-tolerance. FT-MPI [7] differs from other solutions to the fault-tolerance problem [3,4,6,10,5], in that it allows the application itself to restore its own state, instead of relying on automated - but potentially unscalable - solutions like global distributed checkpointing. This makes it an i...
209 | Manetho: Transparent rollback-recovery with low overhead, limited rollback and fast output commit
- Elnozahy, Zwaenepoel
- 1992
81 | MPICH-V2: A fault tolerant MPI for volatile nodes based on the pessimistic sender based message logging
- Bouteiller, Cappello, et al.
81 | CLIP: A checkpointing tool for message-passing parallel programs. High Performance Networking and Computing
- Chen, Li, et al.
- 1997
23 | MPI-FT: Portable fault tolerance scheme for MPI
- Louca, Neophytou, et al.
- 2000
16 | The Harness metacomputing framework
- Migliardi, Sunderam
Citation Context ...of performance tests comparing the original NS with two alternatives: the LDAP-based OpenLDAP using the Berkeley DB, and HDNS [12]. HDNS is a naming service initially developed for the Harness Project [11]. While developing the SPI, a completely new version of HDNS has been designed and implemented. Both of the NSs tested support distribution and a number of features like fault-tolerance and persistenc...
11 | Scalable fault tolerant MPI: extending the recovery algorithm
- Fagg, Angskun, et al.
- 2005
Citation Context ...y state-retaining and critically important for the general functioning of the VM, 2) it is also a possible choke-point when communicating over slow AD interconnects (this issue was recently addressed [13] and an adapted recovery algorithm should be added to future releases of FT-MPI) and 3) it does not support features like replication and load balancing, which would be desirable to improve scalabilit...
1 | Applicability of Generic Naming Services and Fault Tolerant Metacomputing with FT-MPI
- Dewolfs, Kurzyniec, et al.
- 2005
Citation Context ...single points of failure (SPoFs) become an issue when deploying it over slower AD interconnects. One of the critical modules is the FT-MPI name service (NS). We have previously addressed these points [2,1] by developing a proxy-based solution which allows FT-MPI administrators to use any NS of their own choice (including any fault tolerance features available with it). Further, we use features of the H...
1 | Combining FT-MPI with H2O: Fault-tolerant MPI across administrative boundaries
- Kurzyniec, Sunderam
1 | Starfish: Fault-tolerant dynamic MPI programs on clusters of workstations
- Agbaria, Friedman
- 1999
1 | Process fault-tolerance: Semantics, design and applications for high-performance computing
- Fagg, Gabriel, et al.
- 2004
Citation Context ...ere has been a growing interest in clustering resources that feature extensive geographical distribution across multiple Administrative Domains (ADs). This raises the issue of fault-tolerance. FT-MPI [7] differs from other solutions to the fault-tolerance problem [3,4,6,10,5], in that it allows the application itself to restore its own state, instead of relying on automated - but potentially unscalab...
1 | Towards self-organising distributed computing frameworks: The H2O approach
- Kurzyniec, Wrzosek, et al.
Citation Context ...d solution which allows FT-MPI administrators to use any NS of their own choice (including any fault tolerance features available with it). Further, we use features of the H2O metacomputing framework [8] to span multiple ADs without the need for individual accounts on each system. In this paper, we focus on improving performance of operations over the proxies. We demonstrate the ability of our approa...
1 | Integrating Heterogeneous Information Services Using JNDI
- Gorissen, Wendykier, et al.
- 2006
Citation Context ... cases, plus the front-end in the case of the new design. We ran a number of performance tests comparing the original NS with two alternatives: the LDAP-based OpenLDAP using the Berkeley DB, and HDNS [12]. HDNS is a naming service initially developed for the Harness Project [11]. While developing the SPI, a completely new version of HDNS has been designed and implemented. Both of the NSs tested support...