|
85
|
FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world
– Graham E. Fagg, Jack J. Dongarra
- 2000
|
|
175
|
CoCheck: Checkpointing and Process Migration for MPI
– Georg Stellner
- 1996
|
|
94
|
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes
– George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cecile Germain, Thomas Herault, Pierre Lemarinier, Oleg Lodygensky, Frederic Magniette, Vincent Neri, Anton Selikhov
- 2002
|
|
474
|
A Survey of Rollback-Recovery Protocols in Message-Passing Systems
– E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-Min Wang, David B. Johnson
- 1996
|
|
105
|
Checkpoint and migration of UNIX processes in the Condor distributed processing system
– M Litzkow, T Tannenbaum, J Basney, M Livny
- 1997
|
|
62
|
The design and implementation of Berkeley Lab’s Linux Checkpoint/Restart
– Jason Duell
- 2003
|
|
929
|
Distributed Snapshots: Determining Global States of Distributed Systems
– K. Mani Chandy, Leslie Lamport
- 1985
|
|
251
|
Libckpt: Transparent Checkpointing under Unix
– James S. Plank, Micah Beck, Gerry Kingsley, Kai Li
- 1995
|
|
119
|
Open MPI: Goals, concept, and design of a next generation MPI implementation
– Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, Timothy S. Woodall
- 2004
|
|
651
|
A high-performance, portable implementation of the MPI message passing interface standard
– William Gropp, Ewing Lusk, Nathan Doss, Anthony Skjellum
- 1996
|
|
57
|
A Network-Failure-Tolerant Message-Passing System for Terascale Clusters
– Richard L. Graham, Sung-eun Choi, David J. Daniel, Nehal N. Desai, Ronald G. Minnich, Craig E. Rasmussen, L. Dean Risinger, Mitchel W. Sukalski
- 2003
|
|
84
|
Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations (Extended Abstract)
– Adnan M. Agbaria, et al.
|
|
185
|
LAM: an open cluster environment for MPI
– G Burns, R Daoud, J Vaigl
- 1994
|
|
63
|
A Component Architecture for LAM/MPI
– Jeffrey M. Squyres, Andrew Lumsdaine
- 2003
|
|
181
|
Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback and Fast Output Commit
– Elmootazbellah N. Elnozahy, Willy Zwaenepoel
- 1992
|
|
67
|
Automated Application-level Checkpointing of MPI Programs
– Greg Bronevetsky, Daniel Marques, Keshav Pingali, Paul Stodghill
- 2003
|
|
18
|
Architecture of LA-MPI, a network-fault-tolerant MPI
– Rob T. Aulwes, David J. Daniel, Nehal N. Desai, Richard L. Graham, L. Dean Risinger, Mark A. Taylor, Timothy S. Woodall
- 2004
|
|
30
|
HARNESS and fault tolerant MPI
– G E Fagg, A Bukovsky, J J Dongarra
- 2001
|
|
34
|
Egida: An extensible toolkit for low-overhead fault-tolerance
– Sriram Rao, Lorenzo Alvisi, Harrick M. Vin
- 1999
|