CoCheck: Checkpointing and Process Migration for MPI (1996)

by Georg Stellner
Venue: In Proceedings of the 10th International Parallel Processing Symposium (IPPS ’96)
Citations: 196 (4 self-citations)

Documents Related by Co-Citation

271 Libckpt: Transparent Checkpointing under Unix – James S. Plank, Micah Beck, Gerry Kingsley, Kai Li - 1995
542 A Survey of Rollback-Recovery Protocols in Message-Passing Systems – E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-min Wang, David B. Johnson - 1996
69 CLIP: A Checkpointing Tool for Message-Passing Parallel Programs – James S. Plank, Yuqun Chen, Kai Li - 1997
88 Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations (Extended Abstract) – Adnan M. Agbaria, et al.
1051 Condor - A Hunter of Idle Workstations – M Litzkow, M Livny, M W Mutka - 1988
721 A high-performance, portable implementation of the MPI message passing interface standard – Ewing Lusk, Nathan Doss, Anthony Skjellum - 1996
197 The Performance of Consistent Checkpointing – Elmootazbellah Nabil Elnozahy, David B. Johnson, Willy Zwaenepoel - 1992
187 Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback and Fast Output Commit – Elmootazbellah N. Elnozahy, Willy Zwaenepoel - 1992
101 FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world – Graham E. Fagg, Jack J. Dongarra - 2000
1019 Distributed Snapshots: Determining Global States of Distributed Systems – K. Mani Chandy - 1985
39 ickp: A Consistent Checkpointer for Multicomputers – J S Plank, K Li - 1994
92 A first order approximation to the optimum checkpoint interval – J W Young - 1974
37 MIST: PVM with Transparent Migration and Checkpointing – Jeremy Casas, Dan Clark, Phil Galbiati, Ravi Konuru, Steve Otto, Robert Prouty, Jonathan Walpole - 1995
70 The condor distributed processing system – T Tannenbaum, M Litzkow - 1995
55 Managing Checkpoints for Parallel Programs – Jim Pruyne, Miron Livny
53 Application Level Fault Tolerance in Heterogeneous Networks of Workstations – Adam Beguelin, Erik Seligman, Peter Stephan - 1997
297 Optimistic recovery in distributed systems – Robert E. Strom, Shaula Yemini - 1985
52 On the Use and Implementation of Message Logging – Elmootazbellah Elnozahy, Willy Zwaenepoel - 1994
83 The design and implementation of Berkeley Lab’s linux Checkpoint/Restart – Jason Duell - 2003