CoCheck: Checkpointing and Process Migration for MPI (1996)

by Georg Stellner
Venue: In Proceedings of the 10th International Parallel Processing Symposium (IPPS ’96)
Citations: 203 - 4 self

Documents Related by Co-Citation

271 Libckpt: Transparent Checkpointing under Unix – James S. Plank, Micah Beck, Gerry Kingsley, Kai Li - 1995
547 A Survey of Rollback-Recovery Protocols in Message-Passing Systems – E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-min Wang, David B. Johnson - 1996
67 CLIP: A Checkpointing Tool for Message-Passing Parallel Programs – James S. Plank, Yuqun Chen, Kai Li - 1997
87 Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations (Extended Abstract) – Adnan M. Agbaria, et al.
1057 Condor - a hunter of idle workstations – M Litzkow, M Livny, M Mutka - 1988
719 A high-performance, portable implementation of the MPI message passing interface standard – Ewing Lusk, Nathan Doss, Anthony Skjellum - 1996
197 The Performance of Consistent Checkpointing – Elmootazbellah Nabil Elnozahy, David B. Johnson, Willy Zwaenepoel - 1992
190 Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback and Fast Output Commit – Elmootazbellah N. Elnozahy, Willy Zwaenepoel - 1992
102 FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world – Graham E. Fagg, Jack J. Dongarra - 2000
1023 Distributed Snapshots: Determining Global States of Distributed Systems – K. Mani Chandy - 1985
39 Ickp: a consistent checkpointer for multicomputers – J S Plank, K Li - 1994
94 A first order approximation to the optimum checkpoint interval – J W Young - 1974
37 MIST: PVM with Transparent Migration and Checkpointing – Jeremy Casas, Dan Clark, Phil Galbiati, Ravi Konuru, Steve Otto, Robert Prouty, Jonathan Walpole - 1995
70 The condor distributed processing system – T Tannenbaum, M Litzkow - 1995
55 Managing Checkpoints for Parallel Programs – Jim Pruyne, Miron Livny
51 Application Level Fault Tolerance in Heterogeneous Networks of Workstations – Adam Beguelin, Erik Seligman, Peter Stephan - 1997
300 Optimistic recovery in distributed systems – Robert E. Strom, Shaula Yemini - 1985
89 The design and implementation of Berkeley Lab’s Linux Checkpoint/Restart – Jason Duell - 2003
213 LAM: An Open Cluster Environment for MPI – G Burns, R Daoud, J Vaigl - 1994