The design and implementation of checkpoint/restart process fault tolerance for Open MPI (2007)

by Joshua Hursey, Jeffrey M. Squyres, Timothy I. Mattox, Andrew Lumsdaine
Venue: In Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS), in conjunction with IPDPS
Citations: 25 (2 self)