|
1
|
Parallel Checkpoint/Restart for MPI Applications
– Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine
|
|
8
|
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI
– Camille Coti, Thomas Herault, Pierre Lemarinier, Ala Rezmerita, Eric Rodriguez
- 2006
|
|
2
|
Time-based coordinated checkpointing
– Nuno F. Neves
- 1998
|
|
1
|
Egida: A Toolkit for Low-overhead Fault-tolerance
– Sriram S. Rao (Ph.D. dissertation; supervisors: Lorenzo Alvisi, Harrick M. Vin)
- 1999
|
|
|
A PREEMPTION-BASED META-SCHEDULING SYSTEM FOR DISTRIBUTED COMPUTING
– Sathish Vadhiyar
- 2003
|
|
3
|
Interconnect agnostic checkpoint/restart in Open MPI
– Joshua Hursey, Timothy I. Mattox, Andrew Lumsdaine
- 2009
|
|
5
|
MPICH-V Project: a Multiprotocol Automatic Fault Tolerant MPI
– Aurelien Bouteiller, Franck Cappello, Thomas Herault, Geraud Krawezik, Pierre Lemarinier, Frederic Magniette
|
|
20
|
Network Multicomputing Using Recoverable Distributed Shared Memory
– John B. Carter, Alan L. Cox, Sandhya Dwarkadas, Elmootazbellah N. Elnozahy, Pete Keleher, David B. Johnson, Steven Rodrigues, Weimin Yu, Willy Zwaenepoel
- 1993
|
|
17
|
RENEW: A tool for fast and efficient implementation of checkpoint protocols
– Nuno Neves, W. Kent Fuchs
- 1998
|
|
28
|
Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery
– Elmootazbellah N. Elnozahy, James S. Plank
- 2004
|
|
2
|
Recent Advances in Checkpoint/Recovery Systems
– Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, Paul Stodghill
|
|
1
|
Fault Manager for Distributed Operating Environments Design, Implementation, and Performance
– Pierre Sens, Bertil Folliot
- 1998
|
|
|
Dr. D.K. Panda, Adviser
– Karthik Gopalakrishnan
|
|
18
|
Application-transparent checkpoint/restart for MPI programs over InfiniBand
– Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda
- 2006
|
|
14
|
The design and implementation of checkpoint/restart process fault tolerance for Open MPI
– Joshua Hursey, Jeffrey M. Squyres, Timothy I. Mattox, Andrew Lumsdaine
- 2007
|
|
38
|
An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance
– James Plank
- 1997
|
|
22
|
Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems
– James S. Plank, Michael G. Thomason
- 2001
|
|
|
A Framework for High Availability Based on a Single System Image
– Geoffroy Vallée, Christine Morin, Stephen L. Scott
|