|
1
|
Parallel Checkpoint/Restart for MPI Applications
– Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine
|
|
67
|
The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
– Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine
- 2003
|
|
|
LAM/MPI Installation Guide, Version 7.1.1
– The LAM/MPI Team, Open Systems Lab
- 2004
|
|
|
LAM/MPI Installation Guide, Version 7.1.2
– The LAM/MPI Team, Open Systems Lab
|
|
8
|
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI
– Camille Coti, Thomas Herault, Pierre Lemarinier, Ala Rezmerita, Eric Rodriguez
- 2006
|
|
2
|
Towards MPI progression layer elimination with TCP and SCTP
– Bradley Thomas Penoff
- 2006
|
|
16
|
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
– Pierre Lemarinier, Aurelien Bouteiller, Thomas Herault, Geraud Krawezik
- 2004
|
|
5
|
MPICH-V Project: a Multiprotocol Automatic Fault Tolerant MPI
– Aurelien Bouteiller, Franck Cappello, Thomas Herault, Geraud Krawezik, Pierre Lemarinier, Frederic Magniette
|
|
3
|
Interconnect agnostic checkpoint/restart in Open MPI
– Joshua Hursey, Timothy I. Mattox, Andrew Lumsdaine
- 2009
|
|
22
|
The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms
– Jeffrey M. Squyres, Andrew Lumsdaine
- 2004
|
|
2
|
Improving the Communication Subsystem Performance of WARPED
– Umesh Kumar V. Rajasekaran
- 1998
|
|
208
|
MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface
– Nicholas T. Karonis, Brian Toonen, Ian Foster
- 2002
|
|
2
|
Implementing High-Level Parallelism on Computational GRIDs
– Abdallah Deeb I. Al Zain (supervisors: Phil Trinder, Greg Michaelson)
- 2006
|
|
|
LAM/MPI User's Guide
– The LAM/MPI Team
- 2004
|
|
6
|
A checkpoint and restart service specification for Open MPI
– Joshua Hursey, Jeffrey M. Squyres, Andrew Lumsdaine
- 2006
|
|
|
Transparent Fault Tolerance for Job Healing in HPC Environments
– Chao Wang
|
|
15
|
A job pause service under LAM/MPI+BLCR for transparent fault tolerance
– Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott
- 2007
|
|
2
|
Recent Advances in Checkpoint/Recovery Systems
– Greg Bronevetsky, Rohit Fernandes, Daniel Marques, Keshav Pingali, Paul Stodghill
|
|
1
|
Improving MPI Multicast Performance over Grid Environment using Intelligent Message Scheduling
– Theewara Vorakosit, Putchong Uthayopas
- 2004
|