Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations (Extended Abstract)

by Adnan M. Agbaria, et al.
Citations: 88 - 6 self

Documents Related by Co-Citation

196 CoCheck: Checkpointing and Process Migration for MPI – Georg Stellner - 1996
542 A Survey of Rollback-Recovery Protocols in Message-Passing Systems – E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-min Wang, David B. Johnson - 1996
69 CLIP: A Checkpointing Tool for Message-Passing Parallel Programs – James S. Plank, Yuqun Chen, Kai Li - 1997
113 Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System – M Litzkow, T Tannenbaum, J Basney, M Livny - 1997
38 Egida: An extensible toolkit for low-overhead fault-tolerance – Sriram Rao, Lorenzo Alvisi, Harrick M. Vin - 1999
187 Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback and Fast Output Commit – Elmootazbellah N. Elnozahy, Willy Zwaenepoel - 1992
101 FT-MPI: Fault Tolerant MPI, supporting dynamic applications in a dynamic world – Graham E. Fagg, Jack J. Dongarra - 2000
721 A high-performance, portable implementation of the MPI message passing interface standard – Ewing Lusk, Nathan Doss, Anthony Skjellum - 1996
114 MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes – George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cecile Germain, Thomas Herault, Pierre Lemarinier, Oleg Lodygensky, Frederic Magniette, Vincent Neri, Anton Selikhov - 2002
271 Libckpt: Transparent Checkpointing under Unix – James S. Plank, Micah Beck, Gerry Kingsley, Kai Li - 1995
64 A Network-Failure-tolerant Message-Passing system for Terascale Clusters – Richard L. Graham, Sung-eun Choi, David J. Daniel, Nehal N. Desai, Ronald G. Minnich, Craig E. Rasmussen, L. Dean Risinger, Mitchel W. Sukalski - 2003
297 Optimistic recovery in distributed systems – Robert E. Strom, Shaula Yemini - 1985
1019 Distributed Snapshots: Determining Global States of Distributed Systems – K. Mani Chandy, Leslie Lamport - 1985
40 HARNESS: A Next Generation Distributed Virtual Machine – Micah Beck, Jack J. Dongarra, Graham E. Fagg, G. Al Geist, Paul Gray, James Kohl, Mauro Migliardi, Keith Moore, Terry Moore, Philip Papadopoulos, Stephen L. Scott, Vaidy Sunderam - 1998
19 MPI-FT: portable fault tolerance scheme for MPI. Parallel Processing Letters 10(4):371–382 – S Louca, N Neophytou, A Lachanas, P Evripidou - 2000
20 MPI/FT™: Architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing – Rajanikanth Batchu, Jothi P. Neelamegam, Zhenqian Cui, Murali Beddhu, Anthony Skjellum, Yoginder D - 2001
62 MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging – Aurélien Bouteiller, Thomas Hérault - 2003
53 Application Level Fault Tolerance in Heterogeneous Networks of Workstations – Adam Beguelin, Erik Seligman, Peter Stephan - 1997
55 Managing Checkpoints for Parallel Programs – Jim Pruyne, Miron Livny