Lazy Checkpoint Coordination for Bounding Rollback Propagation (1993)
| Venue: | Proc. IEEE Symp. Reliable Distributed Syst. |
| Citations: | 54 (7 self-citations) |
BibTeX
@INPROCEEDINGS{Wang93lazycheckpoint,
author = {Yi-Min Wang and W. Kent Fuchs},
title = {Lazy Checkpoint Coordination for Bounding Rollback Propagation},
booktitle = {Proc. IEEE Symp. Reliable Distributed Syst.},
year = {1993},
pages = {78--85}
}
Abstract
In this paper, we propose the technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination to bound rollback propagation. The notion of laziness is introduced to control the coordination frequency and to allow a flexible trade-off between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means of estimating the extra checkpoint overhead. Communication-trace-driven simulation of several parallel programs is used to evaluate the benefits of the proposed scheme.

1 Introduction

Uncoordinated checkpointing [1-3] for parallel and distributed systems allows maximum process autonomy and independent design of recovery capability for each process. However, in a general nondeterministic execution, cascading rollback propagation may result in the domino effect [4], which can prevent progression of the recovery line. It has been shown that message reordering [...
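The abstract describes lazy coordination only at a high level. The sketch below is a hypothetical illustration, assuming an index-based communication-induced rule: every message piggybacks the sender's checkpoint index, and a receiver takes a forced checkpoint before processing a message only when the sender has passed a multiple of the laziness parameter Z that the receiver has not yet reached. The names `Process`, `run`, `line_is_consistent`, and the trace format are invented for illustration, and the paper's exact protocol may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """A process following a hypothetical index-based lazy coordination rule."""
    laziness: int                 # Z >= 1: coordinate only at multiples of Z
    index: int = 0                # current checkpoint index (initial checkpoint is 0)
    forced: int = 0               # number of communication-induced checkpoints
    checkpoints: list = field(default_factory=lambda: [0])  # indices taken so far

    def level(self, idx: int) -> int:
        # Which coordination boundary (multiple of Z) an index has reached.
        return idx // self.laziness

    def basic_checkpoint(self) -> None:
        # Independent checkpoint, taken whenever the process chooses.
        self.index += 1
        self.checkpoints.append(self.index)

    def send(self) -> int:
        # Piggyback the current checkpoint index on the outgoing message.
        return self.index

    def receive(self, piggybacked: int) -> None:
        # Force a checkpoint before processing the message only if the sender
        # has passed a coordination boundary this process has not reached.
        if self.level(piggybacked) > self.level(self.index):
            self.index = self.level(piggybacked) * self.laziness
            self.checkpoints.append(self.index)
            self.forced += 1


def run(trace, n, laziness):
    """Replay ('ckpt', p) and ('msg', src, dst) events; messages are assumed
    to be delivered immediately, which is a simplification."""
    procs = [Process(laziness) for _ in range(n)]
    log = []  # (sender's piggybacked index, receiver's index when processed)
    for event in trace:
        if event[0] == "ckpt":
            procs[event[1]].basic_checkpoint()
        else:
            _, src, dst = event
            tag = procs[src].send()
            procs[dst].receive(tag)
            log.append((tag, procs[dst].index))
    return procs, log


def line_is_consistent(log, k, laziness):
    """Line k = each process's first checkpoint with index >= k*Z. It is
    consistent if no message sent after the line is processed before it."""
    bound = k * laziness
    return all(recv >= bound for sent, recv in log if sent >= bound)


# Process 0 checkpoints often and exchanges messages with process 1,
# which never checkpoints on its own.
trace = [e for _ in range(8) for e in (("ckpt", 0), ("msg", 0, 1), ("msg", 1, 0))]
for z in (1, 4):
    procs, log = run(trace, n=2, laziness=z)
    forced = sum(p.forced for p in procs)
    consistent = all(line_is_consistent(log, k, z) for k in range(3))
    print(z, forced, consistent)
# prints "1 8 True" then "4 2 True"
```

On this toy trace, raising Z from 1 to 4 cuts the forced checkpoints from 8 to 2, while the lines at multiples of Z remain consistent. The cost is that consistent lines are spaced Z indices apart, so a failure may roll a process back further. This mirrors the coordination-cost versus rollback-distance trade-off the abstract describes.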