An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance (1997)
Citations: 38 (0 self)
BibTeX
@TECHREPORT{Plank97anoverview,
author = {James Plank},
title = {An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance},
institution = {},
year = {1997}
}
Abstract
Checkpointing is the act of saving the state of a running program so that it may be reconstructed later in time. It is an important basic functionality in computing systems that paves the way for powerful tools in many fields of computer science. This article provides a comprehensive overview of checkpointing in uniprocessor and parallel processing systems, including definitions, uses of checkpointing, and implementation details. Also included in this overview is a brief discussion of checkpoint consistency, which is a major concern in parallel processing systems, and a thorough discussion of issues related to the performance of checkpointing. It is intended that the reader of this article should receive a thorough grounding in checkpointing, with enough detail to implement an efficient checkpointer if so desired.







