Improving Availability with Recursive Micro-Reboots: A Soft-State System Case Study (2003)
by George Candea, James Cutler, Armando Fox
Citations: 35 (4 self)
BibTeX
@MISC{Candea03improvingavailability,
author = {George Candea and James Cutler and Armando Fox},
title = {Improving Availability with Recursive Micro-Reboots: A Soft-State System Case Study},
year = {2003}
}
Abstract
Even after decades of software engineering research, complex computer systems still fail. This paper makes the case for increasing research emphasis on dependability and, specifically, on improving availability by reducing time-to-recover. All software fails at some point, so systems must be able to recover from failures. Recovery itself can fail too, so systems must know how to intelligently retry their recovery. We present here a recursive approach, in which a minimal subset of components is recovered first
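The abstract's recursive idea can be illustrated with a small sketch. It rests on several assumptions: components are arranged in a hypothetical tree of groups, and a failed recovery is retried by escalating to the enclosing (larger) group, up to a full reboot at the root. The escalation-to-parent policy and the names `RestartGroup`, `recursive_recover`, `reboot`, and `is_healthy` are illustrative only. This is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RestartGroup:
    """Hypothetical node in a tree of components: rebooting a group
    reboots every component in its subtree."""
    name: str
    children: List["RestartGroup"] = field(default_factory=list)
    parent: Optional["RestartGroup"] = None

    def add(self, child: "RestartGroup") -> "RestartGroup":
        # Attach a child group and record its parent for escalation.
        child.parent = self
        self.children.append(child)
        return child

    def components(self) -> List[str]:
        # Collect every group name in this subtree, depth-first.
        names = [self.name]
        for child in self.children:
            names.extend(child.components())
        return names


def recursive_recover(
    failed: RestartGroup,
    reboot: Callable[[RestartGroup], None],
    is_healthy: Callable[[], bool],
) -> List[str]:
    """Reboot the smallest group containing the failure first; if the
    system is still unhealthy, retry by escalating to the parent group,
    up to the root (a full reboot). Returns the groups rebooted, in order."""
    attempts: List[str] = []
    group: Optional[RestartGroup] = failed
    while group is not None:
        reboot(group)
        attempts.append(group.name)
        if is_healthy():
            return attempts
        group = group.parent  # recovery failed: widen the scope
    raise RuntimeError("full reboot did not restore service: " + " -> ".join(attempts))


if __name__ == "__main__":
    # Hypothetical system: a fault in "cache" that is only cleared
    # once its enclosing "frontend" group is rebooted.
    root = RestartGroup("system")
    frontend = root.add(RestartGroup("frontend"))
    cache = frontend.add(RestartGroup("cache"))

    rebooted = set()

    def reboot(group: RestartGroup) -> None:
        rebooted.update(group.components())

    print(recursive_recover(cache, reboot, lambda: "frontend" in rebooted))
```

Running the demo prints `['cache', 'frontend']`. The cheapest reboot is tried first, and the scope widens only when the narrower recovery fails to restore service.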







