On the effectiveness of a message-driven confidence-driven protocol for guarded software upgrading
- Performance Evaluation, 2001
Cited by 11 (7 self)
In order to accomplish dependable onboard evolution, we develop a methodology called guarded software upgrading (GSU). The core of the methodology is a low-cost error containment and recovery protocol that escorts an upgraded software component through onboard validation and guarded operation, safeguarding mission functions. The message-driven confidence-driven (MDCD) nature of the protocol eliminates the need for costly process coordination or atomic action, yet guarantees that the system reaches a consistent global state upon completion of the rollback or roll-forward actions carried out by individual processes during error recovery. To validate the effectiveness of the MDCD protocol with respect to its ability to enhance system reliability in a realistic, non-ideal execution environment when a software component undergoes onboard upgrading, we conduct a stochastic activity network model-based analysis. The results confirm the effectiveness of the protocol as originally surmised. Moreover, the model-based analysis provides useful insights into the system behavior resulting from the use of the protocol under various conditions in its execution environment, facilitating effective use of the protocol.
Low-Cost Error Containment and Recovery for Onboard Guarded Software Upgrading and Beyond
- IEEE Trans. Computers, 2002
Cited by 8 (4 self)
Message-driven confidence-driven (MDCD) error containment and recovery, a low-cost approach to mitigating the effect of software design faults in distributed embedded systems, is developed for onboard guarded software upgrading for deep-space missions. In this paper, we first describe and verify the MDCD algorithms in which we introduce the notion of "confidence-driven" to complement the "communication-induced" approach employed by a number of existing checkpointing protocols to achieve error containment and recovery efficiency. We then conduct a model-based analysis to show that the algorithms ensure low performance overhead. Finally, we discuss the advantages of the MDCD approach and its potential utility as a general-purpose, low-cost software fault tolerance technique for distributed embedded computing.
On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading
- in Proceedings of the 20th International Conference on Distributed Computing Systems (ICDCS 2000), 2000
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containment and recovery for GSU. To ensure low development cost, we exploit inherent system resource redundancies as the fault tolerance means. In order to mitigate the effect of residual software faults at low performance cost, we take a crucial step in devising error containment and recovery methods by introducing the "confidence-driven" notion. This notion complements the message-driven (or "communication-induced") approach employed by a number of existing checkpointing protocols for tolerating hardware faults. In particular, we discriminate between the individual software components with respect to our confidence in their reliability, and keep track of changes of our confidence (due to knowledge about potential process state contamination) in particular processes. This, in turn, enables the individual processes in the spaceborne distributed system to make decisions locally, at run-time, on whether to establish a checkpoint upon message passing and whether to roll back or roll forward during error recovery. The resulting message-driven confidence-driven approach enables cost-effective checkpointing and cascading-rollback-free recovery.
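The local decisions described in this abstract can be illustrated with a minimal sketch. It is not the authors' MDCD algorithm: the `Process` class, the `trusted` and `suspect` flags, and the rule "checkpoint just before the first message from a low-confidence sender is applied" are all assumptions made for illustration. The sketch only shows how a process might decide, without coordination, when to checkpoint on message receipt and whether to roll back or roll forward on recovery.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical process model; every name here is illustrative."""
    name: str
    trusted: bool                       # a priori confidence in this component
    suspect: bool = False               # True once the state may be contaminated
    state: List[str] = field(default_factory=list)
    checkpoint: Optional[List[str]] = None

    def receive(self, sender: "Process", payload: str) -> bool:
        """Apply a message; return True if a checkpoint was taken first."""
        sender_suspect = (not sender.trusted) or sender.suspect
        took_checkpoint = False
        if sender_suspect and not self.suspect:
            # Confidence is about to drop: save the last state still trusted.
            self.checkpoint = list(self.state)
            self.suspect = True
            took_checkpoint = True
        self.state.append(payload)
        return took_checkpoint

    def recover(self) -> str:
        """Local recovery decision (assumed rule, not taken from the paper)."""
        if self.suspect and self.checkpoint is not None:
            # State may be contaminated: restore the trusted checkpoint.
            self.state = list(self.checkpoint)
            self.suspect = False
            return "rollback"
        # State was never exposed to a low-confidence sender: keep going.
        return "roll-forward"


# Usage: one upgraded (low-confidence) component and two trusted peers.
upgraded = Process("upgraded", trusted=False)
peer = Process("peer", trusted=True)
bystander = Process("bystander", trusted=True)

first = peer.receive(bystander, "m0")   # trusted sender
second = peer.receive(upgraded, "m1")   # low-confidence sender
third = peer.receive(upgraded, "m2")    # peer is already suspect
peer_action = peer.recover()
bystander_action = bystander.recover()
```

In this sketch a checkpoint is taken only when a process's confidence in its own state changes, so repeated messages from the upgraded component cost nothing extra, and a process that never received such a message simply rolls forward.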

