Fault Tolerance in Concurrent Object-Oriented Software through Coordinated Error Recovery
- FTCS-25 SUBMISSION
Cited by 85 (41 self)
This paper presents a scheme for coordinated error recovery between multiple interacting objects in a concurrent object-oriented system. A conceptual framework for fault tolerance is established based on a general object concurrency model that is supported by most concurrent object-oriented languages and systems. This framework integrates two complementary concepts — conversations and transactions. Conversations (associated with cooperative exception handling) are used to provide coordinated error recovery between concurrent interacting activities whilst transactions are used to maintain the consistency of shared resources in the presence of concurrent access. The serialisability property of transactions is exploited in order to help prevent unexpected information smuggling. The proposed framework is illustrated by means of a case study, and various linguistic and implementation issues are discussed.
The evolution of the recovery block concept
- IN SOFTWARE FAULT TOLERANCE
, 1994
Cited by 43 (2 self)
This chapter reviews the development of the recovery block approach to software fault tolerance and subsequent work based on this approach. It starts with an account of the development and implementations of the basic recovery block scheme in the early 1970s at Newcastle, and then goes on to describe work at Newcastle and elsewhere on extensions to the basic scheme, recovery in concurrent systems, and linguistic support for recovery blocks based on the use of object-oriented programming concepts.
Abstractions for Constructing Dependable Distributed Systems
, 1992
Cited by 14 (3 self)
Shivakant Mishra and Richard D. Schlichting, TR 92-19. Distributed systems, in which multiple machines are connected by a communications network, are often used to build highly dependable computing systems. However, constructing the software required to realize such dependability is a difficult task since it requires the programmer to build fault-tolerant software that can continue to function despite failures. To simplify this process, canonical structuring techniques or programming paradigms have been developed, including the object/action model, the primary/backup approach, the state machine approach, and conversations. In this paper, some of the system abstractions designed to support these paradigms are described. These abstractions, which are termed fault-tolerant services, can be categorized into two types. One type provides functionality similar to standard hardware or operating system services, but with improved ...
MEMSY - A Modular Expandable Multiprocessor System with Fault Tolerance
, 1994
Cited by 3 (2 self)
The experimental multiprocessor system MEMSY 2 will be described. This system was built to validate the concept of the MEMSY architecture - a scalable multiprocessor architecture based on local shared-memory. Main application areas are scientific computations with a high demand for processing power and memory capacity. In designing the hardware architecture, the extensive use of standard components was a condition. The programming model of MEMSY is custom-made, reflecting its real hardware structure, whereas the operating system is a Unix extension. In massively parallel systems, with their complexity and large number of components, the chance of a single or multiple failure is no longer negligible. It is clear that redundancy, reconfigurability and diagnosis techniques must be incorporated at the design stage itself and not as a subsequent add-on. Keywords: Multiprocessor, MIMD, Scalability, Fault Tolerance 1. Introduction: Motivation and Design Goals There are some well known reaso...
On Structuring Cooperative and Competitive Concurrent Systems
- COMPUTER JOURNAL
, 1999
Cited by 3 (1 self)
Developing advanced structuring techniques has always been of great importance for computer science and practice. Many structuring approaches are used to help capture certain characteristics of applications: group communications, replication features, file services, etc. Procedures were among the first general techniques intended for structuring application software. They reflect both the static and dynamic structures of sequential systems (a stack of procedure contexts of nested calls represents the state of program execution). The situation is much more complex in concurrent systems, in which the states of several concurrent components should be taken into consideration while describing the system behaviour. The purpose of this survey is to outline recent trends in developing structuring approaches for competitive and cooperative concurrent systems and to discuss different directions of research in this area and their interrelations.
Asynchronous Construction of Consistent Global Snapshots in the Object and Action Model
, 1998
Cited by 3 (2 self)
The Object and Action Model (OAM) is well known as an adequate paradigm to build fault-tolerant configurable distributed applications. The reconfiguration of an application depends on the construction of a consistent global snapshot of its global state. An atomic action that reads the states of all objects of the application is a simple and straightforward way to obtain such a global snapshot, but it reduces concurrency and interferes with the underlying computation. In the Process and Message Model (PMM), consistent snapshots can be constructed asynchronously by a component that passively receives process states. This paper presents OAM-based asynchronous global snapshot algorithms equivalent to PMM-based algorithms, built using a precedence relation defined for atomic actions. Arjuna, an object-oriented action-based distributed programming environment, has been used to implement these OAM-based global snapshot algorithms, allowing us to conclude that our approach is promising. 1. Introduc...
Approaches to Software Fault Tolerance
- Proc. the 25th Annual LAAS Conference
, 1993
Cited by 3 (0 self)
A personal and rather discursive account is given of the background to the start of work in the early 1970s at Newcastle on software fault tolerance, and of how work has developed to encompass forward as well as backward error recovery, and parallel and distributed software as well as sequential programs. A major theme of the paper is that of the links between this work and that carried out elsewhere in connection with the topic of object-oriented programming, in particular on concepts such as generic classes and functions, exception-handling, delegation and reflection.
Recovery in Heterogeneous System
- PDCS-2 ESPRIT Basic Research Project
, 1994
Cited by 2 (2 self)
this paper planned conversations (in which a set of communicating processes are rolled back together), which also allow the recomputation after roll-back to use different code from the first computation, so that errors caused by software design faults may not be repeated at the new execution [Randell 1975], and atomic transactions (in which a sequence of changes on a set of data items are undone together), which allow the designer to manage together error recovery and concurrency control in accessing data [Lynch, Merrit et al. 1993]. Conversations and atomic transactions are in fact dual models of recovery, as discussed in [Shrivastava, Mancini et al. 1993]: they are two ways of describing the same backward recovery philosophy in the two models (or design styles) which the authors of [Shrivastava, Mancini et al. 1993] call the "object-action" model (where the long-term state of the computation is encapsulated in data objects, and active processes invoke operations on these objects), and the "process-conversation" model (where the state is contained in the processes, which communicate via messages).
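The backward recovery idea described in the abstract above (roll back to a saved state, then recompute with different code) can be sketched for the simplest, single-process case. This is an illustrative sketch only, not code from any of the listed papers: the names `recovery_block`, `RecoveryFailed`, `faulty_sort`, `simple_sort` and `accept` are hypothetical, the state is assumed to be a plain dictionary, and the coordination of several communicating processes that a conversation requires is not modelled.

```python
import copy


class RecoveryFailed(Exception):
    """Raised when every alternate fails the acceptance test (hypothetical name)."""


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in turn until one passes the acceptance test.

    The state (assumed to be a dict) is checkpointed on entry and restored
    after every failed attempt, so a failed alternate leaves no changes behind.
    """
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return state
        except Exception:
            pass  # an exception counts as a failed attempt
        # Roll back: undo every change made by the failed alternate.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryFailed("all alternates failed the acceptance test")


def faulty_sort(state):
    # Primary with a deliberate design fault: it loses one element.
    state["items"] = sorted(state["items"])[:-1]


def simple_sort(state):
    # Alternate: different code computing the same result.
    state["items"] = sorted(state["items"])


def accept(state):
    # Acceptance test: no element lost and the items are in order.
    items = state["items"]
    in_order = all(a <= b for a, b in zip(items, items[1:]))
    return len(items) == state["count"] and in_order


state = {"items": [3, 1, 2], "count": 3}
recovery_block(state, [faulty_sort, simple_sort], accept)
print(state["items"])  # prints [1, 2, 3]
```

Here the primary fails the acceptance test, its changes are undone, and the alternate then succeeds from the restored state. Taking a deep copy of the whole state is the simplest possible checkpoint and is assumed only for brevity.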

