## Replay, Recovery, Replication, and Snapshots of Nondeterministic Concurrent Programs (1990)

Venue: ACM Symposium on Principles of Distributed Computing

Citations: 5 (0 self)

### BibTeX

@INPROCEEDINGS{Gaifman90replay,

author = {Haim Gaifman and Michael J. Maher and Ehud Shapiro},

title = {Replay, Recovery, Replication, and Snapshots of Nondeterministic Concurrent Programs},

booktitle = {ACM Symposium on Principles of Distributed Computing},

year = {1990}

}

### Abstract

The problem of replaying computations of nondeterministic concurrent programs arises in contexts such as debugging and recovery. We investigate the problem for an abstract model of concurrency, which generalizes dataflow networks, processors with shared variables, and logic programming models of concurrency. We say that nondeterminism is visible if the state is determined, up to some (appropriately defined) notion of equivalence, by the external behavior. We show that if nondeterminism is visible then replay is achievable using a one-step lookahead sequential simulation algorithm. If the program has an additional monotonicity property called stability then recovery is possible without simulating the original computation, by restarting the program from a certain easily constructed state. Also, for stable programs with visible nondeterminism, a process composed of identical parallel processes has the same external behavior as each of its components. Hence high crash-failure res...
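The replay idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a toy model in which every step carries an externally visible label, and the names `replay`, `enabled`, and `counter_steps` are hypothetical. Given an initial state and a recorded trace, the simulation looks one step ahead and takes any enabled step whose label matches the next trace item. The visibility assumption is what makes this greedy choice safe without backtracking.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

State = Hashable
Label = Hashable
Step = Tuple[Label, State]


def replay(initial: State,
           trace: Iterable[Label],
           enabled: Callable[[State], List[Step]]) -> List[State]:
    """Greedy replay: for each recorded trace item, look one step ahead
    and take an enabled step whose label matches that item."""
    states = [initial]
    current = initial
    for item in trace:
        matches = [nxt for label, nxt in enabled(current) if label == item]
        if not matches:
            raise ValueError(f"trace item {item!r} not enabled in {current!r}")
        # Assumption (visible nondeterminism): all matching successors are
        # equivalent, so taking the first one never requires backtracking.
        current = matches[0]
        states.append(current)
    return states


# Hypothetical toy system: a counter that nondeterministically adds 1 or 2;
# the label records which choice was made.
def counter_steps(n: int) -> List[Step]:
    return [("inc1", n + 1), ("inc2", n + 2)]


if __name__ == "__main__":
    print(replay(0, ["inc1", "inc2", "inc2"], counter_steps))
```

In this toy system the label fully determines the successor state, which is the simplest case of the state being determined by external behavior. A system with silent internal steps would need a richer model than this sketch provides.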

### Citations

1340 | A Calculus of Communicating Systems - Milner - 1980
Citation Context ...tive step $z \stackrel{t}{\Rightarrow} z'$ is from $z$ to $z'$. Note that a computation can be broken into reactive steps that correspond to the trace items. The following definition is a reformulation of Milner's definition [22] for the asynchronous case. Definition 8 (Next-step equivalence, $\simeq_{ns}$). Two states $z_i = \langle s_i, d_i \rangle$, $i = 0, 1$, are next-step equivalent, denoted $z_1 \simeq_{ns} z_2$, if $d_0 = d_1$ and for every active step from... |

1027 | Distributed snapshots: Determining global states in distributed systems - Chandy, Lamport - 1985
Citation Context ...a "similar" computation from $z_0$ to $z_n$. Since bisimulation equivalent states are indistinguishable, it suffices to get a snapshot up to equivalence. The concept was introduced by Chandy and Lamport [5] in the context of dataflow networks. They give an elegant method for obtaining, during an ongoing dataflow computation (and without interfering with it), a (global) state that occurs in a computation ... |

367 | Hierarchical correctness proofs for distributed algorithms - Lynch, Tuttle - 1987
Citation Context ...message, write a variable, instantiate a logic variable or place a constraint. The model generalizes shared variables models, dataflow networks, concurrent logic/constraints programs and I/O Automata [19], in the sense that it admits simple sound embeddings of these models [28]. In this model processes share a common store whose values range over some partially ordered set. Processes interact by upgra... |

300 | Optimistic recovery in distributed systems - Strom, Yemini - 1985
Citation Context ...f Science, Rehovot, 76100, Israel 0 1 Introduction Replay The problem of replaying concurrent computations has been investigated extensively, in relation to both debugging [1, 4, 15, 21] and recovery [30, 31]. Informally, the problem is: given the initial state and external behavior of a computation, construct a computation with the same initial state and the same external behavior. The problem is related... |

296 | Debugging parallel programs with instant replay - LeBlanc, Mellor-Crummey - 1987
Citation Context ...ce, The Weizmann Institute of Science, Rehovot, 76100, Israel 0 1 Introduction Replay The problem of replaying concurrent computations has been investigated extensively, in relation to both debugging [1, 4, 15, 21] and recovery [30, 31]. Informally, the problem is: given the initial state and external behavior of a computation, construct a computation with the same initial state and the same external behavior. ... |

270 | The Family of Concurrent Logic Programming Languages - Shapiro - 1989 |

226 | Shared virtual memory on loosely coupled multiprocessors - Li - 1986
Citation Context ...l) shared store (as in ordinary message passing systems such as the Cosmic Environment [2], distributed implementations of concurrent logic languages [13, 14, 32], and models of shared virtual memory [16]). 2.1 Reactive transition systems By a domain hD; i we mean a set D partially ordered by , which has a least element. We do not require the domain to be a cpo, only that it be complete relative to th... |

191 | Concurrent Constraint Programming Languages - Saraswat - 1989
Citation Context ...y; if it is a passive step then Rules (2) must apply; contradiction. □ 4 Stability. The following definition of stability is a generalization of a notion known in concurrent logic/constraint languages [26, 25]. (It is unrelated to stability as defined by Milner [22]). Definition 11 Stable transition system T is stable if hs; di ! hs 0 ; d 0 i 2 T " implies that, for all d 00sd 0 , there is a computation co... |

171 | Debugging concurrent programs - McDowell, Helmbold - 1989
Citation Context ...ce, The Weizmann Institute of Science, Rehovot, 76100, Israel 0 1 Introduction Replay The problem of replaying concurrent computations has been investigated extensively, in relation to both debugging [1, 4, 15, 21] and recovery [30, 31]. Informally, the problem is: given the initial state and external behavior of a computation, construct a computation with the same initial state and the same external behavior. ... |

160 | An application of games to the completeness problem for formalized theories - Ehrenfeucht - 1961
Citation Context ...ynchronous case. A precise definition in terms of games is given below. Definitions of this kind, due to Fraisse [10], go back to the fifties and have been used extensively in mathematical logic (cf. [7], [6], pp. 35-36). They make for easy intuitive proofs. Definition 10 (Bisimulation game, bisimulation equivalence). • Let $p_i = (z_i, T_i)$, $i = 0, 1$, be two processes. The bisimulation game BG($p_0$,... |

156 | Multicomputers: Message-passing concurrent computers - Athas, Seitz - 1988
Citation Context ...h processor holds some component of the store, and the least upper bound of the components represents the (virtual) shared store (as in ordinary message passing systems such as the Cosmic Environment [2], distributed implementations of concurrent logic languages [13, 14, 32], and models of shared virtual memory [16]). 2.1 Reactive transition systems By a domain hD; i we mean a set D partially ordered... |

129 | Logic Semantics for a Class of Committed-Choice Programs - Maher - 1987 |

110 | Efficient distributed recovery using message logging - Sistla, Welch - 1989
Citation Context ...f Science, Rehovot, 76100, Israel 0 1 Introduction Replay The problem of replaying concurrent computations has been investigated extensively, in relation to both debugging [1, 4, 15, 21] and recovery [30, 31]. Informally, the problem is: given the initial state and external behavior of a computation, construct a computation with the same initial state and the same external behavior. The problem is related... |

73 | Low-latency, concurrent checkpointing for parallel programs - Li, Naughton, et al. - 1994
Citation Context ... actual computation. On the other hand our method employs stronger means than that of [5]: we assume that one can checkpoint the local states in some order and then checkpoint the entire store. (cf. [17] for a real-time checkpointing algorithm for shared memory multiprocessors.) For stable concurrent logic programs, a solution to a "half snapshot problem" is given in [26]: A state, say z, is construc... |

48 | On describing the behavior and implementation of distributed systems - Lynch, Fischer - 1981 |

16 | Determinacy → (observation equivalence = trace equivalence) - Engelfriet - 1985
Citation Context ...ayer 2 responds with the passive step that has the same store augmentation (the response that both strategies must recommend). □ The next theorem is the asynchronous analog of a theorem of Engelfriet [8] for the case of CCS. Theorem 2. If the nondeterminism of $T$ is visible, then the three equivalence relations $\simeq_{ns}$, $\simeq_t$ and $\simeq$ coincide. This easily implies that visible nondeterminism can be equivalent... |

12 | Separating concurrent languages with categories of language embeddings - Shapiro - 1991
Citation Context ...int. The model generalizes shared variables models, dataflow networks, concurrent logic/constraints programs and I/O Automata [19], in the sense that it admits simple sound embeddings of these models [28]. In this model processes share a common store whose values range over some partially ordered set. Processes interact by upgrading that value. The state of a process is composed of its internal state,... |

11 | Parallel Logic Programming Techniques - Taylor - 1989
Citation Context ...per bound of the components represents the (virtual) shared store (as in ordinary message passing systems such as the Cosmic Environment [2], distributed implementations of concurrent logic languages [13, 14, 32], and models of shared virtual memory [16]). 2.1 Reactive transition systems By a domain hD; i we mean a set D partially ordered by , which has a least element. We do not require the domain to be a cp... |

9 | A parallel implementation of Flat Concurrent Prolog - Taylor, Safra, et al. - 1987
Citation Context ...red store can be characterized as the least upper bound of the local stores. This is the case with dataflow-like message passing systems [2], distributed implementations of concurrent logic languages [14, 33, 13], and distributed implementations of shared virtual memory [16]. Our treatment should be extended to handle this case in the abstract setting, which requires composing transition systems over differen... |

8 | A somewhat logical formulation of CLP synchronization primitives - Saraswat - 1988
Citation Context ...ate the question for concurrent constraint programs with blocking-ask and atomic-tell operations (cf. [25]) employing the framework developed in [11]. The language CC we work with is the language of [23, 11], modified so that failed processes behave chaotically instead of simply aborting the computation. This modification allows us to ignore the answer substitution/constraint of failed computations. We f... |

7 | Model Theory (North - Chang, Keisler - 1973
Citation Context ...onous case. A precise definition in terms of games is given below. Definitions of this kind, due to Fraisse [10], go back to the fifties and have been used extensively in mathematical logic (cf. [7], [6], pp. 35-36). They make for easy intuitive proofs. Definition 10 (Bisimulation game, bisimulation equivalence). • Let $p_i = (z_i, T_i)$, $i = 0, 1$, be two processes. The bisimulation game BG($p_0$, $p_1$ ... |

6 | Reactive behavior semantics for concurrent constraint logic programs - Gaifman, Maher, et al. - 1989
Citation Context ...to make a program have it, if it does not? We investigate the question for concurrent constraint programs with blocking-ask and atomic-tell operations (cf. [25]) employing the framework developed in [11]. The language CC we work with is the language of [23, 11], modified so that failed processes behave chaotically instead of simply aborting the computation. This modification allows us to ignore the a... |

5 | Reproducible testing of concurrent programs based on shared variables - Carver, Tai - 1986
Citation Context ...ce, The Weizmann Institute of Science, Rehovot, 76100, Israel 0 1 Introduction Replay The problem of replaying concurrent computations has been investigated extensively, in relation to both debugging [1, 4, 15, 21] and recovery [30, 31]. Informally, the problem is: given the initial state and external behavior of a computation, construct a computation with the same initial state and the same external behavior. ... |

4 | Detecting stable properties of networks in concurrent logic programming languages - Saraswat, Kahn, et al. - 1988
Citation Context ...y; if it is a passive step then Rules (2) must apply; contradiction. □ 4 Stability. The following definition of stability is a generalization of a notion known in concurrent logic/constraint languages [26, 25]. (It is unrelated to stability as defined by Milner [22]). Definition 11 Stable transition system T is stable if hs; di ! hs 0 ; d 0 i 2 T " implies that, for all d 00sd 0 , there is a computation co... |

3 | Sur quelques classifications des systèmes de relations. Publications Scientifiques de l’Université d’Alger - Fraïssé - 1954
Citation Context ...e use is based on reactive steps. It is a natural adaptation of bisimulation for the asynchronous case. A precise definition in terms of games is given below. Definitions of this kind, due to Fraisse [10], go back to the fifties and have been used extensively in mathematical logic (cf. [7], [6], pp. 35-36). They make for easy intuitive proofs. Definition 10 (Bisimulation game, bisimulation equivalence).... |

2 | Implementation of committed-choice logic language on shared memory multiprocessors - Crammond - 1988
Citation Context ...ly in Figure 1. In concrete models the shared store can be implemented using physically shared memory (as in shared variable models [9] and shared memory implementations of concurrent logic languages [3]), a store controlled by a single processor, accessed via messages (as in database systems [34]), or a distributed store, in which each processor holds some component of the store, and the least upper... |

2 | A distributed implementation - Ichiyoshi, Miyazaki, et al. - 1987
Citation Context ...per bound of the components represents the (virtual) shared store (as in ordinary message passing systems such as the Cosmic Environment [2], distributed implementations of concurrent logic languages [13, 14, 32], and models of shared virtual memory [16]). 2.1 Reactive transition systems By a domain hD; i we mean a set D partially ordered by , which has a least element. We do not require the domain to be a cp... |

1 | Full abstraction for Asynchronous Concurrency, in preparation - Gaifman, Maher, et al. |

1 | A Distributed Variable Server for Atomic Unification, to appear - Kleinman, Moses, et al. - 1990
Citation Context ...per bound of the components represents the (virtual) shared store (as in ordinary message passing systems such as the Cosmic Environment [2], distributed implementations of concurrent logic languages [13, 14, 32], and models of shared virtual memory [16]). 2.1 Reactive transition systems By a domain hD; i we mean a set D partially ordered by , which has a least element. We do not require the domain to be a cp... |