Results 1 - 10
of
22
Libckpt: Transparent Checkpointing under Unix
, 1995
"... Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from whichitcan be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint f ..."
Abstract
-
Cited by 251 (15 self)
- Add to MetaCart
Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from whichitcan be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode whichis almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.
Lively Linear Lisp - 'Look Ma, No Garbage!'
- ACM Sigplan Notices
, 1992
"... Linear logic has been proposed as one solution to the problem of garbage collection and providing efficient "updatein -place" capabilities within a more functional language. Linear logic conserves accessibility, and hence provides a mechanical metaphor which is more appropriate for a distributed-me ..."
Abstract
-
Cited by 91 (6 self)
- Add to MetaCart
Linear logic has been proposed as one solution to the problem of garbage collection and providing efficient "updatein -place" capabilities within a more functional language. Linear logic conserves accessibility, and hence provides a mechanical metaphor which is more appropriate for a distributed-memory parallel processor in which copying is explicit. However, linear logic's lack of sharing may introduce significant inefficiencies of its own. We show an efficient implementation of linear logic called Linear Lisp that runs within a constant factor of non-linear logic. This Linear Lisp allows RPLACX operations, and manages storage as safely as a non-linear Lisp, but does not need a garbage collector. Since it offers assignments but no sharing, it occupies a twilight zone between functional languages and imperative languages. Our Linear Lisp Machine offers many of the same capabilities as combinator/graph reduction machines, but without their copying and garbage collection problems. Intr...
Diskless Checkpointing
, 1997
"... Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkp ..."
Abstract
-
Cited by 91 (3 self)
- Add to MetaCart
Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a high-performance network of workstations and compared to traditional disk-based checkpointing. We conclude that diskless checkpointing is a desirable alternative to disk-based checkpointing that can improve the performance of distributed applications in the face of failures.
Debugging with Dynamic Slicing and Backtracking
- Software Practice and Experience
, 1993
"... this paper we present a debugging model, based on dynamic program slicing and execution backtracking techniques, that easily lends itself to automation. This model is based on experience with using these techniques to debug software. We also present a prototype debugging tool, SPYDER, that explicitl ..."
Abstract
-
Cited by 80 (0 self)
- Add to MetaCart
this paper we present a debugging model, based on dynamic program slicing and execution backtracking techniques, that easily lends itself to automation. This model is based on experience with using these techniques to debug software. We also present a prototype debugging tool, SPYDER, that explicitly supports the proposed model, and with which we are performing further debugging research
Pointer Swizzling at Page Fault Time: Efficiently and Compatibly Supporting Huge Address Spaces on Standard Hardware
- Computer Architecture News
, 1992
"... Pointer swizzling at page fault time is a novel address translation mechanism that exploits conventional address translation hardware. It can support huge address spaces efficiently without long hardware addresses; such large address spaces are attractive for persistent object stores, distributed sh ..."
Abstract
-
Cited by 78 (0 self)
- Add to MetaCart
Pointer swizzling at page fault time is a novel address translation mechanism that exploits conventional address translation hardware. It can support huge address spaces efficiently without long hardware addresses; such large address spaces are attractive for persistent object stores, distributed shared memories, and shared address space operating systems. This swizzling scheme can be used to provide data compatibility across machines with different word sizes, and even to provide binary code compatibility across machines with different hardware address sizes. Pointers are translated ("swizzled") from a long format to a shorter hardware-supported format at page fault time. No extra hardware is required, and no continual software overhead is incurred by presence checks or indirection of pointers. This pagewise technique exploits temporal and spatial locality in much the same way as a normal virtual memory; this gives it many desirable performance characteristics, especially given the tr...
Debugging Standard ML Without Reverse Engineering
- In Proceedings of the 1990 ACM Conference on Lisp and Functional Programming
, 1990
"... We have built a novel and efficient replay debugger for our Standard ML compiler. Debugging facilities are provided by instrumenting the user's source code; this approach, made feasible by ML's safety property, is machineindependent and back-end independent. Replay is practical because ML is normall ..."
Abstract
-
Cited by 43 (3 self)
- Add to MetaCart
We have built a novel and efficient replay debugger for our Standard ML compiler. Debugging facilities are provided by instrumenting the user's source code; this approach, made feasible by ML's safety property, is machineindependent and back-end independent. Replay is practical because ML is normally used functionally, and our compiler uses continuation-passing style; thus most of the program's state can be checkpointed quickly and compactly using call-with-current-continuation. Together, instrumentation and replay support a simple and elegant debugger featuring full variable display, polymorphic type resolution, stack trace-back, breakpointing, and reverse execution, even though our compiler is very highly optimizing and has no run-time stack. 1. Introduction Traditional "source-level" debuggers do their real work at machine level. They rely on detailed information about the underlying machine model, compiler back end, and runtime system. Although debuggers typically have access to t...
Compressed Differences: An Algorithm for Fast Incremental Checkpointing
, 1995
"... The overhead of saving checkpoints to stable storage is the dominant performance cost in checkpointing systems. In this paper, we present a complete study of compressed differences, a new algorithm for fast incremental checkpointing. Compressed differences reduce the overhead of checkpointing by sav ..."
Abstract
-
Cited by 26 (2 self)
- Add to MetaCart
The overhead of saving checkpoints to stable storage is the dominant performance cost in checkpointing systems. In this paper, we present a complete study of compressed differences, a new algorithm for fast incremental checkpointing. Compressed differences reduce the overhead of checkpointing by saving only the words that have changed in the current checkpointing interval while monitoring those changes using page protection. We describe two checkpointing algorithms based on compressed differences, called standard and online compressed differences. These algorithms are analyzed in detail to determine the conditions that are necessary for them to improve the performance of checkpointing. We then present results of implementing these algorithms in a uniprocessor checkpointing system. These results both corroborate the analysis and show that in this environment, standard compressed differences almost invariably improve the performance of both sequential and incremental checkpointing.
Memory Exclusion: Optimizing the Performance of Checkpointing Systems
, 1996
"... Checkpointing systems are a convenient way for users to make their programs fault-tolerant by intermittently saving program state to disk, and restoring that state following a failure. The main concern with checkpointing is the overhead that it adds to running time of the program. This paper describ ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Checkpointing systems are a convenient way for users to make their programs fault-tolerant by intermittently saving program state to disk, and restoring that state following a failure. The main concern with checkpointing is the overhead that it adds to running time of the program. This paper describes memory exclusion, an important class of optimizations that reduce the overhead of checkpointing. These optimizations have been implemented in two checkpointers: libckpt, which works on Unix-based workstations, and libNXckpt, which works on the Intel Paragon. Both checkpointers are publicly available at no cost. We have checkpointed various long-running applications with both checkpointers and have explored the performance improvements that may be gained through memory exclusion. Results from these experiments are presented and show that the improvements are significant. We conclude that all checkpointing systems should include primitives allowing programmers and users to gain the full ben...
Improving the Performance of Coordinated Checkpointers on Networks of Workstations using RAID Techniques
, 1996
"... Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault-tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can ..."
Abstract
-
Cited by 22 (10 self)
- Add to MetaCart
Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault-tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can impact the performance and functionality of coordinated checkpointers. Although several such checkpointers have been implemented for popular programming platforms like PVM and MPI, none have taken this issue into consideration. This paper addresses the issue of checkpoint placement and its impact on the performance and functionality of coordinated checkpointing systems. Several strategies, both old and new, are described and implemented on a network of SPARC-5 workstations running PVM. These strategies range from very simple to more complex, borrowing heavily from ideas in RAID (Redundant Arrays of Inexpensive Disks) faulttolerance. The results of this paper will serve as a guide so that f...
Persistence + Undoability = Transactions
- In Proc. of HICSS-25
, 1992
"... Persistence means objects live potentially forever. Undoability means that any change to a program's store can potentially be undone. In our design and implementation of support for single-threaded nested transactions in Standard ML of New Jersey (SML/NJ), we provide persistence and undoability as o ..."
Abstract
-
Cited by 21 (9 self)
- Add to MetaCart
Persistence means objects live potentially forever. Undoability means that any change to a program's store can potentially be undone. In our design and implementation of support for single-threaded nested transactions in Standard ML of New Jersey (SML/NJ), we provide persistence and undoability as orthogonal features and combine them in a simple and elegant manner. We provide support for persistence through an SML interface that lets users manipulate a set of persistent roots and provides a save function that causes all data reachable from the persistent roots to be moved into the persistent heap. We provide support for undoability through an SML interface that exports two functions: checkpoint, which checkpoints the current store, and restore, which undoes all changes made to the previously checkpointed store. Finally, we succinctly define a higher-order function transact completely in terms of the interfaces for persistence and undoability. 1 Motivation 1.1 Revisiting Transactions ...

