A Survey of Rollback-Recovery Protocols in Message-Passing Systems
1996
Cited by 716 (22 self)
this paper, we use the terms event logging and message logging interchangeably
Deploying and managing Web services: issues, solutions, and directions
The VLDB Journal, 2005
Cited by 36 (5 self)
Web services are expected to be the key technology in enabling the next installment of the Web in the form of the Service Web. In this paradigm shift, Web services would be treated as first-class objects that can be manipulated much like data is now manipulated using a database management system. Hitherto, Web services have largely been driven by standards. However, there is a strong impetus for defining a solid and integrated foundation that would facilitate the kind of innovations witnessed in other fields, such as databases. This survey focuses on investigating the different research problems, solutions, and directions to deploying Web services that are managed by an integrated Web Service Management System (WSMS). The survey identifies the key features of a WSMS and conducts a comparative study on how current research approaches and projects fit in.
Message Logging in Mobile Computing
1999
Cited by 28 (6 self)
Dependable mobile computing is enhanced by independent recovery, low power consumption, and no dependence on stable storage at the mobile host. Existing recovery protocols proposed for mobile environments typically create consistent global checkpoints that do not guarantee independent recovery and low power consumption. This paper demonstrates the advantages of message logging by describing a receiver-based logging protocol. Checkpointing is utilized to limit log size and recovery latency. We compare the performance of our approach with that of existing mobile checkpointing and recovery algorithms in terms of failure-free overhead and recovery time. We also describe a stable storage management scheme for mobile support stations. Garbage collection is achieved without direct participation of mobile hosts.
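The interplay of receiver-side logging and checkpointing described in this abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the class `LoggedProcess`, its methods, and the toy "running sum" state are all hypothetical, and stable storage is modeled by ordinary fields. It only shows why a checkpoint bounds both the log size and the amount of replay needed on recovery.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedProcess:
    """Toy process (hypothetical) whose state is a running sum of received integers."""
    state: int = 0
    checkpoint: int = 0                       # last saved state (stands in for stable storage)
    log: list = field(default_factory=list)   # messages received since the last checkpoint

    def receive(self, msg: int) -> None:
        self.log.append(msg)   # log first, so the message survives a crash
        self.state += msg      # then deliver it to the application

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state
        self.log.clear()       # earlier messages are no longer needed for recovery

    def crash(self) -> None:
        self.state = 0         # volatile state is lost

    def recover(self) -> None:
        self.state = self.checkpoint
        for msg in self.log:   # replay in the original receive order
            self.state += msg


p = LoggedProcess()
for m in (1, 2, 3):
    p.receive(m)
p.take_checkpoint()
p.receive(4)
p.crash()
p.recover()
print(p.state, len(p.log))  # prints "10 1"
```

Only the single message received after the checkpoint has to be replayed, which is the sense in which checkpointing limits log size and recovery latency.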
An Asynchronous Recovery Scheme based on Optimistic Message Logging for Mobile Computing Systems
In Proc. of the 20th International Conference on Distributed Computing Systems, 2000
Cited by 22 (5 self)
This paper presents an asynchronous recovery scheme to provide fault tolerance for mobile computing systems. The proposed scheme is based on optimistic message logging, since checkpointing-only schemes are not suitable for the mobile environment, in which unreliable mobile hosts and fragile network connections may hinder any kind of coordination for checkpointing and recovery. Also, in order to reduce the overhead imposed on mobile hosts, mobile support stations take charge of logging and dependency tracking, and mobile hosts maintain only a small amount of information for mobility tracking. As a result, truly asynchronous recovery for mobile systems can be achieved with little overhead.
Keywords: Distributed systems, Fault-tolerance, Mobile computing, Message logging, Asynchronous recovery.
1 Introduction
Distributed computing systems are nowadays extended to continue their services in the mobile environment, and checkpointing-recovery is one such service to provide fault...
RENEW: A Tool for Fast and Efficient Implementation of Checkpoint Protocols
In Proceedings of the 28th IEEE Fault-Tolerant Computing Symposium (FTCS ..., 1998
Cited by 19 (7 self)
This paper describes the design, implementation, and evaluation of a run-time system for clusters of workstations that allows the rapid testing of checkpoint protocols with standard benchmarks. To achieve this goal, RENEW provides a flexible set of operations that facilitates the integration of a protocol in the system with reduced programming effort. To support a broad range of applications, RENEW exports, as its external interface, the industry-endorsed Message Passing Interface (MPI). Three distinct classes of protocols were evaluated using the RENEW environment with SPEC and NAS benchmarks on a network of workstations connected by ATM. It was observed that the communication-induced protocol emulated the behavior of the coordinated protocol, with comparable performance. The message logging protocol degraded performance. Even though the message logging protocol was slower due to log replay, all three protocols required a similar amount of time to restore the application to the same state as before failure occurred and recovery was initiated.
A low-cost hybrid coordinated checkpointing protocol for mobile distributed systems
Journal of Mobile Information Systems, 2008
Cited by 18 (8 self)
Mobile distributed systems raise new issues such as mobility, low bandwidth of wireless channels, disconnections, limited battery power, and lack of reliable stable storage on mobile nodes. In minimum-process coordinated checkpointing, some processes may not checkpoint for several checkpoint initiations. In the case of a recovery after a fault, such processes may roll back to a far earlier checkpointed state and thus may cause greater loss of computation. In all-process coordinated checkpointing, the recovery line is advanced for all processes, but the checkpointing overhead may be exceedingly high. To optimize both metrics, the checkpointing overhead and the loss of computation on recovery, we propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has executed a fixed number of times. Thus, mobile nodes with low activity or in doze mode operation may not be disturbed in the case of minimum-process checkpointing, and the recovery line is advanced for each process after an all-process checkpoint. Additionally, we try to minimize the information piggybacked onto each computation message. For minimum-process checkpointing, we design a blocking algorithm, where no useless checkpoints are taken and an effort has been made to optimize the blocking of processes. We propose to delay selective messages at the receiver end. By doing so, processes are allowed to perform their normal computation, send messages, and partially receive them during their blocking period. The proposed minimum-process blocking algorithm forces zero useless checkpoints at the cost of very small blocking.
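The hybrid schedule in this abstract (one all-process checkpoint after a fixed number of minimum-process rounds) can be sketched as follows. This covers only the scheduling decision, not the blocking algorithm or the message piggybacking; the function name `checkpoint_kind` and the parameter `k` are hypothetical names introduced for illustration.

```python
def checkpoint_kind(initiation: int, k: int) -> str:
    """Return the kind of checkpoint taken at the 1-based `initiation`.

    Assumption for this sketch: every run of `k` minimum-process rounds
    is followed by exactly one all-process round.
    """
    if initiation < 1 or k < 0:
        raise ValueError("initiation must be >= 1 and k must be >= 0")
    return "all-process" if initiation % (k + 1) == 0 else "minimum-process"


# With k = 2, every third initiation advances the recovery line for all processes.
schedule = [checkpoint_kind(i, k=2) for i in range(1, 7)]
print(schedule)
```

A larger `k` disturbs low-activity or dozing nodes less often, while a smaller `k` bounds how far an inactive process may have to roll back.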
Adaptive Checkpointing with Storage Management for Mobile Environments
IEEE Transactions on Reliability, 1998
Cited by 17 (2 self)
This paper describes an adaptive protocol that manages storage for base stations. The protocol integrates leasing storage management with a time-based coordinated checkpointing mechanism. The leasing enables storage managers to effectively control disk space. Leasing prevents hung processes from indefinitely retaining storage and, in addition, garbage collection is simple. Time-based ...
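The leasing idea mentioned above can be sketched in a few lines. This is a generic lease table, not the paper's protocol: the class `LeasedStore`, its methods, and the use of plain numbers for time are assumptions made for illustration. It shows why a process that stops renewing (for example, because it has hung) cannot hold storage indefinitely, and why garbage collection reduces to an expiry check.

```python
class LeasedStore:
    """Hypothetical storage manager that keeps data only while its lease is valid."""

    def __init__(self) -> None:
        self._items = {}  # owner -> (data, expiry_time)

    def put(self, owner: str, data: bytes, now: float, lease: float) -> None:
        self._items[owner] = (data, now + lease)

    def renew(self, owner: str, now: float, lease: float) -> None:
        data, _ = self._items[owner]          # KeyError if the lease already expired
        self._items[owner] = (data, now + lease)

    def collect(self, now: float) -> list:
        """Drop every entry whose lease has expired and return the owners removed."""
        expired = [o for o, (_, expiry) in self._items.items() if expiry <= now]
        for owner in expired:
            del self._items[owner]
        return expired

    def owners(self) -> list:
        return sorted(self._items)


store = LeasedStore()
store.put("host-a", b"checkpoint-a", now=0, lease=10)
store.put("host-b", b"checkpoint-b", now=0, lease=10)
store.renew("host-a", now=8, lease=10)   # host-a is alive and renews
print(store.collect(now=12))             # host-b never renewed, so its space is reclaimed
print(store.owners())
```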
An Efficient Recovery Scheme for Mobile Computing Environment
Proc. 2001 Int'l Conf. on Parallel and Distributed Systems, 2001
Cited by 12 (2 self)
This paper presents an efficient recovery scheme to provide fault tolerance for mobile computing systems. The proposed scheme is based on message logging and independent checkpointing, since checkpointing-only schemes are not suitable for the mobile environment, in which unreliable mobile hosts and fragile network connections may hinder any kind of coordination for checkpointing and recovery. For efficient management of recovery information, such as checkpoints and message logs, the movement-based scheme is suggested. A mobile host that carries its recovery information to its current mobile support station can recover instantly in case of a failure. However, the mobile support stations visited by the mobile host then incur a high failure-free execution cost to transfer the recovery information and access the stable storage. On the other hand, the recovery cost can be too high if the recovery information is dispersed over a wide range of cells. The movement-based scheme considers both costs. While the mobile host moves within a certain range, its recovery information is not moved. However, if the mobile host moves out of the range, it transfers the recovery information nearby. As a result, the scheme controls the transfer cost as well as the recovery cost. The performance of the proposed scheme is evaluated with extensive simulation results.
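The movement-based rule described above can be illustrated with a small sketch. The abstract does not define how the "range" is measured, so this example assumes cells are numbered along a line and uses a distance threshold; `MobileHost`, `move_to`, and `max_distance` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MobileHost:
    cell: int = 0        # cell the host is currently in
    info_cell: int = 0   # cell whose support station holds the recovery information
    transfers: int = 0   # how many times the recovery information was moved

    def move_to(self, new_cell: int, max_distance: int) -> None:
        self.cell = new_cell
        # Inside the range: leave the recovery information where it is.
        # Outside the range: bring it to the host's current cell.
        if abs(new_cell - self.info_cell) > max_distance:
            self.info_cell = new_cell
            self.transfers += 1


host = MobileHost()
for cell in (1, 2, 3, 4, 3, 7):
    host.move_to(cell, max_distance=2)
print(host.info_cell, host.transfers)  # prints "7 2"
```

Six handoffs trigger only two transfers here, while the recovery information never ends up more than `max_distance` cells from the host. That is the trade-off between transfer cost and recovery cost that the abstract describes.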
On failure recoverability of client-server applications in mobile wireless environments
IEEE Transactions on Reliability
Cited by 11 (2 self)
This paper reports analytical results for the CDF of the failure recovery time for client-server applications in mobile wireless environments characterized by logging, together with mobility handoff strategies for facilitating failure recovery. The results can be applied to determine whether a mobile application can satisfy its recoverability requirement upon a mobile host failure when operating under a set of parameter values characterizing the mobile application, the underlying client-server environment, and the logging and mobility handoff strategies adopted by the mobile application. Model parameters that affect the shape of the failure recovery time CDF for two mobility handoff strategies, namely Eager and Lazy, are identified, and their effects are analyzed, with numerical data and result interpretations given. A tradeoff analysis is given between the cost invested by these two mobility handoff strategies in maintaining the logging and checkpoint information before failure and the return on investment in terms of improved failure recoverability, and the checkpoint interval that yields the best return on investment for the eager mobility handoff strategy over the lazy strategy is identified.
Index Terms: Client-server mobile applications, failure recoverability, mobile wireless networks, mobility handoff.
Performance and effectiveness analysis of checkpointing in mobile environments
In Proc. of the 22nd Symposium on Reliable Distributed Systems, 2003
Cited by 11 (1 self)
Many mathematical models have been proposed to evaluate the execution performance of an application with and without checkpointing in the presence of failures. They assume that the total program execution time without failure is known in advance, under which condition the optimal checkpointing interval can be determined. In mobile environments, application components are distributed and tasks are computed by sending and receiving computational and control messages. The total execution time includes communication time and depends on multiple factors, such as heterogeneous processing speeds and link bandwidth, making it unpredictable across executions. However, the total number of computational messages received is usually unchanged within an application. Another special factor that should be considered for checkpointing purposes is handoff, which often happens in mobile networks. With these observations, we analyze application execution performance and average effectiveness, and introduce an equinumber checkpointing strategy. We show how checkpointing and handoff affect performance and effectiveness metrics, determine the conditions under which checkpointing is beneficial, and calculate the optimal checkpointing interval for minimizing the total execution time and maximizing the average effectiveness in mobile environments.