Results 1 - 10 of 58
Microreboot - A Technique for Cheap Recovery
, 2004
Cited by 94 (2 self)
A significant fraction of software failures in large-scale Internet systems are cured by rebooting, even when the exact failure causes are unknown. However, rebooting can be expensive, causing nontrivial service disruption or downtime even when clusters and failover are employed. In this work we separate process recovery from data recovery to enable microrebooting -- a fine-grain technique for surgically recovering faulty application components, without disturbing the rest of the application.
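The separation of process recovery from data recovery can be illustrated with a small Python sketch. Everything in it is hypothetical (`StateStore`, `Component`, and `Application.microreboot` are not from the paper). It assumes that all data that must survive a failure lives in an external store, so a single faulty component can be discarded and re-created while the rest of the application keeps running.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class StateStore:
    """Hypothetical external store: data that must survive a failure lives
    here, outside any component, so rebooting a component loses none of it."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def get(self, key: str, default: object = None) -> object:
        return self._data.get(key, default)


@dataclass
class Component:
    """Hypothetical application component holding only disposable state."""
    name: str
    store: StateStore
    generation: int = 0  # how many times this component has been microrebooted
    scratch: Dict[str, object] = field(default_factory=dict)  # lost on reboot

    def handle(self, user: str, item: str) -> List[str]:
        # Durable data (the user's cart) is read from and written to the store.
        cart = list(self.store.get(f"cart:{user}", []))
        cart.append(item)
        self.store.put(f"cart:{user}", cart)
        self.scratch["last_user"] = user  # transient, safe to discard
        return cart


class Application:
    def __init__(self, store: StateStore, names: List[str]) -> None:
        self.store = store
        self.components = {n: Component(n, store) for n in names}

    def microreboot(self, name: str) -> None:
        """Replace one (presumed faulty) component with a fresh instance.
        Other components keep running and the shared store is untouched."""
        old = self.components[name]
        self.components[name] = Component(name, self.store, old.generation + 1)
```

For example, after `handle("alice", "book")`, `microreboot("cart")`, and `handle("alice", "pen")`, the rebooted component returns `['book', 'pen']`: its in-memory scratch state is gone, but the cart survived in the store, so re-creating the component is the entire recovery action.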
User Interface Directions For The Web
, 1999
Cited by 78 (0 self)
Designing a large site requires collaboration between a team of UI professionals, and some UI professionals will prefer to stay with their traditional role of software design rather than moving into the wild world of Web design. There are three possible solutions to the problem: (1) make it possible to design reasonably usable sites without having UI expertise; (2) train more people in good Web design; and (3) live with poorly designed sites that are hard to use. The third option is not acceptable in my opinion. Unless the vast majority of Web sites are improved considerably, we will suffer a usability meltdown of the Web no later than the year 2000, and most people will refer to the Web as "oh, yes, we tried that last year, but it was no good." Thus, we have to strive for a combination of the first two options: making it easier to design acceptable sites, and increasing the availability of staff who know how to do so. Making it easier to design usable sites will likely involve a combination of template
Undo for Operators: Building an Undoable E-mail Store
- In Proceedings of the 2003 USENIX Annual Technical Conference
, 2003
Cited by 65 (3 self)
System operators play a critical role in maintaining server dependability yet lack powerful tools to help them do so. To help address this unfulfilled need, we describe Operator Undo, a tool that provides a forgiving operations environment by allowing operators to recover from their own mistakes, from unanticipated software problems, and from intentional or accidental data corruption. Operator Undo starts by intercepting and logging user interactions with a network service before they enter the system, creating a record of user intent. During an undo cycle, all system hard state is physically rewound, allowing the operator to perform arbitrary repairs; after repairs are complete, lost user data is reintegrated into the repaired system by replaying the logged user interactions while tracking and compensating for any resulting externally-visible inconsistencies. We describe the design and implementation of an application-neutral framework for Operator Undo, and detail the process by which we instantiated the framework in the form of an undo-capable e-mail store supporting SMTP mail delivery and IMAP mail retrieval. Our proof-of-concept e-mail implementation imposes only a small performance overhead, and can store days or weeks of recovery log on a single disk.
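A minimal Python sketch of the rewind, repair, and replay cycle described above, under heavy simplifying assumptions: the hard state is an in-memory dict of mailboxes, the only requests are a hypothetical `deliver` and `fetch`, and "compensation" is reduced to reporting fetches whose replayed result differs from what the user originally saw. `UndoableMailStore` and its methods are illustrative names, not the paper's framework.

```python
import copy
from typing import Callable, Dict, List

Mailbox = Dict[str, List[str]]  # hypothetical hard state: user -> messages


class UndoableMailStore:
    """Sketch of rewind / repair / replay over a log of user requests."""

    def __init__(self) -> None:
        self.state: Mailbox = {}
        self.checkpoint: Mailbox = {}
        self.log: List[list] = []  # entries: [op, user, arg, result seen by user]

    def take_checkpoint(self) -> None:
        """Snapshot hard state; the log only needs requests made after this."""
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def _apply(self, op: str, user: str, arg: str) -> object:
        if op == "deliver":
            self.state.setdefault(user, []).append(arg)
            return None
        if op == "fetch":
            return list(self.state.get(user, []))
        raise ValueError(f"unknown op {op!r}")

    def request(self, op: str, user: str, arg: str = "") -> object:
        entry = [op, user, arg, None]
        self.log.append(entry)  # record user intent before touching state
        entry[3] = self._apply(op, user, arg)
        return entry[3]

    def undo_cycle(self, repair: Callable[[Mailbox], None]) -> List[str]:
        """Rewind to the checkpoint, apply the operator's repair, replay the
        log, and report fetches whose result now differs from what the user
        saw (the places where a real system would have to compensate)."""
        self.state = copy.deepcopy(self.checkpoint)  # rewind
        repair(self.state)                            # repair
        inconsistencies = []
        for op, user, arg, seen in self.log:          # replay
            now = self._apply(op, user, arg)
            if op == "fetch" and now != seen:
                inconsistencies.append(f"{user}: saw {seen}, replay gives {now}")
        return inconsistencies
```

For instance, if an operator accidentally wipes `state` after a checkpoint, `undo_cycle(lambda s: None)` restores the old mail, re-delivers the new mail from the log, and flags the user's earlier `fetch` as inconsistent, because it had shown an incomplete mailbox.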
The Scope and Importance of Human Interruption In Human-Computer . . .
- Human-Computer Interaction
, 2002
Cited by 61 (0 self)
At first glance it seems absurd that busy people doing important jobs should want their computers to interrupt them. Interruptions are disruptive and people need to concentrate to make good decisions. However, successful job performance also frequently depends on people's abilities to (a) constantly monitor their dynamically changing information environments, (b) collaborate and communicate with other people in the system, and (c) supervise background autonomous services. These critical abilities can require people to simultaneously query a large set of information sources, continuously monitor for important events, and respond to and communicate with other human operators. Automated monitoring
Predictive Resource Management for Wearable Computing
- Proceedings of the 1st International Conference on Mobile Systems, Applications, and Services (MobiSys)
, 2003
Cited by 31 (3 self)
Achieving crisp interactive response in resource-intensive applications such as augmented reality, language translation, and speech recognition is a major challenge on resource-poor wearable hardware. In this paper we describe a solution based on multi-fidelity computation supported by predictive resource management. We show that such an approach can substantially reduce both the mean and the variance of response time. On a benchmark representative of augmented reality, we demonstrate a 60% reduction in mean latency and a 30% reduction in the coefficient of variation. We also show that a history-based approach to demand prediction is the key to this performance improvement.
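One way to picture history-based demand prediction driving multi-fidelity computation is the Python sketch below. It assumes (these are not the paper's models) that demand is measured in megacycles per operation, that the prediction for a fidelity level is simply the mean of its last few observed runs, and that the runtime picks the highest fidelity whose predicted latency (demand divided by available CPU speed) fits a deadline. `DemandPredictor` and `choose_fidelity` are hypothetical names.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import Deque, Dict, List, Optional


class DemandPredictor:
    """Hypothetical history-based predictor: the predicted demand of an
    operation at a fidelity level is the mean of its last `window` runs."""

    def __init__(self, window: int = 5) -> None:
        self.history: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window))

    def record(self, fidelity: str, megacycles: float) -> None:
        """Log the CPU demand observed for one completed operation."""
        self.history[fidelity].append(megacycles)

    def predict(self, fidelity: str) -> Optional[float]:
        runs = self.history.get(fidelity)
        return mean(runs) if runs else None


def choose_fidelity(predictor: DemandPredictor, levels: List[str],
                    available_mhz: float, deadline_s: float) -> str:
    """Return the highest level (levels ordered low -> high) whose predicted
    latency, megacycles / MHz = seconds, meets the deadline; fall back to the
    lowest level when nothing is predicted to fit."""
    best = levels[0]
    for level in levels:
        demand = predictor.predict(level)
        if demand is not None and demand / available_mhz <= deadline_s:
            best = level
    return best
```

With a 1000 MHz processor, a 0.6 s deadline, low-fidelity runs around 110 megacycles, and medium-fidelity runs of 400 and 600 megacycles, the predicted 0.5 s fits and `"medium"` is chosen. If later medium runs push the windowed mean to 800 megacycles, the same call falls back to `"low"`. A sliding-window mean is just one simple history-based estimator; the paper's actual predictors may differ.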
JAGR: An Autonomous Self-Recovering Application Server
, 2003
Cited by 29 (5 self)
This paper demonstrates that the dependability of generic, evolving J2EE applications can be enhanced through a combination of a few recovery-oriented techniques. Our goal is to reduce downtime by automatically and efficiently recovering from a broad class of transient software failures without having to modify applications. We describe here the integration of three new techniques into JBoss, an open-source J2EE application server. The resulting system is JAGR (JBoss with Application-Generic Recovery), a self-recovering execution platform.
Extending Tuplespaces for Coordination in Interactive Workspaces
- Journal of Systems and Software
, 2004
Cited by 21 (1 self)
The current interest in programming models and software infrastructures to support ubiquitous and environmental computing is heightened by the falling cost of hardware and the ubiquity of local-area wireless networking technologies. Interactive workspaces are technologically augmented team-project rooms that represent a specific sub-domain of ubiquitous computing. We argue both from related work and from our own experience with a prototype that the tuplespace model of communication forms the best basis for a coordination infrastructure for such workspaces. This paper presents the usage and characteristics expected of interactive workspaces, from which we derive a set of key system properties for any coordination infrastructure in an interactive workspace. We show that the design aspects of tuplespaces, augmented with some new extensions, yield a system model, which we call the Event Heap, that satisfies all of the desired properties. We also briefly discuss why other coordination models fall short of the desired properties, and describe our experience using our implementation of the Event Heap model. The paper focuses on a justification of the use of tuplespaces in interactive workspaces, and does not provide a detailed discussion of the Event Heap implementation or our more general experience with interactive workspaces, each of which is treated in detail elsewhere.
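The tuplespace model the paper builds on can be sketched in a few lines of Python. This is a generic, non-blocking Linda-style space, not the Event Heap API: `put`, `read`, and `take` are hypothetical names, a `None` field in a template acts as a wildcard, and the per-tuple `ttl` expiry is an assumption included only to illustrate what an extension of the basic model might look like; it is not taken from the abstract.

```python
import time
from typing import Any, Callable, List, Optional, Tuple

Entry = Tuple[Any, ...]


class TupleSpace:
    """Minimal, non-blocking Linda-style tuplespace (not the Event Heap API).
    A template matches a tuple of the same length when every non-None
    template field equals the corresponding tuple field."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._tuples: List[Tuple[Entry, float]] = []  # (tuple, expiry time)
        self._clock = clock

    def put(self, tup: Entry, ttl: float = float("inf")) -> None:
        """Add a tuple; it expires `ttl` seconds later (assumed extension)."""
        self._tuples.append((tup, self._clock() + ttl))

    @staticmethod
    def _matches(template: Entry, tup: Entry) -> bool:
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def _find(self, template: Entry) -> Optional[int]:
        now = self._clock()
        # Drop expired tuples, then return the oldest matching one.
        self._tuples = [(t, exp) for t, exp in self._tuples if exp > now]
        for i, (tup, _) in enumerate(self._tuples):
            if self._matches(template, tup):
                return i
        return None

    def read(self, template: Entry) -> Optional[Entry]:
        """Return a matching tuple without removing it, or None."""
        i = self._find(template)
        return None if i is None else self._tuples[i][0]

    def take(self, template: Entry) -> Optional[Entry]:
        """Remove and return a matching tuple, or None."""
        i = self._find(template)
        return None if i is None else self._tuples.pop(i)[0]
```

Producers and consumers never name each other; they share only the tuple format, which is the loose coupling that makes the model attractive for a room full of heterogeneous devices. A real coordination layer would also need blocking retrieval and network access, both omitted here.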
When Does Fast Recovery Trump High Reliability?
- In Proc. 2nd Workshop on Evaluating and Architecting System Dependability
, 2002
Cited by 20 (2 self)
In this paper, we argue that for interactive Internet applications, a decrease in MTTR (mean time to recover) is sometimes more valuable than the increase in MTTF (mean time to failure) that would improve availability by the same amount, and we make a case for adopting MTTR as the primary metric for reasoning about system availability and focusing designs on fast recovery.
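The trade-off can be made concrete with the standard steady-state availability formula (a textbook relation, not necessarily the paper's notation). The figures below are illustrative, not taken from the paper.

```latex
\begin{align*}
A &= \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
   = \frac{1}{1 + \mathrm{MTTR}/\mathrm{MTTF}} \\[4pt]
A_{\text{base}} &= \frac{1000\,\mathrm{h}}{1000\,\mathrm{h} + 10\,\mathrm{h}} \approx 0.99010 \\
A_{\mathrm{MTTR}/2} &= \frac{1000\,\mathrm{h}}{1000\,\mathrm{h} + 5\,\mathrm{h}} \approx 0.99502 \\
A_{2\,\mathrm{MTTF}} &= \frac{2000\,\mathrm{h}}{2000\,\mathrm{h} + 10\,\mathrm{h}} \approx 0.99502
\end{align*}
```

Because $A$ depends only on the ratio $\mathrm{MTTR}/\mathrm{MTTF}$, halving MTTR buys exactly the same availability as doubling MTTF. The paper's argument concerns which of the two is cheaper or more valuable to achieve, which the formula alone does not settle.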
Planning and the user interface: The effects of lockout time and error recovery cost
- International Journal of Human-Computer Studies
, 1999
Cited by 18 (0 self)

