Guaranteeing Fault Tolerance Through Scheduling In Real-Time Systems, 1996
Cited by 21 (1 self)
Real-time systems are those which must execute all tasks within their timing constraints. Due to the catastrophic consequences of missing the deadlines of some real-time tasks, fault tolerance is an essential component of such systems. This thesis introduces techniques to enhance the fault tolerance capability of real-time systems by incorporating time redundancy. Time redundancy is essential in ultra-reliable real-time systems where correlated faults must be tolerated. It can also be used to detect and tolerate transient faults, which account for the majority of faults in computing systems. This thesis demonstrates how time redundancy can be used in conjunction with hardware and software redundancy to tolerate a variety of faults in real-time systems. This thesis considers several different system and task models, and for each model, presents a schedulability test (a utilization bound or a set of conditions) which guarantees that all tasks in the system will satisfy their timing constraints even ...
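To make the idea of a utilization-bound schedulability test concrete, here is a minimal sketch in Python. It uses the classic Liu and Layland rate-monotonic bound, n(2^(1/n) - 1), and not the fault-tolerant bounds derived in the thesis, which this abstract does not state. The `(wcet, period)` task tuples, the function names, and the `reexecutions` parameter (a pessimistic model of time redundancy in which every job may be re-executed that many times) are all assumptions made for illustration.

```python
def liu_layland_bound(n: int) -> float:
    """Classic rate-monotonic utilization bound for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks, reexecutions=0):
    """Sufficient (not necessary) schedulability check.

    tasks: list of (wcet, period) pairs; deadlines are assumed equal to periods.
    reexecutions: assumed number of re-executions budgeted for every job,
        a deliberately pessimistic stand-in for time redundancy.
    """
    if not tasks:
        return True
    # Inflate each execution time by the re-execution budget.
    utilization = sum((1 + reexecutions) * c / t for c, t in tasks)
    return utilization <= liu_layland_bound(len(tasks))


tasks = [(1, 4), (1, 5), (2, 10)]  # hypothetical task set, utilization 0.65
print(passes_utilization_test(tasks))                  # no redundancy budget
print(passes_utilization_test(tasks, reexecutions=1))  # one re-execution per job
```

A task set that fails this check may still be schedulable, since the bound is only a sufficient condition; an exact analysis would be needed to decide those cases.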
Scheduling Fault-Tolerant Distributed Hard Real-Time Tasks Independently of the Replication Strategies
In Proc. of 6th Int. Conf. on Real-Time Computing Systems and Applications, 1999
Cited by 13 (2 self)
Replication is a well-known fault-tolerance technique, and several replication strategies exist (e.g. active, passive, and semi-active replication). To be used in hard real-time systems, the presence of replication must be dealt with in scheduling algorithms, and more particularly in the feasibility tests in charge of testing whether deadlines will be met or not. So far, existing solutions to integrate replicated tasks in scheduling algorithms have been specific to a given replication strategy or to its implementation on a given architecture. This paper describes a framework for taking replicated tasks into account in scheduling algorithms that is largely independent of the replication technique. We show through an example that the same scheduling algorithm can be used whatever replication strategy is selected, even if several replication strategies are used simultaneously. 1 Introduction Hard real-time systems are systems for which tasks produce incorrect results when...
A nonpreemptive real-time scheduler with recovery from transient faults and its implementation
IEEE Transactions on Software Engineering, 2003
Cited by 10 (0 self)
Real-time systems (RTS) are those whose correctness depends on satisfying both the required functional properties and the required temporal properties. Due to the criticality of such systems, recovery from faults is an essential part of an RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of fault; SEUs are transient faults and affect a single task at a time. This paper presents a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults, assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising guarantees given to other tasks. The first algorithm is an optimal dynamic-programming solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, the paper describes the implementation of the modified admission control algorithm, the nonpreemptive scheduler, and a recovery mechanism in the FT-RT-Mach operating system.
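The notion of reserving slack so that recovery completes before a deadline can be illustrated with a small feasibility check. The sketch below is not one of the paper's three algorithms. It assumes a nonpreemptive queue with no gaps, all tasks ready at the start time, at most one transient fault over the whole queue, and recovery by re-executing the faulty task immediately at a cost equal to its execution time. The function name and the `(exec_time, deadline)` tuples are hypothetical.

```python
def tolerates_single_fault(queue, start=0):
    """Check that every task meets its deadline even if one earlier
    (or the same) task in the queue must be re-executed once.

    queue: list of (exec_time, deadline) pairs in execution order.
    """
    finish = start            # fault-free completion time so far
    largest_recovery = 0      # costliest single re-execution seen so far
    for exec_time, deadline in queue:
        finish += exec_time
        largest_recovery = max(largest_recovery, exec_time)
        # Worst case under the single-fault assumption: the longest task
        # run so far is the one that had to be repeated.
        if finish + largest_recovery > deadline:
            return False
    return True


print(tolerates_single_fault([(2, 5), (3, 10), (1, 12)]))  # enough slack
print(tolerates_single_fault([(2, 5), (3, 7)]))            # second task too tight
```

Under these assumptions the check runs in linear time; handling gaps between tasks or multiple faults would need a more careful accounting of where slack is placed.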
Mechanisms for System-Level Fault Tolerance in Real-Time Systems
Cited by 5 (2 self)
In this paper we concentrate on the development of a method for achieving fault tolerance at the system level, and we define a mechanism by which the guarantees of applications accepted for execution may be revoked. The removal of applications from a schedule, or their replacement in it, may be necessary in order to maintain the system in a safe state, even in a real-time environment. Our scheme encompasses recovery techniques such as graceful degradation, load shedding, and reconfiguration, while the implementation is transparent to the user. The paper is organized as follows. In the rest of this section we give a brief overview of previous work in the field, as well as of the framework we are working with. In the following section we define the basic way to treat faults at the system level, called scenario changes. In Section 2.2 we describe the mechanisms and conditions necessary to apply the scenario changes, and in Section 3 we describe practical uses for the scheme. We close the paper with concluding remarks and directions for future work.
Anticipated Faults in Real-Time Distributed Systems, 1995
Cited by 3 (3 self)
In this paper we present a Petri-net-based approach to consider anticipated faults in real-time distributed systems. The proposed approach is based on Fuzzy Time G-Nets, which integrate two Petri net extensions: Fuzzy Time Petri Nets (a timing extension of Petri nets) and G-Nets (Petri nets extended with object-oriented concepts). We also show that Fuzzy Time G-Nets can be used to represent fault-tolerant techniques. 1 Introduction The development of large and complex systems requires powerful modeling and analysis tools in order to treat the inherent complexity of such systems. The natural trend of the computing world is toward distributed systems [19]. As a consequence of the increasing use of distributed computing systems in complex critical applications, issues related to the safety and fault tolerance of systems have gained importance in recent years. A system is fault-tolerant if it maintains full performance and functional capabilities in ...
Faults and Timing Analysis in Real-Time Distributed Systems: A Fuzzy Time Petri-Net-Based Approach
In this paper we present a Petri-net-based approach to consider faults and timing analysis in real-time distributed systems. The proposed approach is based on Fuzzy Time G-Nets, which integrate two Petri net extensions: Fuzzy Time Petri Nets (a timing extension of Petri nets) and G-Nets (Petri nets extended with object-oriented concepts). We show how to perform timing analysis using the proposed approach. We also show that Fuzzy Time G-Nets can be used to represent fault-tolerant techniques and to consider anticipated faults. Keywords: fuzzy time intervals, Petri nets, fault-tolerance, distributed real-time systems. Published in: International Journal Fuzzy Sets and Systems, Vol. 83, Num. 3, 1996, North-Holland, pp. 143--168. 1 Introduction The development of large and complex systems requires powerful modeling and analysis tools in order to treat the inherent complexity of such systems. For several reasons, such as a firm mathematical foundation and graphical repres...

