Results 1 - 10
of
30
A bi-criteria scheduling heuristics for distributed embedded systems under reliability and real-time constraints
- In Int. Conf. on Dependable Systems and Networks, DSN’04
, 2004
"... Multi-criteria scheduling problems, involving optimiza-tion of more than one criterion, are subject to a growing interest. In this paper, we present a new bi-criteria schedul-ing heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to two criter ..."
Abstract
-
Cited by 40 (6 self)
- Add to MetaCart
(Show Context)
Multi-criteria scheduling problems, involving optimiza-tion of more than one criterion, are subject to a growing interest. In this paper, we present a new bi-criteria schedul-ing heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to two criteria: first the minimization of the schedule length, and second the maximization of the system reliability. Reliabil-ity is defined as the probability that none of the system com-ponents will fail while processing. The proposed algorithm is a list scheduling heuristics, based on a bi-criteria com-promise function that introduces priority between the op-erations to be scheduled, and that chooses on what subset of processors they should be scheduled. It uses the active replication of operations to improve the reliability. If the system reliability or the schedule length requirements are not met, then a parameter of the compromise function can be changed and the algorithm re-executed. This process is iterated until both requirements are met.
Fault-Tolerant Deployment of Embedded Software for Cost-Sensitive Real-Time Feedback-Control Applications
- In Procs. of Design Automation and Test in Europe
, 2004
"... Designing cost-sensitive real-time control systems for safety critical applications requires a careful analysis of the cost/coverage trade-offs of fault-tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on the ex ..."
Abstract
-
Cited by 32 (5 self)
- Add to MetaCart
(Show Context)
Designing cost-sensitive real-time control systems for safety critical applications requires a careful analysis of the cost/coverage trade-offs of fault-tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on the execution platform that is often distributed around the plant (as it is typical, for instance, in automotive applications). We propose a synthesis-based design methodology that relieves the designers from the burden of specifying detailed mechanisms for addressing platform faults, while involving them in the definition of the overall fault-tolerance strategy. Thus, they can focus on addressing plant faults within their control algorithms, selecting the best components for the execution platform, and defining an accurate fault model. Our approach is centered on a new model of computation, Fault Tolerant Data Flows (FTDF), that enables the integration of formal validation techniques.
Design optimization of time- and cost-constrained fault-tolerant distributed embedded systems with . . .
, 2009
"... We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decide ..."
Abstract
-
Cited by 24 (9 self)
- Add to MetaCart
(Show Context)
We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing constraints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.
Fault Tolerant Scheduling of Precedence Task Graphs on Heterogeneous Platforms
, 2007
"... Fault tolerance and latency are important requirements in several applications which are time critical in nature: such applications require guaranties in terms of latency, even when processors are subject to failures. In this paper, we propose a fault tolerant scheduling heuristic for mapping preced ..."
Abstract
-
Cited by 9 (4 self)
- Add to MetaCart
Fault tolerance and latency are important requirements in several applications which are time critical in nature: such applications require guaranties in terms of latency, even when processors are subject to failures. In this paper, we propose a fault tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme, capable of supporting ε arbitrary fail-silent (fail-stop) processor failures, hence valid results will be provided even if ε processors fail. We focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Major achievements include a low complexity, and a drastic reduction of the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the FTBAR scheduling algorithm [8].
A scheduling heuristics for distributed real-time embedded systems tolerant to processor . . .
, 2004
"... ..."
An Active Replication Scheme that Tolerates Failures in Distributed Embedded Real-Time Systems
- in "Proceedings of IFIP Working Conference on Distributed and Parallel Embedded Systems, DIPES’04
, 2004
"... Abstract Embedded real-time systems are being increasingly used in a major part of critical applications. In these systems, critical real-time constraints must be satisfied even in the presence of failures. In this paper, we present a new method-based on graph transformation that introduces fault-to ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
Abstract Embedded real-time systems are being increasingly used in a major part of critical applications. In these systems, critical real-time constraints must be satisfied even in the presence of failures. In this paper, we present a new method-based on graph transformation that introduces fault-tolerance in building embedded real-time systems. The proposed method targets distributed architecture and can tolerate a fixed number of arbitrary processors and communication links failures. Because of the resource limitation in embedded systems, our method uses a software-based replication technique to provide fault-tolerance. Finally, since we use graph transformation to perform replication, our method may be used by any off-line distribution-scheduling algorithm to generate a fault-tolerant distributed schedule.
Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems
"... Abstract—Developing large-scale distributed real-time and embedded (DRE) systems is hard in part due to complex deployment and configuration issues involved in satisfying multiple quality for service (QoS) properties, such as real-timeliness and fault tolerance. This paper makes three contributions ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Abstract—Developing large-scale distributed real-time and embedded (DRE) systems is hard in part due to complex deployment and configuration issues involved in satisfying multiple quality for service (QoS) properties, such as real-timeliness and fault tolerance. This paper makes three contributions to the study of deployment and configuration middleware for DRE systems that satisfy multiple QoS properties. First, it describes a novel task allocation algorithm for passively replicated DRE systems to meet their real-time and fault-tolerance QoS properties while consuming significantly less resources. Second, it presents the design of a strategizable allocation engine that enables application developers to evaluate different allocation algorithms. Third, it presents the design of a middlewareagnostic configuration framework that uses allocation decisions to deploy application components/replicas and configure the underlying middleware automatically on the chosen nodes. These contributions are realized in the DeCoRAM (Deployment and Configuration Reasoning and Analysis via Modeling) middleware. Empirical results on a distributed testbed demonstrate DeCoRAM’s ability to handle multiple failures and provide efficient and predictable real-time performance. Keywords-passive replication; replica allocation; resource minimization; real-time. I.
Synthesis of Fault-Tolerant Embedded Systems
"... This work addresses the issue of design optimization for faulttolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional process ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
This work addresses the issue of design optimization for faulttolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are considered, and the timing constraints of the application are satisfied. 1.
Realistic Models and Efficient Algorithms for Fault Tolerant Scheduling on Heterogeneous Platforms
, 2008
"... Most list scheduling heuristics rely on a simple platform model where communication contention is not taken into account. In addition, it is generally assumed that processors in the systems are completely safe. To schedule precedence graphs in a more realistic framework, we introduce an efficient fa ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
(Show Context)
Most list scheduling heuristics rely on a simple platform model where communication contention is not taken into account. In addition, it is generally assumed that processors in the systems are completely safe. To schedule precedence graphs in a more realistic framework, we introduce an efficient fault tolerant scheduling algorithm that is both contentionaware and capable of supporting ε arbitrary fail-silent (fail-stop) processor failures. We focus on a bi-criteria approach, where we aim at minimizing the total execution time, or latency, given a fixed number of failures supported in the system. Our algorithm has a low time complexity, and drastically reduces the number of additional communications induced by the replication mechanism. Experimental results fully demonstrate the usefulness of the proposed algorithm, which leads to efficient execution schemes while guaranteeing a prescribed level of fault tolerance.
Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/ Performance Trade-Offs
, 2010
"... In this paper, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of faulttolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messag ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
In this paper, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of faulttolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. We propose a fine-grained transparent recovery, where the property of transparency can be selectively applied to processes and messages. Transparency hides the recovery actions in a selected part of the application so that they do not affect the schedule of other processes and messages. While leading to longer schedules, transparent recovery has the advantage of both improved debuggability and less memory needed to store the fault-tolerant schedules.