Results 1 - 4 of 4
Soft Error Rate Determination for Nanometer CMOS VLSI Logic
Cited by 7 (5 self)
Nanometer CMOS VLSI circuits are highly sensitive to soft errors due to environmental causes such as cosmic radiation and charged particles. These phenomena, also known as single-event upsets (SEUs), induce current pulses at random times and random locations in a digital circuit. In this paper we model neutron-induced soft errors using two parameters, namely, frequency and intensity. Our soft error rate (SER) estimation method propagates both parameters through the circuit: frequency, expressed as a probability, and intensity, expressed as a probability density function of the width of single-event transient (SET) pulses. With this model we are able to accurately model electrical masking factors in logic circuits. Also, the error pulse width density information at the primary outputs of the logic circuit allows evaluation of SER reduction schemes such as time or space redundancy.
Energy efficient configuration for qos in reliable parallel servers
- In Proc. of the Fifth European Dependable Computing Conference (EDCC
, 2005
Cited by 2 (1 self)
Abstract. Redundancy is the traditional technique used to increase system reliability. With modern technology, in addition to being used as temporal redundancy, slack time can also be used by energy management schemes to scale down system processing speed and supply voltage to save energy. In this paper, we consider a system that consists of multiple servers for providing reliable service. Assuming that servers have self-detection mechanisms to detect faults, we first propose an efficient parallel recovery scheme that processes service requests in parallel to increase the number of faults that can be tolerated and thus the system reliability. Then, for a given request arrival rate, we explore the optimal number of active servers needed for minimizing system energy consumption while achieving k-fault tolerance, or for maximizing the number of faults to be tolerated with a limited energy budget. Analytical results are presented to show the trade-off between the energy savings and the number of faults being tolerated.
Design and Analysis of an Optimal Instruction-Retry Policy for TMR Controller Computers
, 1993
Cited by 2 (2 self)
An instruction-retry policy is proposed to enhance the fault tolerance of triple modular redundant (TMR) controller computers by adding time redundancy to them. A TMR failure is said to occur if a TMR system fails to establish a majority among its modules' outputs due to multiple faulty modules or a faulty voter. During a given mission, either multiple consecutive TMR failures whose active period exceeds a certain time limit, or the exhaustion of spares as a result of frequent system reconfigurations, may result in failure to meet the timing constraints of one or more tasks, called dynamic failure. An optimal instruction-retry period is derived by minimizing the probability of dynamic failure upon detection of either a masked (by the TMR) error or a TMR failure. Using the derived optimal retry period, we also derive the minimum number of spares needed to keep the probability of dynamic failure for a given mission below a pre-specified level. Index Terms --- Real-time contr...
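The interplay of majority voting and time redundancy described in this abstract can be illustrated with a minimal sketch. It is not the paper's policy: retries here are counted rather than bounded by an optimal retry period, and a retry is triggered only when no majority exists. The names majority_vote, run_with_retry and max_retries are hypothetical, and each module is assumed to be a callable that returns a hashable output.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of modules, else None."""
    # Assumes at least one output; a TMR system supplies exactly three.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def run_with_retry(
    modules: Sequence[Callable[[int], Hashable]],
    instruction: int,
    max_retries: int,
) -> Tuple[Optional[Hashable], int]:
    """Execute an instruction on all modules, retrying when no majority exists.

    Returns (voted result, number of retries used). A result of None means
    every attempt ended in a TMR failure (hypothetical simplification).
    """
    for attempt in range(max_retries + 1):
        outputs = [module(instruction) for module in modules]
        result = majority_vote(outputs)
        if result is not None:
            return result, attempt
    return None, max_retries
```

A transient fault that corrupts two modules in different ways on the first attempt produces a TMR failure, and a single retry recovers the correct result once the fault has passed. A permanent disagreement exhausts the retry budget and returns None.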
A Novel Replication Technique for Implementing Fault-Tolerant Parallel Software
In this paper we present a novel replication technique based on the FTAG computation model. FTAG is a functional and attribute-based language for programming fault-tolerant parallel applications [4]. FTAG has a tree-structured computation model. In the replication technique developed, an application is replicated on different groups of processors. Each group is called a replica. All replicas are active [3], and each concurrently computes a different piece of the application's parallel code. In our model, replicas cooperate not only to detect and mask failures but also to perform parallel computations. The replication mechanisms are supported by the FTAG run-time system and are fully application-transparent. Different novel mechanisms for recovery in case of one or multiple simultaneous failures are developed. The developed replication technique reduces replication cost and takes full advantage of parallel computation to reduce computation time. The recovery mechanisms used introduce a new checkpoi...

