Results 1–10 of 15
Sorting and Searching in the Presence of Memory Faults (without Redundancy)
 Proc. 36th ACM Symposium on Theory of Computing (STOC'04), 2004
Abstract

Cited by 21 (4 self)
We investigate the design of algorithms resilient to memory faults, i.e., algorithms that, despite the corruption of some memory values during their execution, are able to produce a correct output on the set of uncorrupted values. In this framework, we consider two fundamental problems: sorting and searching. In particular, we prove that any O(n log n) comparison-based sorting algorithm can tolerate at most O((n log n)^{1/2}) memory faults. Furthermore, we present one comparison-based sorting algorithm with optimal space and running time that is resilient to O((n log n)^{1/3}) faults. We also prove polylogarithmic lower and upper bounds on fault-tolerant searching.
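To make the resilience criterion above concrete, the following sketch checks it directly: an output is considered correct if the subsequence of values at uncorrupted positions is sorted. All names here are illustrative, not from the paper.

```python
def faithfully_ordered(output, corrupted_positions):
    """Resilience criterion: the values at uncorrupted positions
    must appear in nondecreasing order in the output."""
    clean = [v for i, v in enumerate(output)
             if i not in corrupted_positions]
    return all(a <= b for a, b in zip(clean, clean[1:]))

# Position 1 was corrupted to 99 after being placed: the
# uncorrupted subsequence 1, 2, 3 is still in order.
print(faithfully_ordered([1, 99, 2, 3], {1}))  # True
```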
Fast Deterministic Simulation of Computations on Faulty Parallel Machines
 in Proc. of the 3rd Ann. European Symp. on Algorithms, Springer-Verlag LNCS 979, 1995
Abstract

Cited by 9 (4 self)
A method of deterministic simulation of fully operational parallel machines on analogous machines prone to errors is developed. The simulation is presented for the exclusive-read exclusive-write (EREW) PRAM and the Optical Communication Parallel Computer (OCPC), but it applies to a large class of parallel computers. It is shown that simulations of operational multiprocessor machines on faulty ones can be performed with logarithmic slowdown in the worst case. More precisely, we prove that both a PRAM with a bounded fraction of faulty processors and memory cells and an OCPC with a bounded fraction of faulty processors can deterministically simulate their fault-free counterparts with O(log n) slowdown and preprocessing done in time O(log^2 n). The fault model is as follows. The faults are deterministic (worst-case distribution) and static (they do not change in the course of a computation). If a processor attempts to communicate with some other processor (in the case of an OCPC) or re...
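One standard ingredient behind simulations of reliable memory on faulty memory is replication with majority voting on reads. The sketch below illustrates that idea only; the paper's actual construction is more involved, and the replication factor, class names, and stuck-at fault model are assumptions for this example.

```python
from collections import Counter

REPLICAS = 3  # illustrative; a read survives any minority of faulty copies

class Memory:
    def __init__(self, n_logical, faulty=()):
        # Physical array holding REPLICAS copies per logical cell.
        self.cells = [0] * (REPLICAS * n_logical)
        self.faulty = set(faulty)  # physical addresses that ignore writes

def write_cell(memory, addr, value):
    # Replicate each logical write across several physical cells;
    # a faulty cell silently keeps its old (stuck) value.
    for r in range(REPLICAS):
        phys = REPLICAS * addr + r
        if phys not in memory.faulty:
            memory.cells[phys] = value

def read_cell(memory, addr):
    # Majority vote recovers the value as long as fewer than half
    # of the replicas of this cell are faulty.
    copies = [memory.cells[REPLICAS * addr + r] for r in range(REPLICAS)]
    return Counter(copies).most_common(1)[0][0]
```

For instance, with physical cell 0 stuck, `write_cell(mem, 0, 7)` still leaves two good copies, so `read_cell(mem, 0)` returns 7.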
Shared-Memory Simulations on a Faulty-Memory DMM
 1996
Abstract

Cited by 9 (1 self)
this paper are synchronous, and the time performance is our major efficiency criterion. We consider a DMM with faulty memory words; otherwise everything is assumed to be operational. In particular, the communication between the processors and the MUs is reliable, and a processor may always attempt to obtain access to any MU and, having been granted it, may access any memory word in it, even if all of them are faulty. The only restriction on the distribution of faults among memory words is that their total number is bounded from above by a fraction of the total number of memory words in all the MUs. In particular, some MUs may contain only operational cells, some only faulty cells, and some a mixture of both. This report presents fast simulations of the PRAM on a DMM with faulty memory.
Optimal Scheduling for Disconnected Cooperation
 2001
Abstract

Cited by 8 (3 self)
We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection of processors may establish communication. Before processors begin communication they execute tasks in the order given by their schedules. Our goal is to schedule the work of isolated processors so that when communication is established for the first time, the number of redundantly executed tasks is controlled. We quantify worst-case redundancy as a function of processor advancements through their schedules. In this work we refine and simplify an extant deterministic construction for schedules with n ≤ t, and we develop a new analysis of its waste. The new analysis shows that for any pair of schedules, the number of redundant tasks can be controlled for the entire range of t tasks. Our new result is asymptotically optimal: the tails of these schedules are within a 1 + O(n^{-1/4}) factor of the lower bound. We also present two new deterministic constructions, one for t ≥ n and the other for t ≥ n^{3/2}, which substantially improve pairwise waste for all prefixes of length t/√n, and offer near-optimal waste for the tails of the schedules. Finally, we present bounds for the waste of any collection of k ≥ 2 processors, for both deterministic and randomized constructions.
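The redundancy measure discussed above can be stated concretely: if two isolated processors follow schedules s1 and s2 (each a permutation of the t tasks) and have completed a and b tasks respectively when they first communicate, the waste is the overlap of the executed prefixes. A minimal sketch, with illustrative names:

```python
def pairwise_waste(s1, s2, a, b):
    """Number of tasks executed by both processors: the overlap of
    the length-a prefix of schedule s1 with the length-b prefix of s2."""
    return len(set(s1[:a]) & set(s2[:b]))

# Identical schedules waste every executed task; here, opposite-order
# schedules have disjoint half-length prefixes and waste nothing.
t = 6
forward = list(range(t))
backward = list(reversed(forward))
print(pairwise_waste(forward, forward, 3, 3))   # 3
print(pairwise_waste(forward, backward, 3, 3))  # 0
```

The scheduling problem in the paper is precisely to pick the permutations so that this quantity stays small for all prefix lengths a and b simultaneously.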
Designing Reliable Algorithms in Unreliable Memories
Abstract

Cited by 7 (2 self)
Some of today’s applications run on computer platforms with large and inexpensive memories, which are also error-prone. Unfortunately, the appearance of even very few memory faults may jeopardize the correctness of the computational results. An algorithm is resilient to memory faults if, despite the corruption of some memory values before or during its execution, it is nevertheless able to produce a correct output, at least on the set of uncorrupted values. In this paper we survey some recent work on reliable computation in the presence of memory faults.
Deterministic Computations on a PRAM with Static Faults
Abstract

Cited by 6 (0 self)
We develop a deterministic simulation of a fully operational Parallel Random Access Machine (PRAM) on a PRAM with some faulty processors and memory cells. The faults considered are static, i.e., once the machine starts to operate, the operational/faulty status of PRAM components does not change. The simulating machine can tolerate a constant fraction of faults among processors and memory cells. The simulating PRAM has n processors and m memory cells, and simulates a PRAM with n processors and Θ(m) memory cells. The simulation is in three phases: (1) preprocessing, followed by (2) retrieving the input by the processors active in the simulation, followed by (3) the proper part of the simulation, performed in a step-by-step fashion. Preprocessing is performed in time O((m/n + log n) log n). The input is retrieved in time O(log^2 n). The slowdown of the proper part of the simulation is O(log m).
A Work-Optimal Deterministic Algorithm for the Asynchronous Certified Write-All Problem (Extended Abstract)
 22nd ACM Symposium on Principles of Distributed Computing (PODC'03), 2003
Abstract

Cited by 5 (2 self)
In their SIAM J. on Computing paper [27] from 1992, Martel et al. posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All. In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [17], $p$ processors must update $n$ memory cells and then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees on the overhead of the simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones. This paper presents a solution to the question of Martel et al. Specifically, we show a deterministic asynchronous algorithm for the Certified Write-All problem. Our algorithm has $O(n + p^4\log n)$ work, which is optimal for a nontrivial number of processors $p \leq (n/\log n)^{1/4}$. In contrast, all known deterministic algorithms require work superlinear in $n$ when $p = n^{1/r}$, for any fixed $r \geq 1$. Our algorithm generalizes the collision principle used by the algorithm T that was introduced by Buss et al. [7].
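For concreteness, the problem statement itself — p processors must set n cells and then certify completion — can be sketched with a deliberately naive strategy in which every processor touches every cell. This does O(n·p) work, far from the work-optimal bound discussed above, but it is correct under any asynchronous schedule; the sizes and names are illustrative.

```python
import threading

N_CELLS, N_PROCS = 16, 4          # illustrative sizes
cells = [0] * N_CELLS
certified = threading.Event()

def processor(pid):
    # Naive Write-All: scan every cell and set it. Processors run
    # asynchronously, so any of them may be the one to finish the job.
    for i in range(N_CELLS):
        cells[i] = 1
    # Certify only after observing that every cell has been written;
    # cells only change from 0 to 1, so this observation is stable.
    if all(cells):
        certified.set()

workers = [threading.Thread(target=processor, args=(pid,))
           for pid in range(N_PROCS)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The algorithmic challenge addressed by the paper is to keep the total number of cell accesses close to n while still guaranteeing a correct certification under arbitrary asynchrony.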
A Method for Creating Near-Optimal Instances of a Certified Write-All Algorithm
 11th Annual European Symposium on Algorithms (ESA'03), 2003
Abstract

Cited by 3 (1 self)
This paper shows how to create near-optimal instances of the Certified Write-All algorithm called AWT that was introduced by Anderson and Woll [2]. This algorithm is the best known deterministic algorithm that can be used to simulate n synchronous parallel processors on n asynchronous processors. In this algorithm, n processors update n memory cells and then signal the completion of the updates. The algorithm is instantiated with q permutations, where q can be chosen from a wide range of values. When implementing a simulation on a specific parallel system with n processors, one would like to use an instance of the algorithm with the best possible value of q, in order to maximize the efficiency of the simulation. This paper shows that the choice of q is critical for obtaining an instance of the AWT algorithm with near-optimal work. For any ε > 0, and any large enough n, the work of any instance of the algorithm must be at least n . Under certain conditions, however, that q is about e and for infinitely many large enough n, this lower bound can be nearly attained by instances of the algorithm with work at most n . The paper also shows a penalty for not selecting q well. When q is significantly away from e , the work of any instance of the algorithm with this displaced q must be considerably higher than otherwise.
Efficient Computations on Fault-Prone BSP Machines
 1997
Abstract

Cited by 2 (1 self)
In this paper, general simulations of algorithms designed for fully operational BSP machines on BSP machines with faulty or unavailable processors are developed. The fail-stop model is considered for the fault occurrences, that is, if a processor fails or becomes unavailable, it remains so until the end of the computation. The faults are random, in the sense that each processor may fail independently with probability at most a, where a is a constant.
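Under the fail-stop model described above, a simulation must hand the work of failed processors over to the survivors. A minimal sketch of one such round follows; the function name, the round-robin reassignment, and the failure-injection mechanism are assumptions for illustration, not the paper's construction.

```python
import random

def simulate_superstep(tasks, n_procs, fail_prob, rng):
    """One BSP superstep on a fail-stop machine: each processor fails
    independently with probability at most fail_prob (and then stays
    failed); the tasks of failed processors are redistributed
    round-robin among the survivors."""
    alive = [p for p in range(n_procs) if rng.random() >= fail_prob]
    if not alive:
        raise RuntimeError("all processors failed")
    assignment = {p: [] for p in alive}
    for i, task in enumerate(tasks):
        assignment[alive[i % len(alive)]].append(task)
    return assignment

rng = random.Random(0)
plan = simulate_superstep(list(range(12)), 4, 0.25, rng)
# Every task is assigned exactly once, whatever the fault pattern.
assert sorted(t for ts in plan.values() for t in ts) == list(range(12))
```

The analytical content of such papers lies in bounding the slowdown of many such rounds; this sketch only shows the invariant each round must preserve (no task is lost).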
Robust Parallel Computations through Randomization
 2000
Abstract

Cited by 2 (0 self)
In this paper we present an efficient general simulation strategy for computations designed for fully operational BSP machines of n ideal processors, on n-processor dynamic-fault-prone BSP machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen online at any point of the computation, subject to the constraint that the total number of faulty processors may never exceed a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel settings with a volatile underlying infrastructure, such as a network of workstations (where workstations may be taken out of the virtual parallel machine by their owner).