Optimal Clock Synchronization
- Journal of the ACM
, 2003
Cited by 106 (0 self)
We present a simple, efficient, and unified solution to the problems of synchronizing, initializing, and integrating clocks for systems with different types of failures: crash, omission, and arbitrary failures with and without message authentication. This is the first known solution that achieves optimal accuracy: the accuracy of synchronized clocks (with respect to real time) is as good as that specified for the underlying hardware clocks. The solution is also optimal with respect to the number of faulty processes that can be tolerated to achieve this accuracy.
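The abstract does not spell out the protocol. As a rough illustration only, here is a broadcast-based resynchronization rule of the kind often used for arbitrary failures without message authentication: a process announces round k when its own clock reaches the round's resync time, relays the announcement once it has heard it from f + 1 distinct processes, and accepts (resynchronizes) once it has heard it from 2f + 1. The class `ResyncRound`, both thresholds, and the assumption n ≥ 3f + 1 are assumptions, not details taken from the abstract; the actual clock adjustment on acceptance is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ResyncRound:
    """Hypothetical per-process state for one resynchronization round k.

    Assumes n >= 3f + 1 processes, at most f arbitrary faults, and no
    message authentication. The f + 1 / 2f + 1 thresholds are assumptions.
    """
    k: int
    f: int
    senders: set = field(default_factory=set)
    sent: bool = False
    accepted: bool = False

    def on_local_clock(self) -> list:
        """Local clock reached the round-k resync time: announce once."""
        return self._send()

    def on_message(self, sender: int) -> list:
        """Handle a (round k) announcement and return the actions taken."""
        actions = []
        self.senders.add(sender)  # duplicates from one sender count once
        if len(self.senders) >= self.f + 1:
            # At least one correct process has announced, so relay.
            actions += self._send()
        if len(self.senders) >= 2 * self.f + 1 and not self.accepted:
            # At least f + 1 correct processes have announced: resynchronize.
            self.accepted = True
            actions.append("accept")
        return actions

    def _send(self) -> list:
        """Broadcast the round-k announcement at most once."""
        if self.sent:
            return []
        self.sent = True
        return ["broadcast"]


r = ResyncRound(k=1, f=1)
print(r.on_message(0), r.on_message(1), r.on_message(2))
```

With f = 1, the second distinct sender triggers a relay and the third triggers acceptance; a later local announcement is suppressed because the process has already broadcast.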
Reaching approximate agreement in the presence of faults
- Journal of the ACM
, 1986
Cited by 89 (10 self)
Abstract. This paper considers a variant of the Byzantine Generals problem, in which processes start with arbitrary real values rather than Boolean values or values from some bounded range, and in which approximate, rather than exact, agreement is the desired goal. Algorithms are presented to reach approximate agreement in asynchronous as well as synchronous systems. The asynchronous agreement algorithm is an interesting contrast to a result of Fischer et al., who show that exact agreement with guaranteed termination is not attainable in an asynchronous system with as few as one faulty process. The algorithms work by successive approximation, with a provable convergence rate that depends on the ratio between the number of faulty processes and the total number of processes. Lower bounds on the convergence rate for algorithms of this form are proved, and the algorithms presented are shown to ...
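To make "successive approximation" concrete, here is one round of a simplified update: a process sorts the values it collected, discards the f lowest and f highest (any of which might come from faulty processes), and moves to the midpoint of what remains. This trimmed-midpoint rule is a stand-in chosen for brevity, not necessarily the averaging function used in the paper; the name `approx_round` is hypothetical.

```python
def approx_round(received: list, f: int) -> float:
    """One simplified approximate-agreement step (trimmed midpoint).

    received: the values this process collected this round, its own included.
    f: an assumed upper bound on the number of faulty processes.
    """
    if len(received) <= 2 * f:
        raise ValueError("need more than 2f values to trim f from each end")
    # Drop the f smallest and f largest values.
    trimmed = sorted(received)[f:len(received) - f]
    # Move to the midpoint of the surviving range.
    return (trimmed[0] + trimmed[-1]) / 2


# Correct values lie in [1, 4]; one faulty process reports 1000.
print(approx_round([1, 2, 3, 4, 1000], f=1))  # 3.0
```

Trimming f values from each end guarantees that every surviving value lies within the range of the correct processes' values, so the new value cannot be dragged outside that range by up to f faulty reports.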
A Comparison of Bus Architectures for Safety-Critical Embedded Systems
, 2001
Cited by 78 (4 self)
Abstract. Embedded systems for safety-critical applications often integrate multiple “functions” and must generally be fault-tolerant. These requirements lead to a need for mechanisms and services that provide protection against fault propagation and ease the construction of distributed fault-tolerant applications. A number of bus architectures have been developed to satisfy this need. This paper reviews the requirements on these architectures, the mechanisms employed, and the services provided. Four representative architectures (SAFEbus™, SPIDER, TTA, and FlexRay) are briefly described.
Hundreds of Impossibility Results for Distributed Computing
- Distributed Computing
, 2003
Cited by 32 (4 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
On the composition of authenticated Byzantine agreement
- In 34th Annual ACM Symposium on Theory of Computing (STOC)
, 2002
"... ..."
Gap Theorems for Distributed Computation
- SIAM Journal on Computing
, 1986
Cited by 24 (2 self)
Keywords: ... lower bounds, gap theorem.
Consider a bidirectional ring of n identical processors that communicate asynchronously. The processors have no identifiers, and hence the ring is called anonymous. Each processor receives an input letter, and the ring is to compute a function of the circular input string. If the function value is constant for all input strings, then the processors do not need to send any messages. On the other hand, we prove that any deterministic algorithm that computes any non-constant function for anonymous rings requires Ω(n log n) bits of communication for some input string. We also exhibit non-constant functions that require O(n log n) bits of communication for every input string. The same gap for the bit complexity of non-constant functions remains even if the processors have distinct identifiers, provided that the identifiers are taken from a large enough domain. When communication is measured in messages rather than bits, the results change: we present a non-constant function that can be computed with O(n log* n) messages on an anonymous ring.
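The gap between the Ω(n log n) bit bound and the O(n log* n) message bound rests on how slowly the iterated logarithm log* n grows. A short sketch, assuming base-2 logarithms (the base does not affect the asymptotics); `log_star` is a hypothetical helper name:

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm: how many times log2 is applied before n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)  # math.log2 also accepts arbitrarily large ints
        count += 1
    return count


for exp in (4, 16, 65536):
    n = 2 ** exp
    print(f"n = 2^{exp}: log2 n = {math.log2(n):.0f}, log* n = {log_star(n)}")
# n = 2^4: log2 n = 4, log* n = 3
# n = 2^16: log2 n = 16, log* n = 4
# n = 2^65536: log2 n = 65536, log* n = 5
```

Even for a ring of 2^65536 processors, the per-processor factor in the message bound is 5, while the per-processor factor in the bit bound is 65536.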
The wakeup problem
- SIAM Journal on Computing
, 1996
Cited by 22 (5 self)
We study a new problem, the wakeup problem, that seems to be fundamental in distributed computing. We present efficient solutions to the problem and show how these solutions can be used to solve the consensus problem, the leader election problem, and other related problems. The main question we try to answer is: how much memory is needed to solve the wakeup problem? We assume a model that captures important properties of real systems that have been largely ignored by previous work on cooperative problems.
A combinatorial characterization of the distributed 1-solvable tasks
- Journal of Algorithms
, 1990
Cited by 22 (1 self)
Fischer, Lynch and Paterson showed in a fundamental paper that achieving distributed agreement is impossible in the presence of one faulty processor. This result was later extended by Moran and Wolfstahl, who showed that it holds for any task with a connected input graph and a disconnected decision graph. In this paper we extend that latter result and in fact set an exact borderline between solvable and unsolvable tasks, by giving a necessary and sufficient condition for a task to be 1-solvable (that is, solvable in the presence of one faulty processor). Our characterization is purely combinatorial and involves only relations between the input graph and the output graph defined by the given task. It provides easy proofs of the non-solvability of tasks, and also provides a universal protocol that solves any task found to be solvable by our condition. Using this characterization, we also derive a novel technique for proving lower bounds on the number of messages that must be sent due to processor failure; specifically, we give a simple proof that for each fixed N > 2 there exist distributed tasks for N processors that can be solved in the presence of a faulty processor, but any protocol that solves them must send arbitrarily many messages in the worst case.
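To make the two graph properties in Moran and Wolfstahl's condition concrete, the sketch below checks them for binary consensus among N processors. It assumes the common convention that two vectors (one entry per processor) are adjacent when they differ in exactly one processor's entry; that convention and the helper names `adjacent` and `is_connected` are assumptions. The sketch does not implement the paper's full necessary-and-sufficient condition.

```python
from collections import deque
from itertools import product


def adjacent(u: tuple, v: tuple) -> bool:
    """Assumed adjacency: vectors differ in exactly one processor's entry."""
    return sum(a != b for a, b in zip(u, v)) == 1


def is_connected(vertices: list) -> bool:
    """Breadth-first search over the graph induced by `adjacent`."""
    if not vertices:
        return True
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        u = queue.popleft()
        for v in vertices:
            if v not in seen and adjacent(u, v):
                seen.add(v)
                queue.append(v)
    return len(seen) == len(vertices)


N = 3
inputs = list(product((0, 1), repeat=N))  # every binary input vector
decisions = [(0,) * N, (1,) * N]          # consensus: all processors decide alike
print(is_connected(inputs), is_connected(decisions))  # True False
```

The input graph is the N-dimensional hypercube and hence connected, while the two consensus outcomes differ in all N entries and share no edge, which is the connected-input, disconnected-decision pattern described above.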
The SunSCALR Framework for Internet Servers
- in IEEE Fault-Tolerant Computing Systems
, 1998
Cited by 20 (0 self)
Internet servers need to be highly available, inexpensive, and scalable. These goals often conflict, and most designs meet only a few of them, with limited success. In this paper we describe the SunSCALR framework, which achieves these goals by combining proven technologies, careful system design, and engineering trade-offs. It uses a distributed, self-stabilizing algorithm for status monitoring and failure detection, and IP failover for automatic reconfiguration. SunSCALR provides high availability against message loss, host crashes, and scheduled downtime, and allows on-the-fly addition and removal of hosts. We present a detailed performance evaluation of SunSCALR: it can provide a 10-second failover latency (i.e., better than 99.999% availability if machines fail for 2 hours/month). SunSCALR-based products have been in use within Sun and are also available in the market.
Introduction (excerpt): SunSCALR is currently deployed in the Netra Proxy Cache Array. It is a strategic technology belonging to ...
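For scale, the downtime budget implied by 99.999% availability can be worked out directly. This assumes a 30-day month and counts only client-visible failover time as downtime; it does not reconstruct how the authors derived their figure from 2 hours/month of machine failures.

$$
(1 - 0.99999) \times 30 \times 24 \times 3600\ \text{s} = 10^{-5} \times 2\,592\,000\ \text{s} = 25.92\ \text{s per month}
$$

At 10 s per failover, roughly two failovers per month fit within that budget.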

