Adaptive Randomized Mutual Exclusion in Sub-Logarithmic Expected Time
Cited by 8 (0 self)
Mutual exclusion is a fundamental distributed coordination problem. Shared-memory mutual exclusion research focuses on local-spin algorithms and uses the remote memory references (RMRs) metric. A mutual exclusion algorithm is adaptive to point contention if its RMR complexity is a function of the maximum number of processes concurrently executing their entry, critical, or exit section. In the best prior-art deterministic adaptive mutual exclusion algorithm, presented by Kim and Anderson [22], a process performs O(min(k, log N)) RMRs as it enters and exits its critical section, where k is point contention and N is the number of processes in the system. Kim and Anderson also proved that a deterministic algorithm with o(k) RMR complexity does not exist [21]. However, they describe a randomized mutual exclusion algorithm that has O(log k) expected RMR complexity against an oblivious adversary. All these results apply to algorithms that use only atomic read and write operations. We present a randomized adaptive mutual exclusion algorithm with O(log k / log log k) expected amortized RMR complexity, even against a strong adversary, for the cache-coherent shared-memory read/write model. Using techniques similar to those used in [17], our algorithm can be adapted to the distributed shared-memory read/write model. This establishes that sub-logarithmic adaptive mutual exclusion, using reads and writes only, is possible.
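To make the gap between these bounds concrete, the short Python sketch below evaluates each asymptotic expression with constants and lower-order terms dropped, so the numbers indicate growth rates only and are not RMR counts of any actual algorithm. The system size N = 2^32, the sample values of k, and the function names are assumptions for illustration.

```python
import math

# Asymptotic RMR bounds from the abstract, with constants and lower-order
# terms dropped (an illustration of growth rates, not actual RMR counts).

def deterministic_bound(k: int, n: int) -> float:
    """Kim and Anderson's deterministic bound O(min(k, log N))."""
    return min(k, math.log2(n))

def randomized_log_bound(k: int) -> float:
    """Randomized O(log k) expected bound (oblivious adversary)."""
    return math.log2(k)

def sublogarithmic_bound(k: int) -> float:
    """O(log k / log log k) expected amortized bound; evaluated only for
    k >= 4, where log log k >= 1 and the expression is meaningful."""
    if k < 4:
        raise ValueError("the asymptotic form is only meaningful for k >= 4")
    log_k = math.log2(k)
    return log_k / math.log2(log_k)

if __name__ == "__main__":
    n = 2 ** 32  # hypothetical system size N
    for k in (16, 256, 65536, 2 ** 32):
        print(f"k={k:>10}  min(k, log N)={deterministic_bound(k, n):5.1f}  "
              f"log k={randomized_log_bound(k):5.1f}  "
              f"log k/log log k={sublogarithmic_bound(k):5.2f}")
```

For k = 65536, for example, the three expressions evaluate to 32, 16 and 4, which shows how slowly log k / log log k grows relative to log k.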
An Ω(n log n) Lower Bound on the Cost of Mutual Exclusion
Cited by 8 (1 self)
We prove an Ω(n log n) lower bound on the number of non-busy-waiting memory accesses by any deterministic algorithm solving n-process mutual exclusion that communicates via shared registers. The cost of the algorithm is measured in the state change cost model, a variation of the cache-coherent model. Our bound is tight in this model. We introduce a novel information-theoretic proof technique. We first establish a lower bound on the information needed by processes to solve mutual exclusion. Then we relate the amount of information processes can acquire through shared memory accesses to the cost they incur. We believe our proof technique is flexible and intuitive, and may be applied to a variety of other problems and system models.
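One common way to see where an n log n term can arise is a counting argument: if a correct algorithm must be able to distinguish the n! orders in which n processes might enter the critical section, then recording one such order takes log2(n!) bits, and log2(n!) = Θ(n log n) by Stirling's approximation. This framing is an intuition assumed here for illustration, not the paper's proof. The Python sketch below checks numerically how close log2(n!) is to n log2 n; the function names are hypothetical.

```python
import math

def log2_factorial(n: int) -> float:
    """log2(n!) computed via lgamma(n + 1) = ln(n!), converted to bits."""
    return math.lgamma(n + 1) / math.log(2)

def ratio_to_n_log_n(n: int) -> float:
    """How close log2(n!) is to n * log2(n); the ratio tends to 1 as n grows."""
    return log2_factorial(n) / (n * math.log2(n))

if __name__ == "__main__":
    for n in (16, 1024, 2 ** 20):
        print(f"n={n:>8}  log2(n!)={log2_factorial(n):12.1f}  "
              f"n log2 n={n * math.log2(n):12.1f}  ratio={ratio_to_n_log_n(n):.3f}")
```

The ratio creeps toward 1 only slowly (about 0.86 at n = 1024 and about 0.93 at n = 2^20), because the dropped n log2 e term is of lower order.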
Adaptive Batching for Replicated Servers
Cited by 6 (0 self)
This paper presents two novel generic adaptive batching schemes for replicated servers. Both schemes are oblivious to the underlying communication protocols. Our novel schemes adapt their batching levels automatically and immediately according to the current communication load. This is done without any explicit monitoring or calibration of the system. Additionally, the paper includes a detailed performance evaluation.
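One generic way to adapt batch size to load without explicit monitoring is to send whatever has queued up while the previous send was in flight: under light load each batch holds one message, and under heavy load batches grow on their own. The Python sketch below simulates that policy on sorted arrival times with a fixed per-batch send time. The policy, the function name `batch_while_busy`, and the timing model are assumptions for illustration and are not necessarily either of the paper's two schemes.

```python
from typing import List

def batch_while_busy(arrivals: List[float], send_time: float) -> List[List[float]]:
    """Simulate a 'batch while busy' sender (hypothetical policy).

    arrivals: message arrival times, sorted ascending.
    send_time: time the channel stays busy for each batch, whatever its size.
    Whenever the channel becomes free and at least one message is queued,
    everything queued so far is sent as one batch.
    """
    batches: List[List[float]] = []
    i = 0
    channel_free_at = 0.0
    while i < len(arrivals):
        # The next send starts once the channel is free and a message is waiting.
        start = max(channel_free_at, arrivals[i])
        batch = []
        # Take every message that has arrived by the time the send starts.
        while i < len(arrivals) and arrivals[i] <= start:
            batch.append(arrivals[i])
            i += 1
        batches.append(batch)
        channel_free_at = start + send_time
    return batches
```

With sparse arrivals, `batch_while_busy([0.0, 10.0, 20.0], 1.0)` yields three single-message batches; with a burst, `batch_while_busy([0.0, 0.1, 0.2, 0.3, 1.5], 1.0)` groups the three messages that arrive during the first send into one batch, with no threshold to tune.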
On the Inherent Weakness of Conditional Synchronization Primitives
In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing, 2004
Cited by 6 (2 self)
The “wait-free hierarchy” classifies multiprocessor synchronization primitives according to their power to solve consensus. The classification is based on assigning a number n to each synchronization primitive, where n is the maximal number of processes for which deterministic wait-free consensus can be solved using instances of the primitive and read/write registers. Conditional synchronization primitives, such as compare-and-swap and load-linked/store-conditional, can implement deterministic wait-free consensus for any number of processes (they have consensus number ∞), and are thus considered to be among the strongest synchronization primitives. Partly because of that, compare-and-swap and load-linked/store-conditional have become the synchronization primitives of choice, and have been implemented in hardware in many multiprocessor architectures. This paper shows that, though they are strong in the context of consensus, conditional synchronization primitives are not efficient in terms of memory space for implementing many key objects. Our results hold for starvation-free implementations of mutual exclusion, and for wait-free implementations of a large class of concurrent objects that we call Visible(n). Roughly, Visible(n) is a class that includes all objects that support some operation that must perform a “visible” …
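Why compare-and-swap has consensus number ∞ can be shown in a few lines: every process tries to swap its own proposal into a shared register that initially holds a reserved "undecided" value, and then returns whatever the register holds. Only the first CAS succeeds, so all processes decide the same proposal, each in a constant number of its own steps. The Python sketch below simulates this. The `CASRegister` class is hypothetical and uses a lock only to model the atomicity of a single hardware CAS instruction, so the simulation itself is not wait-free.

```python
import threading

class CASRegister:
    """Simulated shared register with an atomic compare-and-swap (hypothetical).
    The lock only models the hardware atomicity of a single CAS instruction."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new) -> bool:
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

class CASConsensus:
    """Consensus for any number of processes from one CAS register.
    Proposals must not be None, since None marks the undecided state."""

    def __init__(self):
        self._decision = CASRegister(None)

    def propose(self, value):
        if value is None:
            raise ValueError("None is reserved for the undecided state")
        # Only the first CAS from None can succeed; later ones fail harmlessly.
        self._decision.compare_and_swap(None, value)
        # Every process returns the single value that was installed.
        return self._decision.read()

def run_consensus(n_processes: int) -> list:
    """Run n threads that each propose their own id; return their decisions."""
    consensus = CASConsensus()
    decisions = [None] * n_processes

    def process(pid: int) -> None:
        decisions[pid] = consensus.propose(pid)

    threads = [threading.Thread(target=process, args=(pid,)) for pid in range(n_processes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Whichever thread's CAS lands first, `run_consensus(8)` returns a list in which all eight entries are the same process id.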
Transformations of mutual exclusion algorithms from the cache-coherent model to the distributed shared memory model
In Proc. ICDCS, 2005
Cited by 5 (0 self)
We present two transformations that convert a class of local-spin mutual exclusion algorithms on the cache-coherent model to local-spin mutual exclusion algorithms on the distributed shared memory model without increasing their time complexity. Our first transformation uses registers and test-and-set objects, and does not increase the number of busy-waiting periods. The second transformation uses only registers, but contains two busy-waiting periods for each busy-waiting period of the input algorithm. We carefully define the class of mutual exclusion algorithms to which our transformations apply, and formally prove the correctness of our transformations.
Mutual Exclusion with O(log² log n) Amortized Work
Cited by 5 (0 self)
This paper presents a new algorithm for mutual exclusion in which each passage through the critical section costs amortized O(log² log n) RMRs with high probability. The algorithm operates in a standard asynchronous, local-spinning, shared-memory model with an oblivious adversary. It guarantees that every process enters the critical section with high probability. The algorithm achieves its efficient performance by exploiting a connection between mutual exclusion and approximate counting.
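For readers unfamiliar with approximate counting, the Python sketch below shows a classic example, a Morris-style counter that stores only an exponent c and increments it with probability 2^-c, so that 2^c - 1 is an unbiased estimate of the number of increments. It illustrates approximate counting in general and is not the counting construction used in the paper, which the abstract does not describe; the class and function names are hypothetical.

```python
import random

class MorrisCounter:
    """Morris-style approximate counter: stores only an exponent c, roughly
    log2(log2(n)) bits, instead of the exact count n."""

    def __init__(self, rng: random.Random) -> None:
        self.c = 0
        self.rng = rng

    def increment(self) -> None:
        # Bump the exponent with probability 2^-c.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self) -> float:
        # E[2^c] = n + 1 after n increments, so 2^c - 1 is unbiased.
        return 2.0 ** self.c - 1

def mean_estimate(n: int, trials: int, seed: int = 0) -> float:
    """Average the estimates of `trials` independent counters after n increments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counter = MorrisCounter(rng)
        for _ in range(n):
            counter.increment()
        total += counter.estimate()
    return total / trials
```

A single counter is noisy (its standard deviation is on the order of n), so the usage helper averages many independent counters; the average of 500 counters after 1000 increments lands close to 1000.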
Highly efficient synchronization based on active memory operations
In International Parallel and Distributed Processing Symposium, 2004
Cited by 4 (2 self)
Synchronization is a crucial operation in many parallel applications. As network latency approaches thousands of processor cycles for large-scale multiprocessors, conventional synchronization techniques are failing to keep up with the increasing demand for scalable and efficient synchronization operations. In this paper, we present a mechanism that allows atomic synchronization operations to be executed on the home memory controller of the synchronization variable. By performing atomic operations near where the data resides, our proposed mechanism can significantly reduce the number of network messages required by synchronization operations. Our proposed design also enhances performance by using fine-grained updates to selectively “push” the results of offloaded synchronization operations back to processors when they complete (e.g., when a barrier count reaches the desired value). We use the proposed mechanism to optimize two of the most widely used synchronization operations, barriers and spin locks. Our simulation results show that the proposed mechanism outperforms conventional implementations based on load-linked/store-conditional, processor-centric atomic instructions, conventional memory-side atomic instructions, or active messages. It speeds up conventional barriers by factors of up to 2.1 (4 processors) to 61.9 (256 processors), and spin locks by factors of up to 2.0 (4 processors) to 10.4 (256 processors).
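For reference, the sketch below shows the kind of conventional, processor-centric barrier such a mechanism targets: a centralized sense-reversing barrier in which every arriving thread performs an atomic decrement on one shared counter and then spins on one shared flag. On a real multiprocessor, both the atomic operation and the spinning generate network traffic to the variable's home node, which is the traffic the paper's mechanism reduces by executing the atomic operation at the home memory controller. In this Python sketch a lock models the atomic fetch-and-decrement; the class and helper names are hypothetical, and the memory-side mechanism itself is not modeled.

```python
import threading
import time

class SenseReversingBarrier:
    """Conventional centralized sense-reversing barrier.
    The lock models an atomic fetch-and-decrement on the shared counter;
    waiting threads spin on the shared `sense` flag."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = n
        self.sense = False
        self._atomic = threading.Lock()
        self._local = threading.local()  # per-thread private sense

    def wait(self) -> None:
        # Flip this thread's private sense for the new episode.
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense
        with self._atomic:
            self.count -= 1
            last = self.count == 0
        if last:
            # Last arrival resets the counter, then releases everyone.
            self.count = self.n
            self.sense = my_sense
        else:
            # Spin until the last arrival flips the shared sense.
            while self.sense != my_sense:
                time.sleep(0)

def run_rounds(n_threads: int, n_rounds: int) -> list:
    """Each thread records, after each barrier, whether all threads had arrived."""
    barrier = SenseReversingBarrier(n_threads)
    arrived = [0] * n_rounds
    guard = threading.Lock()
    checks = []

    def worker() -> None:
        for r in range(n_rounds):
            with guard:
                arrived[r] += 1
            barrier.wait()
            checks.append(arrived[r] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checks
```

Reversing the sense on each episode lets the same counter and flag be reused across rounds without a separate reset phase.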
Fast synchronization on shared-memory multiprocessors: An architectural approach
Journal of Parallel and Distributed Computing
Cited by 4 (3 self)
Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases. The contributions of this paper are threefold. First, we revisit some representative synchronization algorithms in light of recent architectural innovations and provide an example of how the simplifying assumptions made by typical analytical models of synchronization mechanisms can lead to significant errors in performance estimates. Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor. Third, we use execution-driven simulation to quantitatively compare the performance of a variety of synchronization mechanisms based on both existing hardware techniques and active memory operations. To the best of our knowledge, synchronization based on active memory outperforms all existing spinlock and non-hardwired barrier implementations by a large margin.
Reactive Spinlocks: A Self-tuning Approach
Cited by 4 (0 self)
Reactive spinlock algorithms, which automatically adapt to contention variation on the lock, have received great attention in the field of multiprocessor synchronization, since they can help applications achieve good performance under all possible contention conditions. However, in existing reactive spinlocks the reaction relies on (i) fixed, experimentally tuned thresholds, which may frequently become inappropriate in dynamic environments such as multiprogramming/multiprocessor systems, or (ii) known probability distributions of inputs. This paper presents a new reactive spinlock algorithm that is completely self-tuning, which means that neither experimentally tuned parameters nor probability distributions of inputs are needed. The new spinlock is built on a competitive online algorithm. Our experiments, which use the Spark98 kernels and the SPLASH-2 applications as application benchmarks on an SGI Origin2000 multiprocessor and on an Intel Xeon workstation, show that the new self-tuning spinlock helps applications with different characteristics achieve good performance across a wide range of contention levels.
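Competitive online algorithms for spinning are usually analyzed against an offline optimum that knows the future waiting time in advance. The Python sketch below shows the textbook example, not the paper's algorithm: spin for as long as blocking would cost (B), then block. Its cost is never more than twice the optimum. Note that this classic policy takes the blocking cost B as an input parameter; B and the function names are assumptions for illustration.

```python
def spin_then_block_cost(wait_time: float, block_cost: float) -> float:
    """Cost of the classic 'spin for block_cost, then block' policy.
    Spinning costs 1 per time unit; blocking costs block_cost once."""
    if wait_time <= block_cost:
        return wait_time            # lock released while still spinning
    return block_cost + block_cost  # spun for block_cost, then paid to block

def offline_optimal_cost(wait_time: float, block_cost: float) -> float:
    """An oracle that knows wait_time either spins throughout or blocks at once."""
    return min(wait_time, block_cost)

if __name__ == "__main__":
    B = 10.0  # hypothetical cost of blocking and rescheduling, in spin-time units
    for w in (1.0, 10.0, 11.0, 100.0):
        online = spin_then_block_cost(w, B)
        opt = offline_optimal_cost(w, B)
        print(f"wait={w:6.1f}  online={online:5.1f}  opt={opt:5.1f}  ratio={online / opt:.2f}")
```

For B = 10, a wait of 100 time units costs the policy 20 (10 spent spinning plus 10 to block) against an optimum of 10, which is the worst-case ratio of 2.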
Some myths about famous mutual exclusion algorithms
SIGACT News
Cited by 4 (0 self)
Dekker's algorithm [9] is historically the first software solution to the mutual exclusion problem for the 2-process case. The first software solution for the n-process case was subsequently proposed by Dijkstra [8]. Because of their historical importance, these two algorithms have become the de facto examples of mutual exclusion algorithms. Since the publication of Dijkstra's algorithm, many solutions have been proposed in the literature [24, 1, 2]. Among them, Peterson's algorithm [21] is one of the most popular, and it has been extensively analyzed for its elegance and compactness. This paper attempts to dispel some myths about the properties of these three remarkable algorithms through a systematic analysis.
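For reference, the sketch below is the standard textbook form of Peterson's two-process algorithm, which uses only reads and writes of two flags and a turn variable; it does not reproduce the paper's analysis. The Python version relies on the standard CPython interpreter executing these simple loads and stores in a sequentially consistent order; on real hardware with a weaker memory model, the same logic needs memory fences. The `run` helper is a hypothetical test harness.

```python
import threading
import time

class PetersonLock:
    """Peterson's two-process mutual exclusion algorithm using only reads and writes.
    Assumes sequentially consistent memory, which standard CPython provides for
    these simple list and attribute accesses; real hardware needs fences."""

    def __init__(self) -> None:
        self.flag = [False, False]  # flag[i]: process i wants to enter
        self.turn = 0               # which process yields when both want in

    def lock(self, i: int) -> None:
        other = 1 - i
        self.flag[i] = True   # announce interest
        self.turn = other     # let the other process go first if it also wants in
        while self.flag[other] and self.turn == other:
            time.sleep(0)     # busy-wait, yielding so the other thread can run

    def unlock(self, i: int) -> None:
        self.flag[i] = False  # withdraw interest

def run(iterations: int) -> dict:
    """Two threads repeatedly enter the critical section and check exclusivity."""
    lock = PetersonLock()
    state = {"count": 0, "inside": 0, "violations": 0}

    def worker(i: int) -> None:
        for _ in range(iterations):
            lock.lock(i)
            state["inside"] += 1
            if state["inside"] != 1:   # someone else is also inside
                state["violations"] += 1
            state["count"] += 1
            state["inside"] -= 1
            lock.unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

Writing `turn` after raising one's own flag is what breaks ties: if both processes want in, whichever wrote `turn` last waits.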