Results 11–20 of 50
Highly efficient synchronization based on active memory operations
 In International Parallel and Distributed Processing Symposium
, 2004
Abstract

Cited by 4 (2 self)
Synchronization is a crucial operation in many parallel applications. As network latency approaches thousands of processor cycles for large-scale multiprocessors, conventional synchronization techniques are failing to keep up with the increasing demand for scalable and efficient synchronization operations. In this paper, we present a mechanism that allows atomic synchronization operations to be executed on the home memory controller of the synchronization variable. By performing atomic operations near where the data resides, our proposed mechanism can significantly reduce the number of network messages required by synchronization operations. Our proposed design also enhances performance by using fine-grained updates to selectively "push" the results of offloaded synchronization operations back to processors when they complete (e.g., when a barrier count reaches the desired value). We use the proposed mechanism to optimize two of the most widely used synchronization operations, barriers and spin locks. Our simulation results show that the proposed mechanism outperforms conventional implementations based on load-linked/store-conditional, processor-centric atomic instructions, conventional memory-side atomic instructions, or active messages. It speeds up conventional barriers by factors of 2.1 (4 processors) to 61.9 (256 processors) and spin locks by factors of 2.0 (4 processors) to 10.4 (256 processors).
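For contrast with the memory-side mechanism the abstract describes, a minimal sketch of the conventional processor-side baseline it improves on: a sense-reversing barrier in which the last arriving process flips a shared flag, releasing all spinners. All names here are illustrative, not taken from the paper.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int count;   /* number of processes that have arrived */
    atomic_int sense;   /* global sense flag, flipped each episode */
    int nthreads;
} barrier_t;

void barrier_init(barrier_t *b, int n) {
    atomic_init(&b->count, 0);
    atomic_init(&b->sense, 0);
    b->nthreads = n;
}

/* Sense-reversing barrier: each process flips its local sense, then the
   last arriver resets the count and publishes the new sense, releasing
   every process spinning on the shared flag. */
void barrier_wait(barrier_t *b, int *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_add(&b->count, 1) == b->nthreads - 1) {
        atomic_store(&b->count, 0);
        atomic_store(&b->sense, *local_sense);   /* release everyone */
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            ;                                    /* spin on shared flag */
    }
}
```

Every spinning process re-reads the shared `sense` flag across the network; the paper's point is that offloading the count to the home memory controller and pushing the completion notification avoids exactly this remote traffic.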
Fast synchronization on shared-memory multiprocessors: An architectural approach
 Journal of Parallel and Distributed Computing
Abstract

Cited by 4 (3 self)
Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases. The contributions of this paper are threefold. First, we revisit some representative synchronization algorithms in light of recent architecture innovations and provide an example of how the simplifying assumptions made by typical analytical models of synchronization mechanisms can lead to significant performance estimate errors. Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor. Third, we use execution-driven simulation to quantitatively compare the performance of a variety of synchronization mechanisms based on both existing hardware techniques and active memory operations. To the best of our knowledge, synchronization based on active memory outperforms all existing spin lock and non-hardwired barrier implementations by a large margin.
The Weakest Failure Detector For Wait-Free, Eventually Fair Mutual Exclusion
, 2007
Abstract

Cited by 3 (3 self)
We establish the necessary conditions for solving wait-free, eventually fair mutual exclusion in message-passing environments subject to crash faults. Wait-freedom guarantees that every correct hungry process eventually enters its critical section. Eventual fairness guarantees that every run has an infinite suffix during which no correct hungry process is overtaken more than b times. Previously, we showed that the eventually perfect failure detector (◇P) is sufficient to solve wait-free, eventually fair mutual exclusion. The present paper completes this reduction by proving that ◇P is also necessary, and hence is the weakest oracle to solve this problem. Our construction uses wait-free, eventually fair mutual exclusion to build an elastic clock that provides an eventually reliable timeout mechanism for detecting crashed processes. This lease-based implementation of ◇P uses bounded-capacity, non-FIFO channels, and is crash-quiescent. The construction itself may be of independent interest, insofar as it demonstrates how fairness properties can be sufficient to encapsulate temporal assumptions about partial synchrony.
Randomized Mutual Exclusion in O(log N / log log N) RMRs [Extended Abstract]
Abstract

Cited by 3 (1 self)
Mutual exclusion is a fundamental distributed coordination problem. Shared-memory mutual exclusion research focuses on local-spin algorithms and uses the remote memory references (RMRs) metric. A recent proof [9] established an Ω(log N) lower bound on the number of RMRs incurred by processes as they enter and exit the critical section, matching an upper bound by Yang and Anderson [18]. Both these bounds apply for algorithms that only use read and write operations. The lower bound of [9] only holds for deterministic algorithms, however; the question of whether randomized mutual exclusion algorithms, using reads and writes only, can achieve sublogarithmic expected RMR complexity remained open. This paper answers this question in the affirmative. We present two strong-adversary [8] randomized local-spin mutual exclusion algorithms. In both algorithms, processes incur O(log N / log log N) expected RMRs per passage in every execution. Our first algorithm has suboptimal worst-case RMR complexity of O((log N / log log N)^2). Our second algorithm is a variant of the first that can be combined with a deterministic algorithm, such as [18], to obtain O(log N) worst-case RMR complexity. The combined algorithm thus achieves sublogarithmic expected RMR complexity while maintaining optimal worst-case RMR complexity. Our upper bounds apply for both the cache-coherent (CC) and the distributed shared memory (DSM) models.
Adaptive Randomized Mutual Exclusion in Sub-Logarithmic Expected Time
Abstract

Cited by 3 (0 self)
Mutual exclusion is a fundamental distributed coordination problem. Shared-memory mutual exclusion research focuses on local-spin algorithms and uses the remote memory references (RMRs) metric. A mutual exclusion algorithm is adaptive to point contention if its RMR complexity is a function of the maximum number of processes concurrently executing their entry, critical, or exit section. In the best prior deterministic adaptive mutual exclusion algorithm, presented by Kim and Anderson [22], a process performs O(min(k, log N)) RMRs as it enters and exits its critical section, where k is point contention and N is the number of processes in the system. Kim and Anderson also proved that a deterministic algorithm with o(k) RMR complexity does not exist [21]. However, they describe a randomized mutual exclusion algorithm that has O(log k) expected RMR complexity against an oblivious adversary. All these results apply for algorithms that use only atomic read and write operations. We present a randomized adaptive mutual exclusion algorithm with O(log k / log log k) expected amortized RMR complexity, even against a strong adversary, for the cache-coherent shared memory read/write model. Using techniques similar to those used in [17], our algorithm can be adapted for the distributed shared memory read/write model. This establishes that sublogarithmic adaptive mutual exclusion, using reads and writes only, is possible.
Reactive Spinlocks: A Self-Tuning Approach
Abstract

Cited by 3 (0 self)
Reactive spinlock algorithms that can automatically adapt to contention variation on the lock have received great attention in the field of multiprocessor synchronization, since they can help applications achieve good performance in all possible contention conditions. However, in existing reactive spinlocks the reaction relies on (i) fixed, experimentally tuned thresholds, which frequently become inappropriate in dynamic environments like multiprogrammed multiprocessor systems, or (ii) known probability distributions of inputs. This paper presents a new reactive spinlock algorithm that is completely self-tuning: neither experimentally tuned parameters nor known input probability distributions are needed. The new spinlock is built on a competitive online algorithm. Our experiments, which use the Spark98 kernels and the SPLASH-2 applications as benchmarks on an SGI Origin2000 multiprocessor and on an Intel Xeon workstation, show that the new self-tuning spinlock helps applications with different characteristics achieve good performance across a wide range of contention levels.
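The fixed thresholds the abstract criticizes can be seen in a conventional test-and-test-and-set lock with exponential backoff, sketched below. The hard-coded backoff cap is exactly the kind of experimentally tuned parameter a self-tuning reactive spinlock would adjust online; all identifiers here are illustrative, not from the paper.

```c
#include <stdatomic.h>
#include <sched.h>

typedef struct {
    atomic_int held;   /* 0 = free, 1 = held */
} ttas_lock_t;

void ttas_init(ttas_lock_t *l) {
    atomic_init(&l->held, 0);
}

/* Test-and-test-and-set: spin on a plain read (served from the local
   cache under CC) and attempt the atomic update only when the lock
   looks free; back off after each failed attempt. */
void ttas_lock(ttas_lock_t *l) {
    int backoff = 1;
    for (;;) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;                                  /* local spin, no write traffic */
        int expected = 0;
        if (atomic_compare_exchange_weak(&l->held, &expected, 1))
            return;
        for (int i = 0; i < backoff; i++)      /* back off after a failed CAS */
            sched_yield();
        if (backoff < 64)                      /* fixed cap: the tuned threshold */
            backoff *= 2;
    }
}

void ttas_unlock(ttas_lock_t *l) {
    atomic_store(&l->held, 0);
}
```

A reactive lock in the paper's spirit would replace the constant `64` (and the choice between spinning and yielding) with a bound chosen by a competitive online algorithm as contention is observed.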
Transformations of mutual exclusion algorithms from the cache-coherent model to the distributed shared memory model
 In Proc. ICDCS 2005
, 2005
Abstract

Cited by 2 (0 self)
We present two transformations that convert a class of local-spin mutual exclusion algorithms on the cache-coherent model to local-spin mutual exclusion algorithms on the distributed shared memory model without increasing their time complexity. Our first transformation uses registers and test-and-set objects, and does not increase the number of busy-waiting periods. The second transformation uses only registers, but contains two busy-waiting periods for each busy-waiting period of the input algorithm. We carefully define the class of mutual exclusion algorithms that are applicable to our transformations, and formally prove the correctness of our transformations.
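To illustrate what "local spinning on the DSM model" means, here is a minimal sketch of an MCS-style queue lock (Mellor-Crummey and Scott), in which each process busy-waits only on a flag in its own queue node, so the spin variable can live in that process's local memory module. This is a standard example, not the transformation from the paper; the names are illustrative.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_int locked;            /* each process spins on its OWN node */
} qnode_t;

typedef struct {
    _Atomic(qnode_t *) tail;
} mcs_lock_t;

void mcs_init(mcs_lock_t *l) {
    atomic_init(&l->tail, NULL);
}

void mcs_lock(mcs_lock_t *l, qnode_t *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, 1);
    qnode_t *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev) {
        atomic_store(&prev->next, me);               /* link behind prev */
        while (atomic_load(&me->locked))
            ;                   /* spin on own node: local in the DSM model */
    }
}

void mcs_unlock(mcs_lock_t *l, qnode_t *me) {
    qnode_t *succ = atomic_load(&me->next);
    if (!succ) {
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;             /* no successor: queue is now empty */
        while (!(succ = atomic_load(&me->next)))
            ;                   /* successor exists but has not linked yet */
    }
    atomic_store(&succ->locked, 0);                  /* hand over the lock */
}
```

Algorithms like this are local-spin on both CC and DSM; the paper's transformations target the harder case of CC-only local-spin algorithms, whose spin variables are not statically assigned to one process's memory.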
Some myths about famous mutual exclusion algorithms
 SIGACT News
Abstract

Cited by 2 (0 self)
Dekker's algorithm [9] is historically the first software solution to the mutual exclusion problem for the two-process case. The first software solution for the n-process case was subsequently proposed by Dijkstra [8]. These two algorithms have become de facto examples of mutual exclusion algorithms because of their historical importance. Since the publication of Dijkstra's algorithm, many solutions have been proposed in the literature [24, 1, 2]. Among these, Peterson's algorithm [21] is one of the most popular, and has been extensively analyzed for its elegance and compactness. This paper attempts to dispel the myths about some of the properties of these three remarkable algorithms through a systematic analysis.
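For readers unfamiliar with the third algorithm discussed, Peterson's two-process mutual exclusion fits in a few lines. This sketch uses C11 atomics with the default sequentially consistent ordering, since the algorithm is not correct under the relaxed memory models of modern hardware with plain loads and stores.

```c
#include <stdatomic.h>

/* Peterson's two-process mutual exclusion (self = 0 or 1). */
atomic_int flag[2];   /* flag[i] = 1: process i wants the critical section */
atomic_int turn;      /* whose turn it is to wait */

void peterson_lock(int self) {
    int other = 1 - self;
    atomic_store(&flag[self], 1);   /* announce intent */
    atomic_store(&turn, other);     /* yield priority to the other process */
    while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
        ;                           /* busy-wait until safe to enter */
}

void peterson_unlock(int self) {
    atomic_store(&flag[self], 0);
}
```

The elegance the abstract mentions is visible here: mutual exclusion, deadlock freedom, and bounded (one-overtake) fairness follow from just two flags and a turn variable.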
Timing-based mutual exclusion with local spinning
 In 17th International Symposium on Distributed Computing, LNCS 2848, October 2003
, 2003
Abstract

Cited by 2 (0 self)
We consider the time complexity of shared-memory mutual exclusion algorithms based on reads, writes, and comparison primitives under the remote-memory-reference (RMR) time measure. For asynchronous systems, a lower bound of Ω(log N / log log N) RMRs per critical-section entry has been established in previous work, where N is the number of processes. Also, algorithms with O(log N) time complexity are known. Thus, for algorithms in this class, logarithmic or near-logarithmic RMR time complexity is fundamentally required.