Efficient Adaptive Collect using Randomization
- Proc. of the Intl. Symp. on Distributed Computing (DISC), 2004
Cited by 11 (2 self)
An adaptive algorithm, whose step complexity adjusts to the number of active processes, is attractive for distributed systems with a highly-variable number of processes. The cornerstone of many adaptive algorithms is an adaptive mechanism to collect up-to-date information from all participating processes. To date, all known collect algorithms either have non-linear step complexity or they are impractical because of unrealistic memory overhead. This paper
Constant-RMR Implementations of CAS and Other Synchronization Primitives Using Read and Write Operations (Extended Abstract)
- PODC'07, 2007
Cited by 7 (2 self)
We consider asynchronous multiprocessors where processes communicate only by reading or writing shared memory. We show how to implement consensus, all comparison primitives (such as CAS and TAS), and load-linked/store-conditional using only a constant number of remote memory references (RMRs), in both the cache-coherent and the distributed-shared-memory models of such multiprocessors. Our implementations are blocking, rather than wait-free: they ensure progress provided all processes that invoke the implemented primitive are live. Our results imply that any algorithm using read and write operations, comparison primitives, and load-linked/store-conditional can be simulated by an algorithm that uses read and write operations only, with at most a constant blowup in RMR complexity.
On the Inherent Weakness of Conditional Synchronization Primitives
- In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing, 2004
Cited by 5 (2 self)
The “wait-free hierarchy” classifies multiprocessor synchronization primitives according to their power to solve consensus. The classification is based on assigning a number n to each synchronization primitive, where n is the maximal number of processes for which deterministic wait-free consensus can be solved using instances of the primitive and read/write registers. Conditional synchronization primitives, such as compare-and-swap and load-linked/store-conditional, can implement deterministic wait-free consensus for any number of processes (they have consensus number ∞), and are thus considered to be among the strongest synchronization primitives. Partly because of that, compare-and-swap and load-linked/store-conditional have become the synchronization primitives of choice, and have been implemented in hardware in many multiprocessor architectures. This paper shows that, though they are strong in the context of consensus, conditional synchronization primitives are not efficient in terms of memory space for implementing many key objects. Our results hold for starvation-free implementations of mutual exclusion, and for wait-free implementations of a large class of concurrent objects that we call Visible(n). Roughly, Visible(n) is a class that includes all objects that support some operation that must perform a “visible”
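The claim that compare-and-swap has consensus number ∞ can be illustrated with a minimal sketch. This is not taken from the paper: the names `CASRegister`, `decide`, and `run_consensus` are hypothetical, and a Python lock merely stands in for the atomicity that hardware CAS would provide (so the simulation itself is not wait-free). Proposals are assumed to be non-`None`.

```python
import threading


class CASRegister:
    """Simulated compare-and-swap object (hypothetical helper).

    A lock stands in for hardware atomicity; real CAS is a single instruction.
    """

    def __init__(self):
        self._value = None
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Atomically store `new` if the current value is `expected`.

        Returns the value observed before the operation.
        """
        with self._lock:
            old = self._value
            if old is expected:
                self._value = new
            return old


def decide(register, proposal):
    """One process's consensus step: a single CAS, then decide.

    The first CAS to find the register empty installs its proposal;
    every later CAS fails and returns the installed value instead.
    """
    old = register.compare_and_swap(None, proposal)
    return proposal if old is None else old


def run_consensus(proposals):
    """Run one thread per proposal and collect each thread's decision."""
    register = CASRegister()
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(register, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Each process takes a bounded number of steps regardless of how many processes participate, which is why the same construction works for any n.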
An Ω(n log n) Lower Bound on the Cost of Mutual Exclusion
Cited by 4 (0 self)
We prove an Ω(n log n) lower bound on the number of non-busy-waiting memory accesses by any deterministic algorithm solving n-process mutual exclusion that communicates via shared registers. The cost of the algorithm is measured in the state change cost model, a variation of the cache-coherent model. Our bound is tight in this model. We introduce a novel information-theoretic proof technique. We first establish a lower bound on the information needed by processes to solve mutual exclusion. Then we relate the amount of information processes can acquire through shared memory accesses to the cost they incur. We believe our proof technique is flexible and intuitive, and may be applied to a variety of other problems and system models.
An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems
- In RTNS ’11, 2011
Cited by 4 (3 self)
Graphics processing units (GPUs) are becoming increasingly important in today’s platforms as their increased generality allows them to be used as powerful co-processors. In previous work, we have found that GPUs may be integrated into real-time systems through the treatment of GPUs as shared resources, allocated to real-time tasks through mutual exclusion locking protocols. In this paper, we present an optimal k-exclusion locking protocol for globally-scheduled job-level static-priority (JLSP) systems. This protocol may be used to manage a pool of GPU resources in such systems.
Adaptive Batching for Replicated Servers
Cited by 3 (0 self)
This paper presents two novel generic adaptive batching schemes for replicated servers. Both schemes are oblivious to the underlying communication protocols. Our novel schemes adapt their batching levels automatically and immediately according to the current communication load. This is done without any explicit monitoring or calibration of the system. Additionally, the paper includes a detailed performance evaluation.
Highly efficient synchronization based on active memory operations
- In International Parallel and Distributed Processing Symposium, 2004
Cited by 3 (2 self)
Synchronization is a crucial operation in many parallel applications. As network latency approaches thousands of processor cycles for large-scale multiprocessors, conventional synchronization techniques are failing to keep up with the increasing demand for scalable and efficient synchronization operations. In this paper, we present a mechanism that allows atomic synchronization operations to be executed on the home memory controller of the synchronization variable. By performing atomic operations near where the data resides, our proposed mechanism can significantly reduce the number of network messages required by synchronization operations. Our proposed design also enhances performance by using fine-grained updates to selectively “push” the results of offloaded synchronization operations back to processors when they complete (e.g., when a barrier count reaches the desired value). We use the proposed mechanism to optimize two of the most widely used synchronization operations, barriers and spin locks. Our simulation results show that the proposed mechanism outperforms conventional implementations based on load-linked/store-conditional, processor-centric atomic instructions, conventional memory-side atomic instructions, or active messages. It speeds up conventional barriers by a factor of up to 2.1 (4 processors) to 61.9 (256 processors) and spin locks by a factor of up to 2.0 (4 processors) to 10.4 (256 processors).
Some myths about famous mutual exclusion algorithms
- SIGACT News
Cited by 2 (0 self)
Dekker's algorithm [9] is historically the first software solution to the mutual exclusion problem for the 2-process case. The first software solution for the n-process case was subsequently proposed by Dijkstra [8]. These two algorithms have become de facto examples of mutual exclusion algorithms because of their historical importance. Since the publication of Dijkstra's algorithm, many solutions have been proposed in the literature [24, 1, 2]. Among them, Peterson's algorithm [21] is one of the most popular and has been extensively analyzed for its elegance and compactness. This paper attempts to dispel the myths about some of the properties of these three remarkable algorithms through a systematic analysis.
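For readers unfamiliar with the algorithms named above, the textbook 2-process form of Peterson's algorithm can be sketched as follows. This is an illustration, not code from the paper: `PetersonLock` and `run_demo` are hypothetical names, and the sketch assumes standard CPython, whose global interpreter lock makes the shared reads and writes behave as if sequentially consistent. On hardware with weaker memory ordering the algorithm needs memory fences.

```python
import threading
import time


class PetersonLock:
    """Textbook Peterson lock for exactly two threads, with ids 0 and 1."""

    def __init__(self):
        self.flag = [False, False]  # flag[i]: thread i wants to enter
        self.turn = 0               # which thread must wait on a tie

    def acquire(self, me):
        other = 1 - me
        self.flag[me] = True        # announce interest
        self.turn = other           # give priority to the other thread
        # Wait while the other thread is interested and has priority.
        while self.flag[other] and self.turn == other:
            time.sleep(0)           # yield so the other thread can run

    def release(self, me):
        self.flag[me] = False       # withdraw interest


def run_demo(iterations=200):
    """Two threads increment a shared counter under the lock.

    Returns a dict with the final counter and the number of times a thread
    observed another thread inside the critical section.
    """
    lock = PetersonLock()
    state = {"counter": 0, "inside": 0, "violations": 0}

    def worker(me):
        for _ in range(iterations):
            lock.acquire(me)
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1
            state["counter"] += 1
            state["inside"] -= 1
            lock.release(me)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

Setting `turn` after `flag` is what breaks the symmetry: if both threads arrive together, the one that wrote `turn` last is the one that waits.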
Verification Manager: Automating the Verification Process by
, 2009
Cited by 2 (2 self)
is permitted for educational or research use on condition that this copyright notice is included in any copy. Publications in the FI MU Report Series are in general accessible via WWW:
Reactive Spin-locks: A Self-tuning Approach
Cited by 2 (0 self)
Reactive spin-lock algorithms that can automatically adapt to contention variation on the lock have received great attention in the field of multiprocessor synchronization, since they can help applications achieve good performance in all possible contention conditions. However, in existing reactive spin-locks the reaction relies on (i) some fixed, experimentally tuned thresholds, which may frequently become inappropriate in dynamic environments like multiprogramming/multiprocessor systems, or (ii) known probability distributions of inputs. This paper presents a new reactive spin-lock algorithm that is completely self-tuning, which means that neither experimentally tuned parameters nor probability distributions of inputs are needed. The new spin-lock is built on a competitive online algorithm. Our experiments, which use the Spark98 kernels and the SPLASH-2 applications as application benchmarks, on an SGI Origin2000 multiprocessor machine and on an Intel Xeon workstation show that the new self-tuning spin-lock helps applications with different characteristics achieve good performance in a wide range of contention levels.

