Results 1 - 10 of 13
Contention in Shared Memory Algorithms
, 1993
Cited by 63 (1 self)
Most complexity measures for concurrent algorithms for asynchronous shared-memory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at the same time. Nevertheless, even though contention is one of the principal considerations affecting the performance of real algorithms on real multiprocessors, there are no formal tools for analyzing the contention of asynchronous shared-memory algorithms. This paper introduces the first formal complexity model for contention in multiprocessors. We focus on the standard multiprocessor architecture in which n asynchronous processes communicate by applying read, write, and read-modify-write operations to a shared memory. We use our model to derive two kinds of results: (1) lower bounds on contention for well-known basic problems such as agreement and mutual exclusion, and (2) tradeoffs betwe...
Reactive Synchronization Algorithms for Multiprocessors
Cited by 50 (2 self)
Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable runtime factors. The designer of a synchronization algorithm has a choice of protocols to use for implementing the synchronization operation. For example, candidate protocols for locks include test-and-set protocols and queueing protocols. Frequently, the best choice of protocols depends on the level of contention: previous research has shown that test-and-set protocols for locks outperform queueing protocols at low contention, while the opposite is true at high contention. This paper investigates reactive synchronization algorithms that dynamically choose protocols in response to the level of contention. We describe reactive algorithms for spin locks and fetch-and-op that choose among several shared-memory and message-passing protocols. Dynamically choosing protocols presents a challenge: a reactive algorithm needs to select and change protocols efficiently, and has to allow for the possibility that multiple processes may be executing different protocols at the same time. We describe the notion of consensus objects that the reactive algorithms use to preserve correctness in the face of dynamic protocol changes. Experimental measurements demonstrate that reactive algorithms perform close to the best static choice of protocols at all levels of contention. Furthermore, with mixed levels of contention, reactive algorithms outperform passive algorithms with fixed protocols, provided that contention levels do not change too frequently. Measurements of several parallel applications show that reactive algorithms result in modest performance gains for spin locks and significant gains for fetch-and-op.
An Improved Lower Bound for the Time Complexity of Mutual Exclusion (Extended Abstract)
 In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing
, 2001
Cited by 41 (12 self)
We establish a lower bound of \Omega(log N / log log N) remote memory references for N-process mutual exclusion algorithms based on reads, writes, or comparison primitives such as test-and-set and compare-and-swap. Our bound improves an earlier lower bound of \Omega(log log N / log log log N) established by Cypher. Our lower bound is of importance for two reasons. First, it almost matches the \Theta(log N) time complexity of the best-known algorithms based on reads, writes, or comparison primitives. Second, our lower bound suggests that it is likely that, from an asymptotic standpoint, comparison primitives are no better than reads and writes when implementing local-spin mutual exclusion algorithms. Thus, comparison primitives may not be the best choice to provide in hardware if one is interested in scalable synchronization.
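To get a feel for how the three growth rates in this abstract compare, here is a worked instance for N = 2^16. Base-2 logarithms and the omission of all constant factors are assumptions made only for illustration; asymptotic bounds say nothing about exact values at a particular N.

```latex
\[
\underbrace{\frac{\log\log N}{\log\log\log N}}_{\text{earlier lower bound}} = \frac{4}{2} = 2,
\qquad
\underbrace{\frac{\log N}{\log\log N}}_{\text{new lower bound}} = \frac{16}{4} = 4,
\qquad
\underbrace{\log N}_{\text{best-known algorithms}} = 16 .
\]
```

The remaining gap between the new lower bound and the upper bound is the factor log log N, which is what "almost matches" refers to.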
Scheduler-Conscious Synchronization
, 1994
Cited by 39 (7 self)
Efficient synchronization is important for achieving good performance in parallel programs, especially on large-scale multiprocessors. Most synchronization algorithms have been designed to run on a dedicated machine, with one application process per processor, and can suffer serious performance degradation in the presence of multiprogramming. Problems arise when running processes block or, worse, busy-wait for action on the part of a process that the scheduler has chosen not to run. In this paper we describe and evaluate a set of scheduler-conscious synchronization algorithms that perform well in the presence of multiprogramming while maintaining good performance on dedicated machines. We consider both large and small machines, with a particular focus on scalability, and examine mutual-exclusion locks, reader-writer locks, and barriers. The algorithms we study fall into two classes: those that heuristically determine appropriate behavior and those that use scheduler information to guide their behavior. We show that while in some cases either method is sufficient, in general sharing information across the kernel-user interface both eases the design of synchronization algorithms and improves their performance.
Time-Adaptive Algorithms for Synchronization
 SIAM J. Comput
, 1994
Cited by 27 (6 self)
We consider concurrent systems in which there is an unknown upper bound on memory access time. Such a model is inherently different from the asynchronous model, where no such bound exists, and also from timing-based models, where such a bound exists and is known a priori. The appeal of our model lies in the fact that while it abstracts from implementation details, it is a better approximation of real concurrent systems compared to the asynchronous model. Furthermore, it is stronger than the asynchronous model, enabling us to design algorithms for problems that are unsolvable in the asynchronous model. Two basic synchronization problems, consensus and mutual exclusion, are investigated in a shared memory environment that supports atomic read/write registers. We show that \Theta(\Delta log \Delta / log log \Delta) is an upper and lower bound on the time complexity of consensus, where \Delta is the (unknown) upper bound on memory access time. For the mutual exclusion problem, we design an effic...
Fast and Scalable Mutual Exclusion
 In Proceedings of the 13th International Symposium on Distributed Computing
, 1999
Cited by 17 (3 self)
We present an N-process algorithm for mutual exclusion under read/write atomicity that has O(1) time complexity in the absence of contention and \Theta(log N) time complexity under contention, where "time" is measured by counting remote memory references. This is the first such algorithm to achieve these time complexity bounds. Our algorithm is obtained by combining a new "fast-path" mechanism with an arbitration-tree algorithm presented previously by Yang and Anderson. 1 Introduction Recent work on mutual exclusion [3] has focused on the design of "scalable" algorithms that minimize the impact of the processor-to-memory bottleneck through the use of local spinning. A mutual exclusion algorithm is scalable if its performance degrades only slightly as the number of contending processes increases. In local-spin mutual exclusion algorithms, good scalability is achieved by requiring all busy-waiting loops to be read-only loops in which only locally-accessible shared variables ar...
Contention-free Complexity of Shared Memory Algorithms
 Information and Computation
, 1994
Cited by 10 (2 self)
Worst-case time complexity is a measure of the maximum time needed to solve a problem over all runs. Contention-free time complexity indicates the maximum time needed when a process executes by itself, without competition from other processes. Since contention is rare in well-designed systems, it is important to design algorithms which perform well in the absence of contention. We study the contention-free time complexity of shared memory algorithms using two measures: step complexity, which counts the number of accesses to shared registers; and register complexity, which measures the number of different registers accessed. Depending on the system architecture, one of the two measures more accurately reflects the elapsed time. We provide lower and upper bounds for the contention-free step and register complexity of solving the mutual exclusion problem as a function of the number of processes and the size of the largest register that can be accessed in one atomic step. We also present bo...
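The two contention-free measures named in this abstract can be illustrated with a small instrumented run. The sketch below is not taken from the paper: `CountingMemory` and `solo_fast_mutex` are hypothetical names, and the access sequence is assumed to follow the uncontended path of Lamport's fast mutual exclusion algorithm, chosen only as a familiar example. The contended branches are deliberately left out.

```python
class CountingMemory:
    """Shared registers that record every access made to them."""

    def __init__(self):
        self.values = {}
        self.steps = 0        # step complexity: total accesses to shared registers
        self.touched = set()  # register complexity: distinct registers accessed

    def read(self, name):
        self.steps += 1
        self.touched.add(name)
        return self.values.get(name, 0)  # registers start at 0

    def write(self, name, value):
        self.steps += 1
        self.touched.add(name)
        self.values[name] = value


def solo_fast_mutex(mem, i):
    """One uncontended entry and exit by process i (i must be nonzero)."""
    b = f"b[{i}]"
    mem.write(b, 1)
    mem.write("x", i)
    if mem.read("y") != 0:
        raise NotImplementedError("contended path omitted from this sketch")
    mem.write("y", i)
    if mem.read("x") != i:
        raise NotImplementedError("contended path omitted from this sketch")
    # The critical section would run here.
    mem.write("y", 0)
    mem.write(b, 0)


mem = CountingMemory()
solo_fast_mutex(mem, i=1)
print(mem.steps, len(mem.touched))  # prints "7 3"
```

In this run the step complexity is 7 (five writes and two reads) while the register complexity is 3, which shows how the two measures can differ for the same execution.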
Lamport on Mutual Exclusion: 27 Years of Planting Seeds
 In 20th ACM Symposium on Principles of Distributed Computing
, 2001
"... Mutual exclusion is a topic that Leslie Lamport has returned to many times throughout his career. This article, which is being written in celebration of Lamport's sixtieth birthday, is an attempt to survey some of his many contributions to research on this topic. ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
Mutual exclusion is a topic that Leslie Lamport has returned to many times throughout his career. This article, which is being written in celebration of Lamport's sixtieth birthday, is an attempt to survey some of his many contributions to research on this topic.
A New Fast-Path Mechanism for Mutual Exclusion
 Distributed Computing
, 1999
Cited by 8 (5 self)
In 1993, Yang and Anderson presented an N-process algorithm for mutual exclusion under read/write atomicity that has \Theta(log N) time complexity, where "time" is measured by counting remote memory references. In this algorithm, instances of a two-process mutual exclusion algorithm are embedded within a binary arbitration tree. In the two-process algorithm that was used, all busy-waiting is done by "local spinning." Performance studies presented by Yang and Anderson showed that their N-process algorithm exhibits scalable performance under heavy contention. One drawback of using an arbitration tree, however, is that each process is required to perform \Theta(log N) remote memory operations even when there is no contention. To remedy this problem, Yang and Anderson presented a variant of their algorithm that includes a "fast-path" mechanism that allows the arbitration tree to be bypassed in the absence of contention. This algorithm has the desirable property that contention-fre...
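The logarithmic cost of an arbitration tree, even without contention, comes from the number of two-process locks a process must pass on its way from leaf to root. A minimal sketch of that traversal follows. The function name `arbitration_path` and the heap-style node numbering (root is 1, node k has children 2k and 2k+1) are assumptions for illustration, not details taken from Yang and Anderson's algorithm, and the two-process locks themselves are not modelled.

```python
def arbitration_path(pid, n):
    """Return the tree nodes that process `pid` (0-based) competes at, leaf to root.

    Assumes a complete binary tree with heap numbering: the root is node 1
    and node k has children 2k and 2k + 1.
    """
    levels = max(1, (n - 1).bit_length())  # ceil(log2(n)), at least one level
    node = (1 << levels) + pid             # virtual leaf position of this process
    path = []
    while node > 1:
        node //= 2                         # parent node: one two-process lock
        path.append(node)
    return path


print(arbitration_path(5, 8))           # prints "[6, 3, 1]"
print(len(arbitration_path(0, 1024)))   # prints "10"
```

Every process acquires one lock per level, so an uncontended entry still touches all ceil(log2 N) levels. A fast-path mechanism is meant to bypass exactly this traversal.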
BOOLE: A Boundary Evaluation System for Boolean Combinations of Sculptured Solids
, 2000
Cited by 5 (2 self)
In this paper we describe a system, BOOLE, that generates the boundary representations (B-reps) of solids given as a CSG expression in the form of trimmed Bézier patches. The system makes use of techniques from computational geometry, numerical linear algebra and symbolic computation to generate the B-reps. Given two solids, the system first computes the intersection curve between the two solids using our surface intersection algorithm. Using the topological information of each solid, it computes various components within each solid generated by the intersection curve and their connectivity. The component classification step is performed by ray-shooting. Depending on the Boolean operation performed, appropriate components are put together to obtain the final solid. We also present techniques to parallelize this system on shared memory multiprocessor machines. The system has been successfully used to generate B-reps for a number of large industrial models including parts of ...