Results 1  10
of
48
Adding Networks
, 2001
"... An adding network is a distributed data structure that supports a concurrent, lockfree, lowcontention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inher ..."
Abstract

Cited by 106 (31 self)
 Add to MetaCart
An adding network is a distributed data structure that supports a concurrent, lockfree, lowcontention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where a > b > 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes, and c is a quantity we call oneshot contention; for a large class of switching networks and for conventional counting networks the oneshot contention is constant. On the contrary, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lockfree, lowcontention networked data structure that supports arbitrary fetch&add operations.
Diffracting trees
 In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM
, 1994
"... Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrentdatastructure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a dis ..."
Abstract

Cited by 58 (11 self)
 Add to MetaCart
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrentdatastructure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed sharedmemory machine and several simulated messagepassing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log 2 w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queuelocking in the design of many concurrent data structures.
Sharedmemory mutual exclusion: Major research trends since
 Distributed Computing
, 1986
"... * Exclusion: At most one process executes its critical section at any time. ..."
Abstract

Cited by 47 (7 self)
 Add to MetaCart
* Exclusion: At most one process executes its critical section at any time.
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
"... We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refe ..."
Abstract

Cited by 44 (4 self)
 Add to MetaCart
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
Can a SharedMemory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of generalpurpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract

Cited by 42 (11 self)
 Add to MetaCart
There has been a great deal of interest recently in the development of generalpurpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the sharedmemory abstraction as an easytouse platform, the bandwidth limitations of current machines have diverted much attention to messagepassing and distributedmemory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a sharedmemory model can serve as an effective bridging model for parallel computation. In particular, can a sharedmemory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing SharedMemory (QSM) model, which accounts for limited communication bandwidth while still providing a simple sharedmemory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple workpreserving emulation of the QSM on both the BSP, and on a related model, the (d, x)BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
A Combinatorial Treatment of Balancing Networks
, 1999
"... Balancing networks, originally introduced by Aspnes et al. (Proc. of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348358, May 1991), represent a new class of distributed, lowcontention data structures suitable for solving many fundamental multiprocessor coordination problems that can ..."
Abstract

Cited by 23 (11 self)
 Add to MetaCart
Balancing networks, originally introduced by Aspnes et al. (Proc. of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348358, May 1991), represent a new class of distributed, lowcontention data structures suitable for solving many fundamental multiprocessor coordination problems that can be expressed as balancing problems. In this work, we present a mathematical study of the combinatorial structure of balancing networks, andavariety of its applications. Our study identies important combinatorial transfer parameters of balancing networks. In turn, necessary and sucient combinatorial conditions are established, expressed in terms of transfer parameters, which precisely characterize many important and well studied classes of balancing networks suchascounting networks and smoothing networks.We propose these combinatorial conditions to be \balancing analogs" of the well known ZeroOne principle holding for sorting networks.
The QueueRead QueueWrite PRAM Model: Accounting for Contention in Parallel Algorithms
 Proc. 5th ACMSIAM Symp. on Discrete Algorithms
, 1997
"... Abstract. This paper introduces the queueread queuewrite (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to sharedmemory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to thi ..."
Abstract

Cited by 23 (10 self)
 Add to MetaCart
Abstract. This paper introduces the queueread queuewrite (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to sharedmemory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The qrqw pram model reflects the contention properties of most commercially available parallel machines more accurately than either the wellstudied crcw pram or erew pram models: the crcw model does not adequately penalize algorithms with high contention to sharedmemory locations, while the erew model is too strict in its insistence on zero contention at each step. The�qrqw pram is strictly more powerful than the erew pram. This paper shows a separation of log n between the two models, and presents faster and more efficient qrqw algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a workpreserving emulation of the qrqw pram with only logarithmic slowdown on Valiant’s bsp model, and hence on hypercubetype noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the bestknown emulation result for the erew pram, and considerably improves upon the bestknown efficient emulation for the crcw pram on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
The QueueRead QueueWrite Asynchronous PRAM Model
 EuroPar'96 Parallel Processing, Lecture Notes in Computer Science
, 1998
"... This paper presents results for the queueread, queuewrite asynchronous parallel random access machine (qrqw asynchronous pram) model, which is the asynchronous variant of the qrqw pram model. The qrqw pram family of models, which was introduced earlier by the authors, permit concurrent reading ..."
Abstract

Cited by 23 (8 self)
 Add to MetaCart
This paper presents results for the queueread, queuewrite asynchronous parallel random access machine (qrqw asynchronous pram) model, which is the asynchronous variant of the qrqw pram model. The qrqw pram family of models, which was introduced earlier by the authors, permit concurrent reading and writing to shared memory locations, but each memory location is viewed as having a queue which can service at most one request at a time. In the basic qrqw pram model each processor executes a series of reads to shared memory locations, a series of local computation steps, and a series of writes to shared memory locations, and then synchronizes with all other processors; thus this can be viewed as a bulksynchronous model. In contrast, in the qrqw asynchronous pram model discussed in this paper, there is no imposed bulksynchronization between processors, and each processor proceeds at its own pace. Thus, the qrqw asynchronous pram serves as a better model for designing and analyz...
Time/Contention Tradeoffs for Multiprocessor Synchronization
 Information and Computation
, 1996
"... We establish tradeoffs between time complexity and write and accesscontention for solutions to the mutual exclusion problem. The writecontention (accesscontention) of a concurrent program is the number of processes that may be simultaneously enabled to write (access by reading and/or writing) t ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
We establish tradeoffs between time complexity and write and accesscontention for solutions to the mutual exclusion problem. The writecontention (accesscontention) of a concurrent program is the number of processes that may be simultaneously enabled to write (access by reading and/or writing) the same shared variable. Our notion of time complexity distinguishes between local and remote accesses of shared memory. We show that, for any Nprocess mutual exclusion algorithm, if writecontention is w, and if at most v remote variables can be accessed by a single atomic operation, then there exists an execution involving only one process in which that process executes\Omega\Gammaecu vw N) remote operations for entry into its critical section. We further show that, among these operations,\Omega\Gamma p log vw N) distinct remote variables are accessed. For algorithms with accesscontention c, we show that the latter bound can be improved to \Omega\Gamma/51 vc N ). The last two of thes...
ContentionAware Metrics for Distributed Algorithms: Comparison of Atomic Broadcast Algorithms
 in Proc. 9th IEEE Int’l Conf. on Computer Communications and Networks (IC3N 2000
, 2000
"... Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algo ..."
Abstract

Cited by 23 (14 self)
 Add to MetaCart
Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algorithms that account for network contention as well as CPU contention. We then illustrate the use of these metrics by comparing four Atomic Broadcast algorithms, and show that our metrics allow for a deeper understanding of performance issues than conventional metrics. 1 Introduction Performance prediction and evaluation are a central part of every scientific and engineering activity, including the construction of distributed applications. Engineers of distributed systems rely heavily on various performance evaluation techniques and have developed the necessary techniques for this activity. In contrast, algorithm designers invest considerable effort in proving the correctness of their algor...