Results 1 - 10
of
40
Adding Networks
, 2001
"... An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inher ..."
Abstract
-
Cited by 84 (28 self)
- Add to MetaCart
An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter; a counting network is an instance of an adding network that supports only fetch&increment. We present a lower bound showing that adding networks have inherently high latency. Any adding network powerful enough to support addition by at least two values a and b, where |a |> |b |> 0, has sequential executions in which each token traverses Ω(n/c) switching elements, where n is the number of concurrent processes, and c is a quantity we call one-shot contention; for a large class of switching networks and for conventional counting networks the one-shot contention is constant. On the contrary, counting networks have O(log n) latency [4,7]. This bound is tight. We present the first concurrent, lock-free, lowcontention networked data structure that supports arbitrary fetch&add operations.
Diffracting trees
- In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM
, 1994
"... Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a dis ..."
Abstract
-
Cited by 52 (10 self)
- Add to MetaCart
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log 2 w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract
-
Cited by 41 (11 self)
- Add to MetaCart
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easyto-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Shared-memory mutual exclusion: Major research trends since
- Distributed Computing
, 1986
"... * Exclusion: At most one process executes its critical section at any time. ..."
Abstract
-
Cited by 38 (7 self)
- Add to MetaCart
* Exclusion: At most one process executes its critical section at any time.
Hundreds of Impossibility Results for Distributed Computing
- Distributed Computing
, 2003
"... We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refe ..."
Abstract
-
Cited by 32 (4 self)
- Add to MetaCart
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
A Combinatorial Treatment of Balancing Networks
, 1999
"... Balancing networks, originally introduced by Aspnes et al. (Proc. of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348-358, May 1991), represent a new class of distributed, low-contention data structures suitable for solving many fundamental multi-processor coordination problems that can ..."
Abstract
-
Cited by 23 (11 self)
- Add to MetaCart
Balancing networks, originally introduced by Aspnes et al. (Proc. of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348-358, May 1991), represent a new class of distributed, low-contention data structures suitable for solving many fundamental multi-processor coordination problems that can be expressed as balancing problems. In this work, we present a mathematical study of the combinatorial structure of balancing networks, andavariety of its applications. Our study identies important combinatorial transfer parameters of balancing networks. In turn, necessary and sucient combinatorial conditions are established, expressed in terms of transfer parameters, which precisely characterize many important and well studied classes of balancing networks suchascounting networks and smoothing networks.We propose these combinatorial conditions to be \balancing analogs" of the well known Zero-One principle holding for sorting networks.
Contention-Aware Metrics for Distributed Algorithms: Comparison of Atomic Broadcast Algorithms
- in Proc. 9th IEEE Int’l Conf. on Computer Communications and Networks (IC3N 2000
, 2000
"... Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algo ..."
Abstract
-
Cited by 21 (12 self)
- Add to MetaCart
Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. In this paper, we define two performance metrics for distributed algorithms that account for network contention as well as CPU contention. We then illustrate the use of these metrics by comparing four Atomic Broadcast algorithms, and show that our metrics allow for a deeper understanding of performance issues than conventional metrics. 1 Introduction Performance prediction and evaluation are a central part of every scientific and engineering activity, including the construction of distributed applications. Engineers of distributed systems rely heavily on various performance evaluation techniques and have developed the necessary techniques for this activity. In contrast, algorithm designers invest considerable effort in proving the correctness of their algor...
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
- Proc. 5th ACM-SIAM Symp. on Discrete Algorithms
, 1997
"... Abstract. This paper introduces the queue-read queue-write (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to thi ..."
Abstract
-
Cited by 21 (10 self)
- Add to MetaCart
Abstract. This paper introduces the queue-read queue-write (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The qrqw pram model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied crcw pram or erew pram models: the crcw model does not adequately penalize algorithms with high contention to shared-memory locations, while the erew model is too strict in its insistence on zero contention at each step. The�qrqw pram is strictly more powerful than the erew pram. This paper shows a separation of log n between the two models, and presents faster and more efficient qrqw algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the qrqw pram with only logarithmic slowdown on Valiant’s bsp model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the bestknown emulation result for the erew pram, and considerably improves upon the best-known efficient emulation for the crcw pram on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
The Queue-Read Queue-Write Asynchronous PRAM Model
- Euro-Par'96 Parallel Processing, Lecture Notes in Computer Science
, 1998
"... This paper presents results for the queue-read, queue-write asynchronous parallel random access machine (qrqw asynchronous pram) model, which is the asynchronous variant of the qrqw pram model. The qrqw pram family of models, which was introduced earlier by the authors, permit concurrent reading ..."
Abstract
-
Cited by 21 (8 self)
- Add to MetaCart
This paper presents results for the queue-read, queue-write asynchronous parallel random access machine (qrqw asynchronous pram) model, which is the asynchronous variant of the qrqw pram model. The qrqw pram family of models, which was introduced earlier by the authors, permit concurrent reading and writing to shared memory locations, but each memory location is viewed as having a queue which can service at most one request at a time. In the basic qrqw pram model each processor executes a series of reads to shared memory locations, a series of local computation steps, and a series of writes to shared memory locations, and then synchronizes with all other processors; thus this can be viewed as a bulk-synchronous model. In contrast, in the qrqw asynchronous pram model discussed in this paper, there is no imposed bulksynchronization between processors, and each processor proceeds at its own pace. Thus, the qrqw asynchronous pram serves as a better model for designing and analyz...
Time/Contention Trade-offs for Multiprocessor Synchronization
- Information and Computation
, 1996
"... We establish trade-offs between time complexity and write- and access-contention for solutions to the mutual exclusion problem. The write-contention (access-contention) of a concurrent program is the number of processes that may be simultaneously enabled to write (access by reading and/or writing) t ..."
Abstract
-
Cited by 20 (6 self)
- Add to MetaCart
We establish trade-offs between time complexity and write- and access-contention for solutions to the mutual exclusion problem. The write-contention (access-contention) of a concurrent program is the number of processes that may be simultaneously enabled to write (access by reading and/or writing) the same shared variable. Our notion of time complexity distinguishes between local and remote accesses of shared memory. We show that, for any N-process mutual exclusion algorithm, if write-contention is w, and if at most v remote variables can be accessed by a single atomic operation, then there exists an execution involving only one process in which that process executes\Omega\Gammaecu vw N) remote operations for entry into its critical section. We further show that, among these operations,\Omega\Gamma p log vw N) distinct remote variables are accessed. For algorithms with access-contention c, we show that the latter bound can be improved to \Omega\Gamma/51 vc N ). The last two of thes...

