## Diffracting trees (1994)


Venue: Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM

Citations: 56 (12 self)

### BibTeX

@INPROCEEDINGS{Shavit94diffractingtrees,

author = {Nir Shavit and Asaph Zemach},

title = {Diffracting trees},

booktitle = {Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures},

publisher = {ACM},

year = {1994}

}

### Abstract

Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log² w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
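As a rough illustration of the counting structure the abstract describes, the Python sketch below routes tokens one at a time through a binary tree of toggle-bit balancers of width w and hands out values at the leaves. It is a sequential model only, not the authors' implementation: the class name `CountingTree`, the heap-style node layout, and the rule that output wire i returns i, i + w, i + 2w, and so on are assumptions made for the illustration, and the randomized prism that lets concurrent tokens diffract instead of contending on a toggle bit is omitted entirely.

```python
class CountingTree:
    """Sequential sketch of a width-w binary counting tree (hypothetical names)."""

    def __init__(self, width):
        # Width is assumed to be a power of two so the tree is complete.
        if width < 1 or width & (width - 1):
            raise ValueError("width must be a power of two")
        self.width = width
        self.depth = width.bit_length() - 1  # log2(width) balancers per token
        # One toggle bit per balancer, heap layout: node 1 is the root.
        self.toggles = [0] * max(width, 2)
        # Each output wire keeps its own local counter.
        self.leaf_counts = [0] * width

    def fetch_and_increment(self):
        node = 1
        wire = 0
        for level in range(self.depth):
            bit = self.toggles[node]      # 0 sends the token left, 1 right
            self.toggles[node] ^= 1       # flip for the next token
            wire |= bit << level          # root decides the lowest bit
            node = 2 * node + bit
        # Wire i hands out i, i + w, i + 2w, ...
        value = wire + self.width * self.leaf_counts[wire]
        self.leaf_counts[wire] += 1
        return value


tree = CountingTree(4)
values = [tree.fetch_and_increment() for _ in range(10)]
```

With w = 4, each token crosses log2(4) = 2 balancers, in line with the O(log w) depth stated in the abstract, and sequential tokens leave on output wires in increasing order modulo w.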

### Citations

3584 | Optimization by Simulated Annealing
- Kirkpatrick, Gelatt, et al.
- 1983
Citation Context ...gh to a lesser extent than a combining tree), the dynamic flow patterns of diffracting trees make layout optimization much less effective. In our experiments we used the simulated annealing algorithm [30] to attempt to minimize the average distance traveled per message for each data structure. Figure 24 compares the performance of combining and diffracting trees, with and without layout optimization, ... |

1543 | Distributed Algorithms
- Lynch
- 2007
Citation Context ...an be assumed to take place at a unique point in time. We assume the machine's shared memory to be a collection of "memory locations," each of which follows the specification of an atomic register [34]. The operations on each memory location (and therefore the values it takes) can be ordered chronologically, and atomicity assures us that this ordering is well defined. Thus one can draw a time-line ... |

928 | Linearizability: A Correctness Condition for Concurrent Objects - Herlihy, Wing - 1990 |

730 | Wait-free synchronization
- Herlihy
- 1991
Citation Context ...ust in terms of their ability to handle unexpected latencies and differing loads. Note also that like counting networks but unlike combining trees, diffracting trees can be implemented in a wait-free [26] manner (given the appropriate hardware primitives). By this we mean that for each increment operation termination is guaranteed in a bounded number of steps independently of the pace or even a possib... |

506 | Sorting networks and their applications
- Batcher
- 1968
Citation Context ...onic counting network of Aspnes, Herlihy, and Shavit [7] of width 64. A Bitonic counting network is a network of two-input-two-output balancers having a layout isomorphic to a Bitonic sorting network [8]. Each processor performing an increment operation travels [plot: x-axis Processors, 32 to 256; series CNet[64], CTree[n], DTree[32], MCS, Exp. Backo...] |

488 | Algorithms for scalable synchronization on shared-memory multiprocessors
- Mellor-Crummey, Scott
- 1991
Citation Context ...ard to imagine a program that doesn't count something, and indeed, on multiprocessor machines shared counters are the key to solving a variety of coordination problems such as barrier synchronization [40], index distribution, shared program counters [41] and the design of concurrent data structures such as queues and stacks (see also [19, 22, 47]). In its purest form, a counter is an object that holds... |

402 | Scheduling multithreaded computations by work stealing - Blumofe, Leiserson - 1994 |

366 | Hierarchical correctness proofs for distributed algorithms
- Lynch, Tuttle
- 1987
Citation Context ... Trees Count This section contains a formal proof that a counting tree's outputs will achieve the desired step property in any quiescent state. Our formal model for multiprocessor computation follows [7, 36]. First a formal description of a balancer is given, then it is shown that any Binary counting tree counts, that is, its outputs have the step property. Let the state of a balancer at a given time be ... |

320 | A Methodology for Implementing Highly Concurrent Data Objects - Herlihy - 1993 |

227 | PROTEUS: A High-Performance Parallel-Architecture Simulator
- Brewer, Dellarocas, et al.
- 1991
Citation Context ...isions" of a combining tree. We compared the performance of diffracting trees to the above methods in simulated shared memory and message passing environments. The Proteus Parallel Hardware Simulator [10, 11] of Brewer, Dellarocas, Colbrook and Weihl was used to evaluate performance in a shared memory architecture similar to the Alewife machine of Agarwal, Chaiken, Johnson, Krantz, Kubiatowicz, Kurihara, ... |

219 | The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
- Anderson
- 1990
Citation Context ...g need to develop effective software-based counting methods. The simplest way to implement a counter is to place it in a spin-lock protected critical section, adding an exponential-back-off mechanism [1, 6, 23] or a queue lock as devised by Anderson [6] and Mellor-Crummey and Scott [40] to reduce contention [20, 49]. Unfortunately, such centralized methods are inherently non-parallel and cannot hope to scal... |

212 | ‘Hot spot’ contention and combining in multistage interconnection networks
- Pfister, Norton
- 1985
Citation Context ...s that it can support the same kind of throughput to w independent counters with much lower latency. However, it seems that we are back to square one since the root of the tree will be a "hot spot" [20, 42] and a sequential bottleneck that is no better than a centralized counter implementation. This would indeed be true if one were to use the accepted (counting network) implementation of a balancer -- a... |

200 | LimitLESS Directories: A Scalable Cache Coherence Scheme - Chaiken, Kubiatowicz, et al. - 1991 |

144 | Efficient synchronization primitives for large-scale cache-coherent multiprocessors
- Goodman, Vernon, et al.
- 1989
Citation Context ... contention on memory and interconnect, and are parallel, and thus allow many requests to be dealt with concurrently. The combining trees of Yew, Tzeng, and Lawrie [49] and Goodman, Vernon, and Woest [21], and the counting networks of Aspnes, Herlihy, and Shavit [7], both meet the above criteria, and indeed were found to be the most effective methods for concurrent counting in software. A combining tr... |

143 | The MIT Alewife machine: A large-scale distributed-memory multiprocessor
- Agarwal, Chaiken, et al.
- 1991
Citation Context ...olbrook and Weihl was used to evaluate performance in a shared memory architecture similar to the Alewife machine of Agarwal, Chaiken, Johnson, Krantz, Kubiatowicz, Kurihara, Lim, Maa, and Nussbaum [3]. Netsim, part of the Rice Parallel Processing Testbed [15, 29] developed by Covington, Dwarkadas, Jump, Sinclair, and Madala was used for testing in message passing architectures. We found that, in s... |

103 | Odd-Even Counting Networks
- Busch, Mavronicolas
Citation Context ...nd the latency in traversing them is a high O(log² w). There is a wide body of theoretical research analyzing the performance of counting networks and attempting to improve on their O(log² w) depth [2, 5, 7, 12, 13, 18, 27, 32, 33]. The most effective is the elegant combinatorial design due to Klugerman and Plaxton [32, 33] of depth close to O(log w). Unfortunately, the "exponentially large" constants involved make these constr... |

99 | Distributed Hot-Spot Addressing in Large-Scale Multiprocessors
- Yew, Tzeng, et al.
- 1987
Citation Context ...ace it in a spin-lock protected critical section, adding an exponential-back-off mechanism [1, 6, 23] or a queue lock as devised by Anderson [6] and Mellor-Crummey and Scott [40] to reduce contention [20, 49]. Unfortunately, such centralized methods are inherently non-parallel and cannot hope to scale well. This is true also of hardware supported fetch-and-increment operations unless the hardware itself e... |

94 | Sparcle: An evolutionary processor design for large-scale multiprocessors
- Agarwal, Kubiatowicz, et al.
- 1993
Citation Context ... To illustrate this property, consider an execution in which tokens traverse the tree sequentially, one completely after the other. The left-hand side of Figure 1 shows such an execution on a Binary[4] type counting tree (width 4) which we define formally below. As can be seen, the network moves input tokens to output wires in increasing order modulo w. Balancing trees having this property are call... |

93 | Synchronization algorithms for shared-memory multiprocessors
- Graunke, Thakkar
- 1990
Citation Context ...g need to develop effective software-based counting methods. The simplest way to implement a counter is to place it in a spin-lock protected critical section, adding an exponential-back-off mechanism [1, 6, 23] or a queue lock as devised by Anderson [6] and Mellor-Crummey and Scott [40] to reduce contention [20, 49]. Unfortunately, such centralized methods are inherently non-parallel and cannot hope to scal... |

89 | Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors
- Gottlieb, Lubachevsky, et al.
- 1983
Citation Context ...ariety of coordination problems such as barrier synchronization [40], index distribution, shared program counters [41] and the design of concurrent data structures such as queues and stacks (see also [19, 22, 47]). In its purest form, a counter is an object that holds an integer value and provides a fetch-and-increment operation, incrementing the counter and returning its previous value. Given that the majori... |

77 | A simple load balancing scheme for task allocation in parallel machines
- Rudolph, Slivkin-Allalouf, et al.
Citation Context ... Cray T3D [48]. A recent paper by Shavit and Touitou [44] introduces "Elimination Trees," a new form of Diffracting trees that can be used to create highly parallel producer/consumer pools and stacks [38, 43]. The algorithms provide superior response (on average just a few machine instructions) under high loads with a guaranteed logarithmic (in w) number of steps under sparse request patterns. On the more... |

77 | Decentralized cache scheme for an MIMD parallel processor - Rudolph - 1983 |

62 | Contention in shared memory algorithms
- Dwork, Herlihy, et al.
- 1997
Citation Context ...nd prism width in a non empirical way. It would also be interesting to formally analyze diffracting tree behavior using newly developed models of contention such as that of Dwork, Herlihy, and Waarts [17]. Finally, it would be interesting to extend the use of diffraction to other forms of counting networks such as those of Felten, LaMarca, and Ladner [18], Aiello, Venkatesan, and Yung [5], and Busch a... |

61 | Advanced Computer Architecture - Hwang - 1993 |

56 | The efficient simulation of parallel computer systems
- Covington, Dwarkadas, et al.
- 1991
Citation Context ...ared memory architecture similar to the Alewife machine of Agarwal, Chaiken, Johnson, Krantz, Kubiatowicz, Kurihara, Lim, Maa, and Nussbaum [3]. Netsim, part of the Rice Parallel Processing Testbed [15, 29] developed by Covington, Dwarkadas, Jump, Sinclair, and Madala was used for testing in message passing architectures. We found that, in shared-memory systems, diffracting trees substantially outperfor... |

53 | Adaptive Backoff Synchronization Techniques
- Agarwal, Cherian
- 1989
Citation Context ...g need to develop effective software-based counting methods. The simplest way to implement a counter is to place it in a spin-lock protected critical section, adding an exponential-back-off mechanism [1, 6, 23] or a queue lock as devised by Anderson [6] and Mellor-Crummey and Scott [40] to reduce contention [20, 49]. Unfortunately, such centralized methods are inherently non-parallel and cannot hope to scal... |

52 | A software instruction counter - Mellor-Crummey, LeBlanc |

50 | Reactive synchronization algorithms for multiprocessors
- Lim, Agarwal
- 1994
Citation Context ...idth is optimal increases with tree size, the wider tree can usually be used without fear. Also, the application of an adaptive scheme for changing diffracting tree size "on the fly" (see for example [37]) will most likely not result in frequent changes among different width trees. In summary, diffracting trees scale substantially better than the other methods tested as they have small depth and enjoy... |

44 | A Dynamic Distributed Load Balancing Algorithm with Provable Good Performance - Luling, Monien - 1993 |

43 | Counting Networks and Multi-Processor Coordination
- Aspnes, Herlihy, et al.
- 1991
Citation Context ...hus allow many requests to be dealt with concurrently. The combining trees of Yew, Tzeng, and Lawrie [49] and Goodman, Vernon, and Woest [21], and the counting networks of Aspnes, Herlihy, and Shavit [7], both meet the above criteria, and indeed were found to be the most effective methods for concurrent counting in software. A combining tree is a distributed binary-tree based data structure with a sh... |

42 | Elimination trees and the construction of pools and stacks
- Shavit, Touitou
- 1995
Citation Context ...ine is due to become operational in 1996. We are also developing a version of diffracting trees for non-coherent shared memory machines such as the Cray T3D [48]. A recent paper by Shavit and Touitou [44] introduces "Elimination Trees," a new form of Diffracting trees that can be used to create highly parallel producer/consumer pools and stacks [38, 43]. The algorithms provide superior response (on av... |

39 | Small-Depth Counting Networks
- Klugerman, Plaxton
- 1992
Citation Context ...nd the latency in traversing them is a high O(log² w). There is a wide body of theoretical research analyzing the performance of counting networks and attempting to improve on their O(log² w) depth [2, 5, 7, 12, 13, 18, 27, 32, 33]. The most effective is the elegant combinatorial design due to Klugerman and Plaxton [32, 33] of depth close to O(log w). Unfortunately, the "exponentially large" constants involved make these constr... |

38 | Counting Networks with Arbitrary Fan-Out - Aharonson, Attiya |

30 | Process Coordination with Fetch-and-Increment
- Freudenthal, Gottlieb
- 1991
Citation Context ...ariety of coordination problems such as barrier synchronization [40], index distribution, shared program counters [41] and the design of concurrent data structures such as queues and stacks (see also [19, 22, 47]). In its purest form, a counter is an object that holds an integer value and provides a fetch-and-increment operation, incrementing the counter and returning its previous value. Given that the majori... |

28 | Performance of Spin lock alternatives for shared memory multiprocessors - Anderson - 1990 |

25 | Processing ‘hot spots’ in high performance systems
- Gawlick
- 1985
Citation Context ...ace it in a spin-lock protected critical section, adding an exponential-back-off mechanism [1, 6, 23] or a queue lock as devised by Anderson [6] and Mellor-Crummey and Scott [40] to reduce contention [20, 49]. Unfortunately, such centralized methods are inherently non-parallel and cannot hope to scale well. This is true also of hardware supported fetch-and-increment operations unless the hardware itself e... |

25 | "Hot-Spot" Contention and Combining in Multistage Interconnection Networks - Pfister, Norton - 1985 |

24 | Scalable concurrent counting - Herlihy, Lim, et al. - 1995 |

23 | A Combinatorial Treatment of Balancing Networks
- Busch, Mavronicolas
- 1994
Citation Context ...nd the latency in traversing them is a high O(log² w). There is a wide body of theoretical research analyzing the performance of counting networks and attempting to improve on their O(log² w) depth [2, 5, 7, 12, 13, 18, 27, 32, 33]. The most effective is the elegant combinatorial design due to Klugerman and Plaxton [32, 33] of depth close to O(log w). Unfortunately, the "exponentially large" constants involved make these constr... |

21 | Building Counting Networks from Larger Balancers
- Felten, LaMarca, et al.
- 1993

19 | The NYU Ultracomputer designing an MIMD parallel computer - Gottlieb, Grishman, et al. - 1984 |

16 | Low contention linearizable counting - Herlihy, Shavit, et al. - 1991 |

16 | A steady state analysis of diffracting trees - Shavit, Upfal, et al. - 1998 |

16 | Coins, Weights and Contention in Balancing Networks - Aiello, Venkatesan, et al. - 1994 |

16 | Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors - Goodman, Vernon, et al. - 1989 |

14 | Basic Techniques for the Efficient Coordination of Very Large - Gottlieb, Lubachevsky, et al. - 1983 |

13 | On maintaining dynamic information in a concurrent environment
- Manber
- 1986
Citation Context ... Cray T3D [48]. A recent paper by Shavit and Touitou [44] introduces "Elimination Trees," a new form of Diffracting trees that can be used to create highly parallel producer/consumer pools and stacks [38, 43]. The algorithms provide superior response (on average just a few machine instructions) under high loads with a guaranteed logarithmic (in w) number of steps under sparse request patterns. On the more... |

11 | Coins, weights and contention in balancing networks
- Aiello, Venkatesan, et al.
- 1994

11 | Database applications of the fetch-and-add instruction
- Stone
- 1984
Citation Context ...ariety of coordination problems such as barrier synchronization [40], index distribution, shared program counters [41] and the design of concurrent data structures such as queues and stacks (see also [19, 22, 47]). In its purest form, a counter is an object that holds an integer value and provides a fetch-and-increment operation, incrementing the counter and returning its previous value. Given that the majori... |

10 | CRAY T3D system architecture overview - Cray Research - 1993 |