## Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines (1988)

Venue: Algorithmica

Citations: 38 (2 self)

### BibTeX

@ARTICLE{Leiserson88communication-efficientparallel,

author = {Charles Leiserson and Bruce M. Maggs},

title = {Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines},

journal = {Algorithmica},

year = {1988},

volume = {3},

pages = {53--77}

}

### Abstract

This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions ...
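To make the idea of congestion across a cut concrete, here is a minimal Python sketch. It follows the definition quoted in the citation contexts on this page (a cut is a subset of the network's nodes, and its capacity is the number of wires leaving it). Treating the load of a message set as the number of messages crossing the cut, and the load factor as load divided by capacity, is an assumption made for illustration; the names `crossing` and `load_factor` and the four-node ring are hypothetical and do not come from the paper.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # a wire or a message between two processors


def crossing(pairs: Iterable[Pair], cut: Set[int]) -> int:
    """Count pairs with exactly one endpoint inside the cut."""
    return sum(1 for u, v in pairs if (u in cut) != (v in cut))


def load_factor(wires: Iterable[Pair], messages: Iterable[Pair], cut: Set[int]) -> float:
    """Assumed definition: messages crossing the cut divided by wires crossing it."""
    cap = crossing(wires, cut)
    if cap == 0:
        raise ValueError("cut has zero capacity")
    return crossing(messages, cut) / cap


# Hypothetical example: four processors connected in a ring.
wires = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Three messages; two of them must cross the cut {0, 1}.
messages = [(0, 2), (1, 3), (0, 1)]
cut = {0, 1}

print(crossing(wires, cut), load_factor(wires, messages, cut))
```

In this toy instance two wires leave the cut and two messages must cross it, so the ratio is 1.0. A message set that sends many messages across a low-capacity cut would give a larger ratio, which is the kind of congestion the abstract refers to.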

### Citations

717 | A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations - Chernoff - 1952
Citation Context: ...ber of leaves are nearly equal. The second lemma provides an elementary bound on the expectation of a discrete random variable with a finite upper bound. The last lemma presents a Chernoff-type bound [4] on the tail of a binomial distribution. Lemma 0.6 Suppose $T = (V, E)$ is a rooted binary tree, and let $V_0$, $V_1$, and $V_2$ denote the sets of nodes in $T$ (excluding the root), with zero, one, or two chil... |

603 | Data Structures and Network Algorithms - Tarjan - 1983 |

513 | A new approach to the maximum-flow problem - Goldberg, Tarjan - 1988
Citation Context: ... is certainly conservative since if $M$ is a subset of $M'$, then we have $\mathrm{load}(M) \le \mathrm{load}(M')$. For example, synchronous distributed algorithms, such as the network flow algorithms of Goldberg and Tarjan [8, 9], are conservative for this reason. We do not wish to restrict our attention to this limited class of conservative algorithms because synchronous distributed algorithms cannot efficiently solve certai... |

272 | Parallel prefix computation - Ladner, Fischer - 1980
Citation Context: ...inear-space, conservative "tree contraction" algorithm based on the ideas of Miller and Reif [22]. Section 6 presents treefix computations, which are generalizations of the parallel prefix computation [3, 7, 23] to trees. We show that treefix computations can be performed using the tree contraction algorithm of Section 5. Section 7 gives short, efficient, parallel algorithms for tree and graph problems, most... |

238 | The parallel evaluation of general arithmetic expressions - Brent - 1974
Citation Context: ...ternal node has an operator from $\{+, -, \cdot, \div\}$, compute for each internal node the subexpression rooted at that node. A single leaffix-like computation suffices using the ideas of Brent [2] and Miller and Reif [22]. Performance: O(lg n). Minimum-cost spanning forest. Given an undirected input graph $G = (V, E)$ and a cost function $w : E \to \mathbb{R}$, determine a set $F \subseteq E$ of edges such that each v... |

225 | Fat-trees: Universal networks for hardware-efficient supercomputing - Leiserson - 1985
Citation Context: ...ation bandwidth in the underlying network. In a communication network, we can measure the cost of communication in terms of the number of messages that must cross a cut of the network, as in [10] and [18]. Specifically, a cut $S$ of a network is a subset of the nodes of the network. The capacity $\mathrm{cap}(S)$ is the number of wires connecting processors in $S$ with processors in the rest of the network $\bar{S}$, i.e.... |

212 | "Hot spot" contention and combining in multistage interconnection networks - Pfister, Norton - 1985
Citation Context: ...d by the processors. The generalization to the case when processors, memories, and switches are distinct entities is straightforward, but complicates the definitions. identified by Pfister and Norton [24]. When many processors send messages to a single other processor, large delays can be experienced as messages queue for access to that other processor. In this situation, the load factor on the cut th... |

207 | A scheme for fast parallel communication - Valiant - 1982
Citation Context: ...c factor as an upper bound on many networks, including volume- and area-universal networks, such as fat-trees [10, 18], as well as the standard universal routing networks, such as the Boolean hypercube [29]. The lower bound is weak on the standard universal routing networks because every cut of these networks is large relative to the number of processors in the smaller side of the cut, but these network... |

181 | A regular layout for parallel adders - Brent, Kung - 1982
Citation Context: ...inear-space, conservative "tree contraction" algorithm based on the ideas of Miller and Reif [22]. Section 6 presents treefix computations, which are generalizations of the parallel prefix computation [3, 7, 23] to trees. We show that treefix computations can be performed using the tree contraction algorithm of Section 5. Section 7 gives short, efficient, parallel algorithms for tree and graph problems, most... |

163 | Applications of a planar separator theorem - Lipton, Tarjan - 1980
Citation Context: ...side of the edge. For each incidence ring, compute the maximum of these values. A vertex with the minimum of these maximum values is a centroid. Performance: O(lg n). Separator of a tree. A separator [20] is a partition of the vertices of an n-vertex tree into three sets A, B, and C, with $|A| \le \frac{2}{3}n$, $|B| = 1$, and $|C| \le \frac{2}{3}n$, such that no edge of the tree goes between a vertex in A and a vertex in C. Det... |

132 | A framework for solving VLSI graph layout problems - Bhatt, Leighton - 1984
Citation Context: ...its mate. The use of spare nodes allows the algorithm to distribute the space for the internal nodes of the contraction tree uniformly over the elements in the list. (Spare internal nodes are used in [1] and [17] for similar reasons, but in a different context.) We now describe the operation of Algorithm LC, which is illustrated in Figure 4 for the example of Figure 3. (A description in pseudocode ca... |

120 | Parallel tree contraction and its applications - Miller, Reif - 1985
Citation Context: ... can be used to perform many of the same functions on lists as recursive doubling. Section 5 presents a linear-space, conservative "tree contraction" algorithm based on the ideas of Miller and Reif [22]. Section 6 presents treefix computations, which are generalizations of the parallel prefix computation [3, 7, 23] to trees. We show that treefix computations can be performed using the tree contracti... |

118 | The Art of Computer Programming, Volume 1 - Knuth - 1998 |

116 | An O(log n) parallel connectivity algorithm - Shiloach, Vishkin - 1982
Citation Context: ...$d(i) \leftarrow d(i) + d(p(i))$. In a PRAM model, the running time on a list of length n is O(lg n). Variants of this technique are used for path compression, vertex numbering, and parallel prefix computations [22, 25, 27, 30]. We now show that recursive doubling can be expensive even when a data structure has a good embedding in a network. Figure 1 shows a cut of capacity 3 separating the two halves of a linked list of 16... |

105 | A complexity theory for VLSI - Thompson - 1980
Citation Context: ...pin-boundedness of a region is measured by its surface area. In this model, the largest universal network that can fit in a given volume $V$ has only about $V^{2/3}$ nodes. In the two-dimensional VLSI model [28], where pin-boundedness is measured by perimeter, the bound is even worse. Since the density of processors in a physical implementation of a universal network is low, it is natural to wonder whether th... |

82 | Systolic Arrays (for VLSI) - Kung, Leiserson - 1978
Citation Context: ...h a good separator theorem [20] exists can be embedded well. Examples include meshes, trees, planar graphs, and multigrids. Situations in which a mesh might be used include systolic array computation [15, 17] and image processing. Planar graphs and multigrids arise from the solution of sparse linear systems of equations based on the finite-element method. Consequently, conservative DRAM algorithms operati... |

68 | Solving minimum-cost flow problems by successive approximations, extended abstract, submitted to STOC 87 - Goldberg - 1986
Citation Context: ... is certainly conservative since if $M$ is a subset of $M'$, then we have $\mathrm{load}(M) \le \mathrm{load}(M')$. For example, synchronous distributed algorithms, such as the network flow algorithms of Goldberg and Tarjan [8, 9], are conservative for this reason. We do not wish to restrict our attention to this limited class of conservative algorithms because synchronous distributed algorithms cannot efficiently solve certai... |

56 | Parallel Hashing: an Efficient Implementation of Shared Memory - Karlin, Upfal - 1986
Citation Context: ...niversal networks, such as the Boolean hypercube [29]. Universal networks are capable of simulating any PRAM program with at most polylogarithmic degradation in time (see, for example, the simulation [12] of an EREW-PRAM on a butterfly network). In light of this work, one might wonder why the DRAM model should be studied at all. A potential problem with universal networks is that they may be difficult... |

55 | Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms - Cole, Vishkin - 1986
Citation Context: ...the height of the contraction tree and the number of steps on a DRAM are both O(lg n), where n is the number of elements in the input list. A deterministic variant based on deterministic coin tossing [5] runs in O(lg n lg m) steps, where m is the number of processors in the DRAM, and produces a contraction tree of height O(lg n). The recursive pairing strategy is illustrated in Figure 3 for a list (A... |

51 | Randomized routing on fat-trees - Greenberg, Leiserson - 1989
Citation Context: ... communication bandwidth in the underlying network. In a communication network, we can measure the cost of communication in terms of the number of messages that must cross a cut of the network, as in [10] and [18]. Specifically, a cut $S$ of a network is a subset of the nodes of the network. The capacity $\mathrm{cap}(S)$ is the number of wires connecting processors in $S$ with processors in the rest of the networ... |

49 | The complexity of parallel computations - Wyllie - 1979
Citation Context: ..., again a situation that can be modeled by a DRAM. In the second situation, the congestion is produced by an algorithm. As an example, consider the "recursive doubling" or "pointer jumping" technique [30] used extensively by PRAM algorithms in the literature. The idea is that each element i of a list initially has a pointer p(i) to the next element in the list. At each step, element i computes $p(i) \leftarrow$ ... |

47 | Data Parallel Algorithms - Hillis, Steele - 1986
Citation Context: ...ing volume-universal networks such as fat-trees [18]. A natural way to embed a data structure in a DRAM is to put one record of the data structure into each processor, as in the "data parallel" model [11]. The record can contain data, including pointers to records in other processors. We measure the quality of an embedding by treating the data structure as a set of pointers and generalizing the concep... |

41 | Three-Dimensional Circuit Layouts - Leighton, Rosenberg - 1986
Citation Context: ... full complement of processors, then pin limitations preclude the universal network from being assembled. The impact of pin constraints can be modeled theoretically in the three-dimensional VLSI model [16, 18] where hardware cost is measured by volume and the pin-boundedness of a region is measured by its surface area. In this model, the largest universal network that can fit in a given volume V has only ab... |

40 | Area-Efficient VLSI Computation - Leiserson - 1982
Citation Context: .... The use of spare nodes allows the algorithm to distribute the space for the internal nodes of the contraction tree uniformly over the elements in the list. (Spare internal nodes are used in [1] and [17] for similar reasons, but in a different context.) We now describe the operation of Algorithm LC, which is illustrated in Figure 4 for the example of Figure 3. (A description in pseudocode can be foun... |

40 | Finding Biconnected Components and Computing Tree Functions in Logarithmic Parallel Time - Tarjan, Vishkin - 1985
Citation Context: ...$d(i) \leftarrow d(i) + d(p(i))$. In a PRAM model, the running time on a list of length n is O(lg n). Variants of this technique are used for path compression, vertex numbering, and parallel prefix computations [22, 25, 27, 30]. We now show that recursive doubling can be expensive even when a data structure has a good embedding in a network. Figure 1 shows a cut of capacity 3 separating the two halves of a linked list of 16... |

32 | On the algorithmic complexity of discrete functions - Ofman - 1963
Citation Context: ...inear-space, conservative "tree contraction" algorithm based on the ideas of Miller and Reif [22]. Section 6 presents treefix computations, which are generalizations of the parallel prefix computation [3, 7, 23] to trees. We show that treefix computations can be performed using the tree contraction algorithm of Section 5. Section 7 gives short, efficient, parallel algorithms for tree and graph problems, most... |

31 | Computer recreations - Dewdney - 1985
Citation Context: ...hest leaf from the root. Reroot the tree at this leaf. The distance from the new root to the farthest leaf is the diameter. (This algorithm is based on an analog algorithm attributed to J. Wennmacker [6].) A center of the tree can be determined by finding a median element of the path that realizes the diameter. Performance: O(lg n). Centroid of a tree. A centroid is a vertex v such that the largest s... |

12 | Communication-efficient parallel graph algorithms for distributed random-access machines - Leiserson, Maggs - 1988 |

3 | Locality in Parallel Computation - Maggs - 1989
Citation Context: ...estion. Whereas the Shortcut Lemma presented in this paper holds for any network, for particular networks, other shortcut lemmas may hold. For example, another shortcut lemma for fat-trees is used in [21] to show that an optimal reordering of a linear list in a fat-tree can be determined efficiently by a conservative algorithm on the fat-tree. As a final comment, we note that the notion of a conservat... |
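
Several of the citation contexts above describe the "recursive doubling" (pointer jumping) technique and note that it can be expensive across a cut even when the list is embedded well. The following is a minimal Python sketch of that technique under stated assumptions: the list is stored in order in processors 0 to n-1, the last element points to itself, and congestion is approximated simply by counting pointers that cross a chosen cut after each step. The function names are hypothetical, and the sequential loop only simulates what would be one synchronous parallel step per iteration.

```python
from typing import List, Set, Tuple


def count_crossing(p: List[int], cut: Set[int]) -> int:
    """Number of pointers i -> p[i] with exactly one endpoint in the cut."""
    return sum(1 for i, j in enumerate(p) if (i in cut) != (j in cut))


def recursive_doubling(succ: List[int], cut: Set[int]) -> Tuple[List[int], List[int]]:
    """Simulate pointer jumping on a linked list.

    succ[i] is the successor of element i; the last element points to itself.
    Returns (d, crossings), where d[i] is the number of links from i to the
    end of the list and crossings[k] is the number of pointers crossing the
    cut after k jumping steps.
    """
    n = len(succ)
    p = list(succ)
    d = [0 if p[i] == i else 1 for i in range(n)]
    crossings = [count_crossing(p, cut)]
    # Stop once every pointer has reached the end of the list.
    while any(p[i] != p[p[i]] for i in range(n)):
        # Both updates read the old arrays, as in a synchronous step:
        # d(i) <- d(i) + d(p(i)) and p(i) <- p(p(i)).
        d = [d[i] + d[p[i]] for i in range(n)]
        p = [p[p[i]] for i in range(n)]
        crossings.append(count_crossing(p, cut))
    return d, crossings


# A 16-element list stored in order, with a cut separating its two halves.
succ = list(range(1, 16)) + [15]
d, crossings = recursive_doubling(succ, cut=set(range(8)))
print(d)
print(crossings)
```

On this 16-element list the loop finishes after four jumping steps, which matches the O(lg n) step count, while the number of pointers crossing the middle cut grows from 1 to 8. That growth is the kind of congestion the quoted passage warns about; the sketch does not reproduce the paper's own analysis or its Figure 1.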