## Graph Algorithms for Multicores with Multilevel Caches (2009)

Citations: 1 (0 self-citations)

### BibTeX

@MISC{Blakeley09graphalgorithms,

author = {Brandon Blakeley},

title = {Graph Algorithms for Multicores with Multilevel Caches},

year = {2009}

}

### Abstract

Historically, the primary model of computation employed in the design and analysis of algorithms has been the sequential RAM model. However, recent developments in computer architecture have reduced the efficacy of the sequential RAM model for algorithmic development. In response, theoretical computer scientists have developed models of computation that better reflect these modern architectures. In this project, we consider a variety of graph problems on parallel, cache-efficient, and multicore models of computation. We introduce each model and define how algorithms are analyzed on it. Then, for each model, we present existing results for the problems of prefix sums, list ranking, various tree problems, connected components, and minimum spanning tree. Finally, we present our novel results, which include the multicore-oblivious extension of existing results on a private-cache multicore model to a more general multilevel multicore...

### Citations

636 | An Introduction to Parallel Algorithms - JáJá - 1992

Citation context: ...tific computing applications have introduced parallelism as a tool for faster computation. In response, theoretical computer scientists have developed the PRAM (parallel random access machine) model [19, 18] as a compromise between simplicity and realism for modeling parallel computation. In the PRAM model, we have multiple processors that share access to a main memory, and we are interested in the numbe...

272 | Parallel prefix computation - Ladner, Fischer - 1980

Citation context: ...with a binary operation denoted by $+$. We define the $i$th partial sum $s_i$ of such a sequence to be $s_i = x_1 + x_2 + \cdots + x_i$. The prefix-sums problem, originally posed by Ladner and Fischer [20], is to compute the $n$ partial sums of the sequence. There is a trivial sequential algorithm that computes each $s_i$ using the recurrence $s_i = s_{i-1} + x_i$ for $2 \le i \le n$, and hence takes $O(n)$ time. Consider a li...
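
The excerpt breaks off before the parallel algorithm. The following Python sketch illustrates one standard PRAM-style approach, which is not necessarily the one used in the thesis:

1. Combine adjacent pairs.
2. Recursively scan the $\lfloor n/2 \rfloor$ pair sums.
3. Fill in every prefix from the pair sums.

The function names are hypothetical. The "parallel" loops run sequentially here. `op` is assumed to be associative, but it need not be commutative. The recursion has $O(\log n)$ levels, each made of constant-depth parallel steps, for $O(n)$ total work.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_sums_seq(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Sequential O(n) scan using the recurrence s_i = s_{i-1} + x_i."""
    out: List[T] = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def prefix_sums_pram(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Recursive pairwise scan; `op` must be associative.

    Each comprehension/loop below is a set of independent operations that a
    PRAM would perform in one parallel step; here they run sequentially.
    """
    n = len(xs)
    if n <= 1:
        return list(xs)
    # Parallel step 1: combine adjacent pairs (x[0]+x[1], x[2]+x[3], ...).
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(n // 2)]
    # Recurse on the half-length sequence: pair_sums[j] = x[0] + ... + x[2j+1].
    pair_sums = prefix_sums_pram(pairs, op)
    # Parallel step 2: fill in every prefix from the pair sums.
    out: List[T] = [xs[0]] * n
    for i in range(1, n):
        if i % 2 == 1:
            out[i] = pair_sums[i // 2]  # odd i closes pair i // 2
        else:
            out[i] = op(pair_sums[i // 2 - 1], xs[i])  # previous pairs, then x[i]
    return out


if __name__ == "__main__":
    print(prefix_sums_pram([3, 1, 4, 1, 5], operator.add))  # [3, 4, 8, 9, 14]
    print(prefix_sums_pram(list("abcd"), operator.add))  # ['a', 'ab', 'abc', 'abcd']
```

The string example checks that the order of operands is preserved, which matters when `op` is associative but not commutative.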

108 | Deterministic coin tossing with applications to optimal list ranking - Cole, Vishkin - 1986

64 | Cache-oblivious priority queue and graph algorithm applications - Arge, Bender, et al. - 2002

63 | Efficient parallel algorithms for some graph problems - Chin, Lam, Chen - 1982

Citation context: ...mponents and minimum spanning tree. This algorithm is an adaptation of the single-processor external-memory algorithm of Chiang et al. [8], which in turn is based on the PRAM algorithm of Chin et al. [9]. This multicore algorithm follows the same strategy as the external-memory algorithm, but uses the appropriate multicore subroutines instead and concludes the recursion with a PRAM-optimal algorith...

48 | The complexity of parallel computation - Wyllie - 1979

Citation context: ...nd hence takes $O(n)$ time. Consider a linked list $L$ of $n$ nodes. We define the rank $r(i)$ of a node $i$ to be its distance from the end of the list. The list ranking problem, originally posed by Wyllie [22], is to determine the rank of every node in the list. The linked list $L$ is typically represented by a successor array $S$, where $S(i)$ contains a pointer to the node following node $i$ in $L$. We addition...
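
Wyllie's approach to list ranking is usually presented as pointer jumping. The Python sketch below is an illustration, not code from the thesis. It assumes a hypothetical convention in which the last node's successor is itself, and it simulates each synchronous PRAM round sequentially. Every round doubles the number of links each pointer spans, so $\lceil \log_2 n \rceil$ rounds suffice. The total work is $O(n \log n)$, so the method is not work-optimal.

```python
from typing import List


def list_rank_wyllie(succ: List[int]) -> List[int]:
    """List ranking by pointer jumping.

    succ[i] is the index of the node after i; by assumption the last node
    points to itself. Returns rank[i] = number of links from i to the end.
    """
    n = len(succ)
    # Invariant: rank[i] = number of links from i to nxt[i] (0 at the tail).
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # Stop once every pointer reaches the tail, the only fixed point of nxt.
    while any(nxt[nxt[i]] != nxt[i] for i in range(n)):
        # One synchronous round: both comprehensions read the old arrays,
        # so each pointer now spans twice as many links (or stops at the tail).
        rank, nxt = ([rank[i] + rank[nxt[i]] for i in range(n)],
                     [nxt[nxt[i]] for i in range(n)])
    return rank


if __name__ == "__main__":
    # List 0 -> 2 -> 4 -> 1 -> 3, where node 3 is the tail.
    print(list_rank_wyllie([2, 3, 4, 3, 1]))  # [4, 1, 3, 0, 2]
```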

47 | Computing connected components on parallel computers - Hirschberg, Chandra, Sarwate - 1979

Citation context: ...) and $(p(v), v)$, since each arc between these two arcs is in the subtree rooted at $v$. *2.6 Connected Components.* In this section, we describe the connected-components algorithm of Hirschberg, Chandra, and Sarwate [17], assuming the input is provided as an adjacency list. We solve the problem recursively as follows. For each vertex, we select the incident edge leading to its smallest-ordered neighbor. ...
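
A hooking-and-pointer-jumping scheme in the spirit of Hirschberg, Chandra, and Sarwate can be simulated sequentially in Python. This is an illustration written from the standard textbook description, not the thesis's algorithm. It takes an edge list rather than an adjacency list, and the names are hypothetical. Each round works as follows:

- Every supervertex points to its smallest-labeled neighboring supervertex.
- The only cycles these pointers can form have length 2. Each one is broken by letting the smaller label become the root.
- Pointer jumping then collapses each tree into a single supervertex.

Every supervertex with a neighbor merges with at least one other supervertex. The number of supervertices in each unfinished component therefore at least halves per round.

```python
from typing import List, Tuple


def hcs_components(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Return D with D[u] == D[v] iff u and v are connected.

    Sequential simulation of hooking plus pointer jumping; the vertex
    labels 0..n-1 define the ordering used for "smallest".
    """
    D = list(range(n))  # D[v]: label of the supervertex (root) containing v
    while True:
        # Hooking: each root r picks the smallest neighboring root C[r].
        C = list(range(n))
        has_nbr = [False] * n
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                ra, rb = D[a], D[b]
                if ra != rb and (not has_nbr[ra] or rb < C[ra]):
                    C[ra], has_nbr[ra] = rb, True
        if not any(has_nbr):
            return D  # no edge joins two different supervertices
        # The pointers r -> C[r] can only form 2-cycles; in each one the
        # smaller label becomes the root of the merged tree.
        C = [r if C[C[r]] == r and r < C[r] else C[r] for r in range(n)]
        # Pointer jumping until every pointer reaches a root (a fixed point).
        while any(C[C[r]] != C[r] for r in range(n)):
            C = [C[C[r]] for r in range(n)]
        # Relabel every vertex with the root of its new supervertex.
        D = [C[D[v]] for v in range(n)]


if __name__ == "__main__":
    print(hcs_components(7, [(0, 1), (1, 2), (3, 4), (5, 6), (6, 4)]))
    # [0, 0, 0, 3, 3, 3, 3]
```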

45 | Deterministic parallel list ranking - Anderson, Miller - 1988

Citation context: *...mic Time Algorithm.* The shortcoming of the algorithm presented in the previous section stems from its expensive contraction operations. In this section, we present a method, due to Anderson and Miller [2], to shrink the list of $n$ nodes to $O(n/\log n)$ nodes without recursively contracting nodes out of the list. At a high level, their strategy is to divide the array $S$ into $n/\log n$ contiguous s...

41 | Faster optimal parallel prefix sums and list ranking - Cole, Vishkin - 1989

Citation context: ...to $O(n/\log n)$. In fact, we can do so by recursively constructing a reduced list, contracting the nodes of an independent set of size at least $n/c$ for some constant $c > 1$, using ideas due to Cole and Vishkin [14]. We first describe their algorithm abstractly, and then discuss how to construct an independent set in nearly constant time and with linear work. *List Ranking by Contraction.* (1) Create reduced list by ...
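
One standard way to find such an independent set is Cole and Vishkin's deterministic coin tossing. It is sketched here in Python as an illustration. The function names and the `None`-terminated successor array are assumptions, not taken from the thesis. The procedure is:

1. Start from node ids as colors.
2. In each round, recolor node $v$ as $2k + b$. Here $k$ is the lowest bit position at which the colors of $v$ and its successor differ, and $b$ is bit $k$ of $v$'s color.
3. Repeat until at most 6 colors remain.
4. Eliminate colors 5, 4 and 3 one class at a time, reaching 3 colors.

Each round keeps adjacent colors distinct, and step 3 takes $O(\log^* n)$ rounds. The largest of the three color classes is then an independent set of size at least $n/3$, so $c = 3$.

```python
from typing import List, Optional


def dct_round(color: List[int], succ: List[Optional[int]]) -> List[int]:
    """One round of deterministic coin tossing on a properly colored list."""
    new = []
    for v, c in enumerate(color):
        w = succ[v]
        if w is None:
            k = 0  # tail has no successor; bit 0 still differs from its predecessor's choice
        else:
            diff = c ^ color[w]  # nonzero because adjacent colors differ
            k = (diff & -diff).bit_length() - 1  # lowest differing bit
        new.append(2 * k + ((c >> k) & 1))
    return new


def three_color_list(succ: List[Optional[int]]) -> List[int]:
    """3-color a single linked list given as a successor array (None = tail)."""
    n = len(succ)
    pred: List[Optional[int]] = [None] * n
    for v, w in enumerate(succ):
        if w is not None:
            pred[w] = v
    color = list(range(n))  # distinct node ids form a valid initial coloring
    while max(color, default=0) >= 6:  # O(log* n) rounds
        color = dct_round(color, succ)
    # Remove colors 5, 4, 3 one class at a time. Each class is independent,
    # so its nodes could all be recolored in a single parallel step.
    for c in (5, 4, 3):
        for v in range(n):
            if color[v] == c:
                taken = {color[u] for u in (pred[v], succ[v]) if u is not None}
                color[v] = min({0, 1, 2} - taken)
    return color


def independent_set(succ: List[Optional[int]]) -> List[int]:
    """Largest color class of a 3-coloring: independent, size >= n/3."""
    color = three_color_list(succ)
    classes = [[v for v in range(len(succ)) if color[v] == c] for c in range(3)]
    return max(classes, key=len)


if __name__ == "__main__":
    # List 0 -> 2 -> 4 -> 1 -> 3, where node 3 is the tail.
    print(independent_set([2, 3, 4, None, 1]))  # [0, 3, 4]
```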

34 | Provably good multicore cache performance for divide-and-conquer algorithms - Blelloch, Chowdhury, et al. - 2008

32 | Effectively sharing a cache among threads - Blelloch, Gibbons - 2004

Citation context: *...Multicores.* Alongside the introduction of chip multiprocessors (or multicores) has come the development of theoretical frameworks to maximally exploit these emerging architectures. A variety of papers [11, 6, 12, 16, 7, 10] have begun to realize this goal. Initial approaches focused on the development of schedulers with provably good performance [6]. Chowdhury and Ramachandran [10] introduced the evaluation of parallel ...

31 | Parallel algorithms for shared-memory machines - Karp, Ramachandran - 1990

22 | Oblivious algorithms for multicores and network of processors - Chowdhury, Silvestri, et al. - 2010

22 | The cache complexity of multithreaded cache oblivious algorithms - Frigo, Strumpen - 2006

18 | The I/O complexity of sorting and related problems - Aggarwal, Vitter - 1987

Citation context: ...n of the running time is the memory access time. Initially, external-memory algorithms were developed for specific applications with large datasets, such as sorting checks by account number for banks [1]. More recently, however, with the development of cache hierarchies, the benefit of these algorithms has extended to more mainstream applications and architectures, as L1 caches behave essentially the same...

12 | Cache-oblivious algorithms (extended abstract submitted for publication) - Frigo, Leiserson, et al. - 1999

Citation context: ...emory transfers between two adjacent cache levels. However, with multiple cache levels, algorithm design tuned to cache parameters becomes cumbersome and rarely portable. A result due to Frigo et al. [15] states that if we design cache-efficient algorithms without reference to the particular parameters of the memory hierarchy, then such a cache-oblivious algorithm will perform well on any multilevel m...

8 | Parallel external memory graph algorithms - Arge, Goodrich, et al. - 2009

Citation context: ...that the PRAM prefix-sums algorithm (with a reasonable scheduler) is also multicore efficient (and hence multicore oblivious). Next, we extend the results of Arge et al. [3] for list ranking and tree problems from the private external-memory multicore model to the multilevel multicore model. Finally, we present algorithms solving the connected components and minimum spanning tree problems on the multileve...

8 | The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation - Chowdhury, Ramachandran - 2007

4 | Fundamental parallel algorithms for private-cache chip multiprocessors - Arge, Goodrich, et al. - 2008

Citation context: ...processor speeds decelerate, multicore architectures have been introduced to restore performance improvements to their historical rates. Consequently, a variety of theoretical models [6, 5] have been proposed to fully realize these advances in computer architecture. In this section, we discuss multicore models and describe a number of graph algorithm results from [5] for the private...