Graph Algorithms for Multicores with Multilevel Caches
2009
Cited by 1 (0 self)
Historically, the primary model of computation used in the design and analysis of algorithms has been the sequential RAM model. However, recent developments in computer architecture have made the sequential RAM model less effective for algorithm development. In response, theoretical computer scientists have developed models of computation that better reflect these modern architectures. In this project, we consider a variety of graph problems on parallel, cache-efficient, and multicore models of computation. We introduce each model by defining how algorithms are analyzed on it. Then, for each model, we present current results for the problems of prefix sums, list ranking, various tree problems, connected components, and minimum spanning tree. Finally, we present our novel results, which include the multicore oblivious extension of current results on a private cache multicore model to a more general multilevel multicore
Efficient Scheduling for Parallel Memory Hierarchies (Regular Submission)
This paper presents a scheduling algorithm for efficiently implementing nested-parallel computations on parallel memory hierarchies (trees of caches). To capture the cache cost of nested-parallel computations, we introduce a parallel version of the ideal cache model. In the model, algorithms can be written cache obliviously (no choices are made based on machine parameters) and analyzed using a single level of cache with parameters $Z$ (cache size) and $L$ (cache line size), and a parameter $\alpha$ specifying the algorithm's parallelism (for input size $n$, $n^\alpha$ represents the number of processors that can be effectively used). For several fundamental algorithms we show that the cache cost in the parallel ideal cache model is optimal, matching the sequential bounds, with a parallelism $\alpha \to 1$. For example, for cache-oblivious sorting of $n$ keys, the cache cost is $Q^*(n; Z, L) = \Theta((n/L) \log_{Z+2} n)$. Our scheduler guarantees that the number of misses across all caches at each level $i$ of the machine's hierarchy is at most the cache cost $Q^*(n; Z_i/3, L_i)$ as analyzed for an algorithm. Machine hierarchies are modeled as trees of caches using a symmetric variant of the parallel memory hierarchy (PMH) model. In this model, every cache at level $i$ is of size $Z_i$, has line size $L_i$, transfer cost $C_i$ (the cost of fetching a line of data from its parent cache at level $i + 1$), and child fanout $f_i$. Each leaf node (level 0) is a processor, with parameters set so that its cost corresponds to the processor's work (i.e., its instruction count). Finally, we show that if the algorithm parallelism exceeds the machine parallelism (as defined in the paper), the work is balanced, including the cost of cache misses. In particular, for an $h$-level memory hierarchy, our scheduler guarantees a total runtime of $T(n) = O(\sum_{i=0}^{h-1} C_i \hat{Q}_\alpha(n; Z_i/3, L_i)$
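As a rough illustration of the per-level miss bound described in this abstract, the following Python sketch evaluates the sorting cache cost $(n/L) \log_{Z+2} n$ with constant factors dropped, and applies it with cache size $Z_i/3$ at each level of a small hierarchy. The `CacheLevel` class, the function names, and the example hierarchy sizes are hypothetical and chosen only for illustration; they do not come from the paper, and the sketch does not attempt the truncated runtime formula.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheLevel:
    """One level of a hypothetical cache hierarchy (illustrative names)."""
    Z: float  # cache size, in words
    L: float  # cache line size, in words
    C: float  # cost of fetching one line from the parent level


def sort_cache_cost(n: float, Z: float, L: float) -> float:
    """Evaluate (n / L) * log_{Z+2}(n), the sorting bound with constants dropped."""
    return (n / L) * math.log(n, Z + 2)


def per_level_miss_bounds(n: float, levels: list[CacheLevel]) -> list[float]:
    """Apply the sorting bound at each level using cache size Z_i / 3 and line size L_i."""
    return [sort_cache_cost(n, lvl.Z / 3, lvl.L) for lvl in levels]


if __name__ == "__main__":
    # Assumed example hierarchy: sizes are made up, not taken from any real machine.
    hierarchy = [
        CacheLevel(Z=2**13, L=8, C=10),
        CacheLevel(Z=2**18, L=8, C=100),
    ]
    for level, bound in zip(hierarchy, per_level_miss_bounds(2**20, hierarchy)):
        print(f"Z={level.Z}, L={level.L}: miss bound ~ {bound:.0f}")
```

Because the bound shrinks only logarithmically in the cache size, replacing $Z_i$ by $Z_i/3$ changes the estimate by a modest factor, which is consistent with the abstract's claim that the scheduler's overhead stays within the analyzed cache cost.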

