## Accessing Multiple Sequences Through Set Associative Caches (1999)

Venue: In Proc

Citations: 19 (4 self)

### BibTeX

@INPROCEEDINGS{Sanders99accessingmultiple,

author = {Peter Sanders},

title = {Accessing Multiple Sequences Through Set Associative Caches},

booktitle = {In Proc},

year = {1999},

pages = {655--664},

publisher = {Springer-Verlag}

}

### Abstract

The cache hierarchy prevalent in today's high-performance processors has to be taken into account in order to design algorithms that perform well in practice. We start from the empirical observation that external memory algorithms often turn out to be good algorithms for cached memory. This is not self-evident, since caches use a fixed and quite restrictive algorithm to choose their content. We investigate the impact of this restriction for the frequently occurring case of access to multiple sequences. We show that any access pattern to $k = \Theta(M/B)$ sequential data streams can be efficiently supported on an $a$-way set associative cache with capacity $M$ and line size $B$. The bounds are tight up to lower order terms.
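To make the restriction concrete, here is a toy simulator, assuming LRU replacement inside each set, word-granular addresses, and the usual mapping of a memory block to set `(addr // B) mod (M / (aB))`. The names `SetAssociativeCache` and `round_robin_misses` are hypothetical, and the fixed round-robin schedule is only one access pattern, not the adversarial schedules the paper analyzes. It shows how $k > a$ streams whose starting addresses are aligned modulo the cache size can make every access miss.

```python
import random
from collections import OrderedDict


class SetAssociativeCache:
    """Toy a-way set associative cache with LRU replacement inside each set.
    Addresses and sizes are in words: capacity M, line size B, associativity a."""

    def __init__(self, M, B, a):
        self.B = B
        self.a = a
        self.num_sets = M // (a * B)  # M / (aB) sets holding a lines each
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        line = addr // self.B                    # memory block holding this word
        lines = self.sets[line % self.num_sets]  # the only set it may occupy
        if line in lines:
            lines.move_to_end(line)              # hit: now most recently used
            return
        self.misses += 1
        if len(lines) == self.a:
            lines.popitem(last=False)            # set is full: evict the LRU line
        lines[line] = None


def round_robin_misses(M, B, a, starts, n):
    """Read n consecutive words from each stream, one word per stream in turn,
    and return the number of cache misses."""
    cache = SetAssociativeCache(M, B, a)
    for i in range(n):
        for start in starts:
            cache.access(start + i)
    return cache.misses


if __name__ == "__main__":
    M, B, a, k, n = 1024, 8, 2, 4, 64
    aligned = [j * M for j in range(k)]  # all k streams compete for one set at a time
    rng = random.Random(1)
    randomized = [j * M + rng.randrange(0, M, B) for j in range(k)]
    print(round_robin_misses(M, B, a, aligned, n))  # 256: every access misses
    print(round_robin_misses(M, B, a, randomized, n))
```

With $a = 4 \ge k$ the same aligned layout incurs only the 32 compulsory misses (one per memory block touched), which matches the intuition that trouble starts once more than $a$ streams compete for a single set.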

### Citations

3973 | Computer Architecture: A Quantitative Approach, 3rd ed.
- Hennessy, Patterson, et al.
- 2002
Citation Context ...faults which are tight up to lower order terms for the range of inputs which allow efficient operations. Related Work Caches are intensively studied in computer architecture and compiler design (e.g. [7]). Evaluations are usually based on simulations. This yields useful quantitative results if traces of meaningful benchmarks are simulated. Simulations also have the advantage that interactions between...

234 | Algorithms for parallel memory I: Two level memories
- Vitter, Shriver
- 1994
Citation Context ...hms for large inputs. The general approach of this paper is to model one cache level and the main memory by the single disk single processor variant of the external memory model by Vitter and Shriver [22], where M is the size of the internal memory, B is the block transfer size, i.e., we use the word pairs "cache line" and "memory block", "cache" and "internal memory", "main memory" and "external memor...

149 | The buffer tree: A new technique for optimal I/O-algorithms
- Arge
- 1995
Citation Context ...us principle behind efficient external memory algorithms is to read or write $k = O(M/B)$ sequential streams of data [21]. For example, k-way merge sort is based on reading and radix sort, buffer trees [1] or external memory list ranking [18] are based on writing k sequences. Empirically, many of these algorithms also perform well on cached memory. For example, in a study by LaMarca and Ladner [11], k-...

112 | The influence of caches on the performance of sorting
- LaMarca, Ladner
- 1997
Citation Context ...rees [1] or external memory list ranking [18] are based on writing k sequences. Empirically, many of these algorithms also perform well on cached memory. For example, in a study by LaMarca and Ladner [11], k-way merging performs best among algorithms tried and even Sibeyn's quite involved external memory list ranking algorithm [18] performs better than a simple pointer chasing although the latter exec...

68 | The Influence of Caches on the Performance of Heaps
- LaMarca, Ladner
- 1996
Citation Context ...tional architectural optimizations mentioned above cannot completely hide the general structure of the cache defined by the parameters M, B and a. Simple analytical cache models have long been known [15, 10]. However, in these independent reference models the cache lines are assumed to be accessed in random order according to some fixed probability distribution. This assumption is not warranted for acces...

61 | Simple randomized mergesort on parallel disks
- Barve, Grove, et al.
- 1997
Citation Context ...terns. External memory algorithms are a well established branch of algorithmics [21, 20]. Our approach to randomize the starting addresses of sequences is similar to the approach used by Barve et al. [2] in order to efficiently use parallel disks for k-way merging. However, we do not want to bound the maximum contention but the fraction of overloaded cache sets. Furthermore, for k-way merging, a clev...
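The context contrasts bounding the maximum contention with bounding the fraction of overloaded cache sets. As a rough illustration only, assume each of the $k$ streams currently sits in an independent, uniformly random one of $S = M/(aB)$ sets (which is what independent random starting addresses give at any fixed moment), and call a set overloaded when more than $a$ streams map to it. By linearity of expectation, the expected overloaded fraction is then $\Pr[\mathrm{Bin}(k, 1/S) > a]$. The function names are hypothetical and this simplified model is not the paper's analysis.

```python
import math
import random


def expected_overloaded_fraction(k, S, a):
    """P[Binomial(k, 1/S) > a]: expected fraction of the S sets that hold more
    than a streams when every stream lands in an independent uniform set."""
    p = 1.0 / S
    at_most_a = sum(math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(a + 1))
    return 1.0 - at_most_a


def simulated_overloaded_fraction(k, S, a, trials=1000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    overloaded = 0
    for _ in range(trials):
        load = [0] * S
        for _ in range(k):
            load[rng.randrange(S)] += 1  # this stream's current line falls in a random set
        overloaded += sum(1 for x in load if x > a)
    return overloaded / (trials * S)
```

For example, with $k = 64$ streams, $S = 32$ sets and $a = 2$, roughly a third of the sets are overloaded in expectation, even though the average load is only two streams per set.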

60 | External Memory Algorithms (eds.)
- Abello, Vitter
- 1999
Citation Context ...isfied from the internal memory. We call this model cached memory. An almost ubiquitous principle behind efficient external memory algorithms is to read or write $k = O(M/B)$ sequential streams of data [21]. For example, k-way merge sort is based on reading and radix sort, buffer trees [1] or external memory list ranking [18] are based on writing k sequences. Empirically, many of these algorithms also p...
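The context names k-way merge sort as the typical algorithm that reads k sequences. A minimal sketch of its merging step, assuming in-memory Python lists stand in for the sorted runs (`k_way_merge` is a hypothetical name): a heap holds the current head of each run, and every run is consumed strictly front to back, which is exactly the k-stream sequential access pattern discussed above.

```python
import heapq


def k_way_merge(runs):
    """Merge k sorted lists; each run is read sequentially from front to back."""
    # Heap entries are (value, run index, position); the run index breaks ties.
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[i]):
            # Advance stream i by one element.
            heapq.heappush(heap, (runs[i][pos + 1], i, pos + 1))
    return out
```

Python's standard library provides the same behaviour lazily as `heapq.merge(*runs)`; the explicit loop is shown only to make the per-stream cursors visible.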

58 | First Draft of a Report on the EDVAC
- von Neumann
- 1945
Citation Context ...rms. Keywords: Set associative cache, external memory algorithm, memory hierarchy, multi merge. 1 Introduction The mainstream model of computation used by algorithm designers in the last half century [13] assumes a single processor with unit memory access cost. However, the mainstream computers sitting on our desktops have increasingly deviated from this model in the last decade [7-9, 12, 19]. Even w...

53 | The 21264: A Superscalar Alpha Processor with Out-of-Order Execution
- Keller
- 1996

45 | Fast priority queues for cached memory
- Sanders
- 1999
Citation Context ...nly a fraction of the instructions. We have designed an external memory priority queue based on k-way merging which performs $O((I/B)\log_{M/B}(I/M))$ I/Os for any sequence of operations with I insertions [17]. This algorithm is similar to previous algorithms with the same asymptotic performance [1, 3, 5, 4] yet performs at least a factor of three fewer I/Os. Running in the cache hierarchy of a workstation...
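A quick way to get a feel for the $O((I/B)\log_{M/B}(I/M))$ bound is to evaluate the expression inside the $O(\cdot)$ for concrete sizes. The sketch below drops all constant factors, assumes $I > M > B$, and uses hypothetical sizes in words; it says nothing about the actual constants of the priority queue in [17].

```python
import math


def pq_io_estimate(I, M, B):
    """Evaluate (I/B) * log_{M/B}(I/M), the expression inside the
    O((I/B) log_{M/B}(I/M)) bound, ignoring constant factors."""
    if not I > M > B:
        raise ValueError("expects I > M > B")
    return (I / B) * math.log(I / M, M / B)


# Hypothetical sizes: M = 2**10 words, B = 2**2 words, I = 2**26 insertions.
# M/B = 256 and I/M = 2**16, so log_256(2**16) = 2 and the estimate is
# about 2 * 2**24 block transfers.
print(pq_io_estimate(2**26, 2**10, 2**2))
```

The logarithm's large base $M/B$ is what keeps the factor small: each additional factor of $M/B$ in $I/M$ adds only one more pass over the data.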

24 | Performance analysis of cache memories
- Rao
- 1978
Citation Context ...tional architectural optimizations mentioned above cannot completely hide the general structure of the cache defined by the parameters M, B and a. Simple analytical cache models have long been known [15, 10]. However, in these independent reference models the cache lines are assumed to be accessed in random order according to some fixed probability distribution. This assumption is not warranted for acces...

13 | Random permutations on distributed, external and hierarchical memory
- Sanders
- 1998
Citation Context ... parallel algorithm for generating random permutations turns out to be several times faster on a cached memory than the conventional sequential algorithm which executes only half as many instructions [16]. This algorithm is based on writing k sequences to memory. Unfortunately, most of these algorithms can fail miserably on set associative caches because an adversary can schedule the accesses in such ...
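One way to generate a random permutation by writing k sequences is to scatter every element into one of k buckets chosen uniformly at random, shuffle each bucket, and concatenate. The result is a uniformly random permutation: for any fixed bucket sizes $n_1, \dots, n_k$ a given output order arises with probability $k^{-n}\prod_i 1/n_i!$, and summing over all size vectors gives $k^{-n} \cdot k^n/n! = 1/n!$. This is a sketch of the general idea under that description; `bucket_permutation` is a hypothetical name and it is not claimed to be the exact algorithm of [16].

```python
import random


def bucket_permutation(items, k, rng=random):
    """Return a uniformly random permutation of items using k write streams."""
    buckets = [[] for _ in range(k)]
    for x in items:
        # Each append extends one of k sequentially written buckets.
        buckets[rng.randrange(k)].append(x)
    out = []
    for b in buckets:
        rng.shuffle(b)  # the buckets are small, so this step can run in cache
        out.extend(b)
    return out
```

The scatter phase is where the k-sequence access pattern appears: with k chosen around $M/B$, each bucket's write cursor needs roughly one cache line, which is exactly the situation the paper analyzes for set associative caches.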

11 | From parallel to external list ranking
- Sibeyn
- 1997
Citation Context ...al memory algorithms is to read or write $k = O(M/B)$ sequential streams of data [21]. For example, k-way merge sort is based on reading and radix sort, buffer trees [1] or external memory list ranking [18] are based on writing k sequences. Empirically, many of these algorithms also perform well on cached memory. For example, in a study by LaMarca and Ladner [11], k-way merging performs best among algor...

5 | External heaps combined with effective buffering
- Fadel, Jakobsen, et al.
- 1997
Citation Context ... on k-way merging which performs $O((I/B)\log_{M/B}(I/M))$ I/Os for any sequence of operations with I insertions [17]. This algorithm is similar to previous algorithms with the same asymptotic performance [1, 3, 5, 4] yet performs at least a factor of three fewer I/Os. Running in the cache hierarchy of a workstation the algorithm is several times faster than an optimized binary heap implementation which is empiric...

5 | An analytical cache model
- Fricker, Robert
- 1991
Citation Context ...ed probability distribution. This assumption is not warranted for accessing sequences and we will see that it can lead to wrong predictions about the impact of the associativity a. Fricker and Robert [6] have proposed a model for accessing sequences. However, it is limited to one particular access schedule while we allow an adversary to schedule the accesses. Furthermore, their model can only be eval...

4 | Worst-case efficient external-memory priority queues
- Brodal, Katajainen
- 1998
Citation Context ... on k-way merging which performs $O((I/B)\log_{M/B}(I/M))$ I/Os for any sequence of operations with I insertions [17]. This algorithm is similar to previous algorithms with the same asymptotic performance [1, 3, 5, 4] yet performs at least a factor of three fewer I/Os. Running in the cache hierarchy of a workstation the algorithm is several times faster than an optimized binary heap implementation which is empiric...

2 | Efficient priority queues in external memory (working paper)
- Crauser, Ferragina, et al.
- 1997
Citation Context ... on k-way merging which performs $O((I/B)\log_{M/B}(I/M))$ I/Os for any sequence of operations with I insertions [17]. This algorithm is similar to previous algorithms with the same asymptotic performance [1, 3, 5, 4] yet performs at least a factor of three fewer I/Os. Running in the cache hierarchy of a workstation the algorithm is several times faster than an optimized binary heap implementation which is empiric...