Results 1–10 of 21
Fast Priority Queues for Cached Memory
 ACM Journal of Experimental Algorithmics, 1999
Abstract

Cited by 51 (8 self)
This paper advocates the adaptation of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on k-way merging. It improves previous external memory algorithms by constant factors crucial for transferring it to cached memory. Running in the cache hierarchy of a workstation, the algorithm is at least two times faster than optimized implementations of binary heaps and 4-ary heaps for large inputs.
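The k-way merging that such a priority queue is built around can be sketched in a few lines. The following is an illustrative Python sketch of the merging primitive only, not the paper's tuned cache-aware data structure; the function name and list-based runs are my own.

```python
import heapq

def k_way_merge(runs):
    """Merge k sorted runs into one sorted list.

    A small heap holds one head element per run, so each run is read
    strictly sequentially -- the access pattern that makes k-way merging
    friendly to caches and external memory.
    """
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, run_id, pos = heapq.heappop(heap)
        merged.append(value)
        if pos + 1 < len(runs[run_id]):
            heapq.heappush(heap, (runs[run_id][pos + 1], run_id, pos + 1))
    return merged
```

Because the heap only ever holds k entries, it stays resident in the fastest cache level while the runs stream through.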
Efficient Sorting Using Registers and Caches
 in Proceedings of the 4th Workshop on Algorithm Engineering (WAE 2000), 2000
Abstract

Cited by 21 (5 self)
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features ...
Adapting Radix Sort to the Memory Hierarchy
 In ALENEX, Workshop on Algorithm Engineering and Experimentation, 2000
Abstract

Cited by 16 (3 self)
... this paper, we focus on one such: the integer sorting algorithm least-significant-bit (LSB) radix sort. LSB radix sort sorts w-bit integer keys with an r-bit radix in O(⌈w/r⌉(n + 2^r)) ...
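A minimal sketch of LSB radix sort illustrating the O(⌈w/r⌉(n + 2^r)) bound above: each of the ⌈w/r⌉ passes is a stable bucket sort over n keys and 2^r buckets. The bucket-list formulation here is for clarity, not the paper's cache-tuned variant.

```python
def lsb_radix_sort(keys, w=32, r=8):
    """Sort non-negative w-bit integers with an r-bit radix.

    Performs ceil(w/r) stable passes; each pass distributes n keys over
    2**r buckets, giving O(ceil(w/r) * (n + 2**r)) time overall.
    """
    mask = (1 << r) - 1
    passes = -(-w // r)  # ceil(w / r)
    for p in range(passes):
        shift = p * r
        buckets = [[] for _ in range(1 << r)]
        for key in keys:
            buckets[(key >> shift) & mask].append(key)
        keys = [key for bucket in buckets for key in bucket]
    return keys
```

Stability of each pass is what lets later (more significant) passes preserve the ordering established by earlier ones.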
Efficient sorting using registers and caches
 in WAE, Workshop on Algorithm Engineering, Lecture Notes in Computer Science, 2000
Abstract

Cited by 9 (0 self)
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, Rmerge, which achieves better performance in practice than algorithms that are superior in the theoretical models. Rmerge is designed to minimize memory stall cycles rather than cache misses by considering features common to many system designs.
Scanning Multiple Sequences Via Cache Memory
 Algorithmica, 2003
Abstract

Cited by 7 (0 self)
We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory.
Random Arc Allocation and Applications to Disks, Drums and DRAMs
 2001
Abstract

Cited by 1 (0 self)
The paper considers a generalization of the well-known random placement of balls into bins. Given n ...
Tail bounds and expectations for random arc allocation and applications
 Combinatorics, Probability and Computing, 2002
Abstract

Cited by 1 (0 self)
The paper considers a generalization of the well-known random placement of balls into bins. Given n circular arcs of lengths α_i, 0 ≤ i < n, we study the maximum number of overlapping arcs on a circle if the starting points of the arcs are chosen randomly. We give almost exact tail bounds on the maximum overlap of the arcs. These tail bounds yield a complete characterization of the expected maximum overlap that is tight up to constant factors in the lower-order terms. We illustrate the strength of our results by presenting new performance guarantees for several applications: minimizing rotational delays of disks, scheduling accesses to parallel disks, and allocating memory to limit cache interference misses.
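The quantity studied above can be illustrated with a small Monte Carlo sketch (my own illustration, not the paper's analysis): place arcs at uniformly random starting points on a unit circle and sweep their endpoints to find the maximum overlap.

```python
import random

def max_arc_overlap(arc_lengths, trials=1, seed=0):
    """Monte Carlo estimate of the maximum overlap of randomly placed arcs.

    Each arc of length a_i (0 <= a_i <= 1, as a fraction of the circle)
    gets a uniformly random starting point.  The overlap count changes
    only at arc endpoints, so a sweep over those suffices.
    """
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        events = []   # (position, +1 for arc start / -1 for arc end)
        active = 0    # arcs that wrap around and cover the origin
        for a in arc_lengths:
            s = rng.random()
            e = s + a
            if e <= 1.0:
                events.append((s, 1))
                events.append((e, -1))
            else:
                active += 1
                events.append((s, 1))
                events.append((e - 1.0, -1))
        events.sort()  # at equal positions, ends (-1) precede starts (+1)
        cur = mx = active
        for _, d in events:
            cur += d
            mx = max(mx, cur)
        best = max(best, mx)
    return best
```

With half-open arcs, n arcs spanning the full circle always give maximum overlap n, and the simulation reproduces that boundary case.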
Towards a Theory of Cache-Efficient Algorithms
 2000
Abstract
We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-optimal algorithms for some fundamental problems like sorting, FFT, and an important subclass of permutations in the single-level cache model. We also show that ignoring associativity concerns could lead to inferior performance, by analyzing the average-case cache behavior of mergesort. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and dealing with the hitherto unresolved problem of limited associativity.
Scanning Multiple Sequences via Cache Memory
 © 2002 Springer-Verlag New York Inc.
Abstract
We consider the simple problem of scanning multiple sequences. There are k sequences of total length N which are to be scanned concurrently. One pointer into each sequence is maintained and an adversary specifies which pointer is to be advanced. The concept of scanning multiple sequences is ubiquitous in algorithms designed for hierarchical memory. In the external memory model of computation with block size B, a memory consisting of m blocks, and at most m sequences, the problem is trivially solved with N/B memory misses by reserving one block of memory for each sequence. However, in a cache memory with associativity a, every access may lead to a cache fault if k > a. For a direct-mapped cache (a = 1) two sequences suffice. We show that by randomizing the starting addresses of the sequences the number of cache misses can be kept to O(N/B) provided that k = O(m/B^(1/a)), i.e., the number of sequences that can be supported is decreased by a factor B^(1/a). We also show a corresponding lower bound. Our result leads to a general method for converting sequence-based algorithms designed for the external memory model of computation to cache memory even for caches with small associativity.
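The effect of randomizing starting addresses can be illustrated with a toy direct-mapped cache model (the geometry and layout below are hypothetical, chosen only for illustration): contiguously laid-out sequences whose stride is a multiple of the cache size map all their heads to the same cache set, while random block-aligned offsets spread them across sets.

```python
import random

def cache_set(addr, block=64, sets=512):
    """Set index of a byte address in a direct-mapped cache."""
    return (addr // block) % sets

# Hypothetical layout: k sequences placed contiguously, each
# sequence_len bytes long (here exactly one cache size apart).
k, block, sets = 8, 64, 512
sequence_len = sets * block  # 32 KiB, equal to the cache size

aligned = [i * sequence_len for i in range(k)]
rng = random.Random(1)
randomized = [base + block * rng.randrange(sets) for base in aligned]

# With aligned starts every sequence head lands in the same set, so
# round-robin scanning evicts on every access; random offsets spread
# the heads over many sets.
aligned_sets = {cache_set(a, block, sets) for a in aligned}
random_sets = {cache_set(a, block, sets) for a in randomized}
```

In the aligned layout `aligned_sets` collapses to a single set, which is exactly the conflict-miss pathology the randomization is designed to avoid.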