Cache-aware and cache-oblivious adaptive sorting
- In Proc. 32nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science
, 2005
Cited by 7 (4 self)
Abstract. Two new adaptive sorting algorithms are introduced which perform an optimal number of comparisons with respect to the number of inversions in the input. The first algorithm is based on a new linear time reduction to (non-adaptive) sorting. The second algorithm is based on a new division protocol for the GenericSort algorithm by Estivill-Castro and Wood. From both algorithms we derive I/O-optimal cache-aware and cache-oblivious adaptive sorting algorithms. These are the first I/O-optimal adaptive sorting algorithms.
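The abstract above measures presortedness by the number of inversions, i.e. pairs of positions i < j whose elements are out of order. As a minimal sketch of that measure only (not of either algorithm from the paper, and with no attempt at I/O efficiency), the hypothetical helper `count_inversions` below counts inversions with a standard merge-sort pass:

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions in seq).

    Illustrative helper only: it computes the presortedness measure
    named in the abstract, not the adaptive sorting algorithms themselves.
    """
    items = list(seq)
    if len(items) <= 1:
        return items, 0

    mid = len(items) // 2
    left, inv_left = count_inversions(items[:mid])
    right, inv_right = count_inversions(items[mid:])

    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions
```

An already sorted input has zero inversions and a reversed input of length n has n(n-1)/2; an adaptive sorting algorithm in this sense does less work the closer the input is to the former.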
A Cache Oblivious Approach for the Problem of Computing Single Source Shortest Paths on Undirected Graphs
, 2005
This report presents the current state of my research activity in the framework of my Doctorate program. The main problem I am studying is how to obtain an I/O-efficient cache-oblivious Single Source Shortest Paths (SSSP) algorithm for undirected graphs. The background of my work is described in Section 1, where we explain why it is important to develop algorithms making efficient use of memory hierarchies, and we describe two important models for studying the efficiency of such algorithms. Section 2 deals with existing I/O-efficient algorithms for the SSSP and for a closely related problem, Breadth First Search (BFS). Section 3 is focused on my ongoing research, and describes how some of the techniques presented in the previous section may be reused to obtain the desired algorithm. In Section 4 we draw conclusions, and point out how I intend to organize my work for the forthcoming year.
Low Depth Cache-Oblivious Algorithms
, 2009
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multi-core processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache. We generalize these mappings to a multi-level parallel tree-of-caches model that reflects current and future trends in multi-core cache hierarchies; these new mappings imply that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the
Techniques for Parallel Memory Hierarchies
Shared memory parallel machines are a popular choice for processing large data sets as they present a comparatively simple interface for programmers. Despite this, writing scalable parallel programs on them is very difficult because of the large disparity in the costs of accessing memory locations and limited interconnect bandwidth. For most data-intensive applications, performance on these machines is bound not by the number of processors, but by the capabilities of the interconnect between processors, caches, and memory and by how well the program is optimized for data locality. By using parallel memory hierarchies (in essence, a tree of caches) as an approximate model for such machines, I propose to develop software and hardware approaches to make designing scalable nested parallel programs easier. The three techniques I plan to study (in theory and practice) are:
• Designing parallel algorithms with low “memory access costs” for frequently used problems, along with a new cost model to measure these costs.
• Thread schedulers with provable guarantees on running times for a broad class of nested parallel programs, and an experimental set-up for expressing and testing them.
• Combinable memory-block transactions: a scheme for scalable implementation of atomic operations required for managing concurrency in run-time systems or any other concurrent program.
Cache-Oblivious String Dictionaries
We present static cache-oblivious dictionary structures for strings which provide analogues of tries and suffix trees in the cache-oblivious model. Our construction takes as input either a set of strings to store, a single string for which all suffixes are to be stored, a trie, a compressed trie, or a suffix tree, and creates a cache-oblivious data structure which performs prefix queries in O(log_B n + |P|/B) I/Os, where n is the number of leaves in the trie, P is the query string, and B is the block size. This query cost is optimal for unbounded alphabets. The data structure uses linear space.
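To make the notion of a prefix query concrete, here is a minimal sketch of an ordinary in-memory, pointer-based trie. It is an assumption-laden illustration of the query semantics only: it is not the cache-oblivious layout from the paper and gives no I/O guarantee, and the names `TrieNode`, `build_trie`, and `prefix_count` are hypothetical.

```python
class TrieNode:
    """Node of a plain (not cache-oblivious) trie; illustrative only."""

    def __init__(self):
        self.children = {}     # maps a character to the child TrieNode
        self.string_count = 0  # stored strings that pass through this node


def build_trie(strings):
    """Build a trie over the given strings and return its root."""
    root = TrieNode()
    for s in strings:
        node = root
        node.string_count += 1
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
            node.string_count += 1
    return root


def prefix_count(root, prefix):
    """Return how many stored strings start with `prefix`."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.string_count
```

For example, after `root = build_trie(["cache", "cat", "car", "dog"])`, the call `prefix_count(root, "ca")` walks two edges and returns 3. Each step of that walk follows one pointer, so a naive layout may touch a different block per character; the structure described in the abstract is designed to avoid that cost.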

