Results 1 – 7 of 7
Cache-aware and cache-oblivious adaptive sorting
In Proc. 32nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science, 2005
Abstract

Cited by 11 (4 self)
Two new adaptive sorting algorithms are introduced which perform an optimal number of comparisons with respect to the number of inversions in the input. The first algorithm is based on a new linear-time reduction to (non-adaptive) sorting. The second algorithm is based on a new division protocol for the GenericSort algorithm by Estivill-Castro and Wood. From both algorithms we derive I/O-optimal cache-aware and cache-oblivious adaptive sorting algorithms. These are the first I/O-optimal adaptive sorting algorithms.
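The disorder measure this abstract refers to, the number of inversions, can be illustrated with a standard merge-sort that counts them as a side effect. This is a minimal sketch of the measure only, not either of the paper's algorithms.

```python
def sort_count_inversions(a):
    """Merge sort that also counts inversions: pairs (i, j) with
    i < j and a[i] > a[j]. Adaptive sorts aim to spend fewer
    comparisons when this count (the input's disorder) is small."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_l = sort_count_inversions(a[:mid])
    right, inv_r = sort_count_inversions(a[mid:])
    merged, i, j, inv = [], 0, 0, inv_l + inv_r
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] precedes every remaining left element,
            # contributing one inversion per such element
            merged.append(right[j]); j += 1
            inv += len(left) - i
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, inv
```

For a nearly sorted input the count is small, and an inversion-optimal algorithm performs close to O(n) comparisons rather than O(n log n).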
A Cache Oblivious Approach for the Problem of Computing Single Source Shortest Paths on Undirected Graphs
, 2005
Abstract
This report presents the current state of my research activity within my Doctorate program. The main problem I am studying is how to obtain an I/O-efficient cache-oblivious Single Source Shortest Paths (SSSP) algorithm for undirected graphs. The background of my work is described in Section 1, where we explain why it is important to develop algorithms that make efficient use of memory hierarchies, and we describe two important models for studying the efficiency of such algorithms. Section 2 deals with existing I/O-efficient algorithms for SSSP and for a closely related problem, Breadth-First Search (BFS). Section 3 focuses on my ongoing research and describes how some of the techniques presented in the previous section may be reused to obtain the desired algorithm. In Section 4 we draw conclusions and point out how I intend to organize my work for the forthcoming year.
Low Depth Cache-Oblivious Algorithms
, 2009
Abstract
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multicore processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache. We generalize these mappings to a multilevel parallel tree-of-caches model that reflects current and future trends in multicore cache hierarchies; these new mappings imply that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the …
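The recipe the abstract describes, nested parallel divide-and-conquer whose sequential order is cache-friendly at every scale, can be illustrated with a classic cache-oblivious recursive matrix transpose. This is a generic textbook example, not one of the paper's algorithms; the sequential version is shown, with comments marking where the independent halves could run in parallel.

```python
def transpose(A, B, r0=0, r1=None, c0=0, c1=None):
    """Cache-oblivious transpose B = A^T by recursively splitting
    the longer dimension. The recursion reaches single cells after
    O(log(rows) + log(cols)) levels (low depth), and the sequential
    order touches memory in ever-smaller tiles, so it performs well
    for any cache line size B without knowing it."""
    if r1 is None:
        r1, c1 = len(A), len(A[0])
    rows, cols = r1 - r0, c1 - c0
    if rows <= 1 and cols <= 1:
        if rows and cols:
            B[c0][r0] = A[r0][c0]
        return
    if rows >= cols:
        m = (r0 + r1) // 2
        transpose(A, B, r0, m, c0, c1)  # the two halves are
        transpose(A, B, m, r1, c0, c1)  # independent: could be spawned
    else:                               # as parallel tasks
        m = (c0 + c1) // 2
        transpose(A, B, r0, r1, c0, m)
        transpose(A, B, r0, r1, m, c1)
```

Because the two recursive calls at each level write disjoint parts of B, a nested-parallel runtime can execute them concurrently while the natural depth-first sequential schedule retains the cache-oblivious miss bound.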
Techniques for Parallel Memory Hierarchies
Abstract
Shared-memory parallel machines are a popular choice for processing large data sets as they present a comparatively simple interface for programmers. Despite this, writing scalable parallel programs on them is very difficult because of the large disparity in the costs of accessing memory locations and the limited interconnect bandwidth. For most data-intensive applications, performance on these machines is bound not by the number of processors, but by the capabilities of the interconnect between processors, caches, and memory, and by how well the program is optimized for data locality. By using parallel memory hierarchies (in essence, a tree of caches) as an approximate model for such machines, I propose to develop software and hardware approaches to make designing scalable nested parallel programs easier. The three techniques I plan to study (in theory and practice) are:
• Designing parallel algorithms with low "memory access costs" for frequently used problems, along with a new cost model to measure these costs.
• Thread schedulers with provable guarantees on running times for a broad class of nested parallel programs, and an experimental setup for expressing and testing them.
• Combinable memory-block transactions: a scheme for a scalable implementation of the atomic operations required for managing concurrency in runtime systems or any other concurrent program.
Cache-Oblivious String Dictionaries
Abstract
We present static cache-oblivious dictionary structures for strings which provide analogues of tries and suffix trees in the cache-oblivious model. Our construction takes as input either a set of strings to store, a single string for which all suffixes are to be stored, a trie, a compressed trie, or a suffix tree, and creates a cache-oblivious data structure which performs prefix queries in O(log_B n + |P|/B) I/Os, where n is the number of leaves in the trie, P is the query string, and B is the block size. This query cost is optimal for unbounded alphabets. The data structure uses linear space.
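For readers unfamiliar with the query type being supported, a plain pointer-based trie shows the prefix-query interface. This sketch deliberately illustrates only the functionality; the paper's contribution is a cache-oblivious memory layout achieving the O(log_B n + |P|/B) I/O bound, which a pointer structure like this does not provide.

```python
class Trie:
    """Pointer-based trie: each node maps a character to a child,
    and marks whether a stored string ends there."""

    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, s):
        node = self
        for ch in s:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def count_with_prefix(self, p):
        """Number of stored strings that start with prefix p
        (a prefix query: walk down p, then count the subtree)."""
        node = self
        for ch in p:
            if ch not in node.children:
                return 0
            node = node.children[ch]
        total = 1 if node.is_end else 0
        return total + sum(c.count_with_prefix("")
                           for c in node.children.values())
```

A cache-oblivious layout stores this same tree in an array so that the root-to-leaf walk for P touches O(log_B n + |P|/B) blocks rather than one random block per pointer hop.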
Data-Oblivious Graph Algorithms for Secure Computation and Outsourcing
Abstract
This work treats the problem of designing data-oblivious algorithms for classical and widely used graph problems. A data-oblivious algorithm is defined as having the same sequence of operations regardless of the input data, together with data-independent memory accesses. Such algorithms are suitable for secure processing in outsourced and similar environments, which serves as the main motivation for this work. We provide data-oblivious algorithms for breadth-first search, single-source single-destination shortest path, minimum spanning tree, and maximum flow, the asymptotic complexities of which are optimal, or close to optimal, for dense graphs.
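The defining property, a sequence of operations and memory accesses that depends only on the input size, can be illustrated with odd-even transposition sort, a standard data-oblivious sorting network. This is a generic textbook example, not one of the paper's graph algorithms.

```python
def compare_exchange(a, i, j):
    """Always reads and writes both positions, so the array-level
    access pattern reveals nothing about the values."""
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi

def oblivious_sort(a):
    """Odd-even transposition sort: the schedule of (i, i+1) pairs
    compared depends only on len(a), never on the data, making it
    data-oblivious. Cost is O(n^2) comparisons; Batcher networks
    achieve O(n log^2 n) with the same obliviousness property."""
    n = len(a)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            compare_exchange(a, i, i + 1)
```

An observer who sees only which memory locations are touched (e.g. an untrusted server executing the computation) learns nothing beyond n, which is exactly the guarantee the graph algorithms in this work provide.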