STXXL: Standard template library for XXL data sets
In: Proc. of ESA 2005. Volume 3669 of LNCS, 2005
Cited by 30 (4 self)
... for processing huge data sets that can fit only on hard disks. It supports parallel disks and overlapping of disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments to a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization, analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, explain how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
Cache-oblivious algorithms and data structures
In SWAT, 2004
Cited by 7 (1 self)
Frigo, Leiserson, Prokop and Ramachandran introduced the ideal-cache model in 1999 as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the term cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The result is algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
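To illustrate the idea in the abstract above, here is a minimal sketch of matrix transposition by recursive halving, one of the standard cache-oblivious examples. It is not taken from the survey itself. The names `co_transpose` and `transpose` and the `cutoff` parameter are hypothetical, and the cutoff only limits recursion overhead; it is not a cache parameter. Pure Python will not expose the cache behaviour, so the sketch shows only the access pattern: no block size or memory size appears anywhere in the code.

```python
def co_transpose(a, b, r0, r1, c0, c1, cutoff=2):
    """Write the transpose of the sub-matrix a[r0:r1][c0:c1] into b.

    The larger dimension is halved recursively, so sub-problems shrink
    until they fit in any cache level without the code knowing its size.
    cutoff must be >= 1 (hypothetical tuning knob for recursion overhead).
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= cutoff and cols <= cutoff:
        # Base case: copy a small tile directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        co_transpose(a, b, r0, mid, c0, c1, cutoff)
        co_transpose(a, b, mid, r1, c0, c1, cutoff)
    else:
        mid = c0 + cols // 2
        co_transpose(a, b, r0, r1, c0, mid, cutoff)
        co_transpose(a, b, r0, r1, mid, c1, cutoff)


def transpose(a):
    """Return the transpose of a rectangular list-of-lists matrix."""
    if not a:
        return []
    n, m = len(a), len(a[0])
    b = [[None] * n for _ in range(m)]
    co_transpose(a, b, 0, n, 0, m)
    return b
```

Calling `transpose([[1, 2, 3], [4, 5, 6]])` returns the 3-by-2 transpose.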
Computer-aided design of high-performance algorithms
2008
Cited by 7 (6 self)
High-performance algorithms play an important role in many areas of computer science and are core components of many software systems used in real-world applications. Traditionally, the creation of these algorithms requires considerable expertise and experience, often in combination with a substantial amount of trial and error. Here, we outline a new approach to the process of designing high-performance algorithms that is based on the use of automated procedures for exploring potentially very large spaces of candidate designs. We contrast this computer-aided design approach with the traditional approach and discuss why it can be expected to yield better-performing yet simpler algorithms. Finally, we sketch out the high-level design of a software environment that supports our new design approach. Existing work on algorithm portfolios, algorithm selection, algorithm configuration and parameter tuning, as well as on general methods for discrete and continuous optimisation, fits naturally into our design approach and can be integrated into the proposed software environment.
A Novel Parallel Sorting Algorithm for Contemporary Architectures
2007
Cited by 2 (0 self)
Traditionally, the field of scientific computing has been dominated by numerical methods. However, modern scientific codes often combine numerical methods with combinatorial methods. Sorting, a widely studied problem in computer science, is an important primitive for combinatorial scientific computing. As high
An Experimental Study of Sorting and Branch Prediction
Cited by 1 (0 self)
Sorting is one of the most important and well-studied problems in Computer Science. Many good algorithms are known which offer various trade-offs in efficiency, simplicity, memory use, and other factors. However, these algorithms do not take into account features of modern computer architectures that significantly influence performance. Caches and branch predictors are two such features, and while there has been a significant amount of research into the cache performance of general purpose sorting algorithms, there has been little research on their branch prediction properties. In this paper we empirically examine the behaviour of the branches in all the most common sorting algorithms. We also consider the interaction of cache optimization on the predictability of the branches in these algorithms. We find insertion sort to have the fewest branch mispredictions of any comparison-based sorting algorithm, that bubble and shaker sort operate in a fashion which makes their branches highly unpredictable, that the unpredictability of shellsort’s branches improves its caching behaviour and that several cache optimizations have little effect on mergesort’s branch mispredictions. We find also that optimizations to quicksort – for example the choice of pivot – have a strong influence on the predictability of its branches. We point out a simple way of removing branch instructions from a classic heapsort implementation, and show also that unrolling a loop in a cache optimized heapsort implementation improves the predictability of its branches. Finally, we note that when sorting random data two-level adaptive branch predictors are usually no better than simpler bimodal predictors. This is despite the fact that two-level adaptive predictors are almost always superior to bimodal predictors in general.
unknown title
Partitioning has been used to improve the performance of the hash join in main memory; however, cache-conscious partitioning requires knowledge of the cache parameters, such as the capacity and unit size, of a chosen level of the CPU caches, e.g., the L2 cache. Obtaining this knowledge and subsequently tuning the algorithm may be inconvenient, and sometimes infeasible, for complex systems. As evidence, our experiments on three different hardware platforms show that, on each platform, the best partitioning granularity was none of the cache parameters. Therefore, we propose a cache-oblivious approach to partitioned hash joins, in which the algorithm is aware of the existence of the memory hierarchy but requires no knowledge about the parameter values. Specifically, we perform binary partitioning on a join relation recursively until the base case is reached. To improve the efficiency, we have designed a novel cache-oblivious buffering structure to facilitate this partitioning and have proposed a cache-oblivious cost model to estimate the base case size. Our theoretical and empirical results both show that this cache-oblivious join matches the performance of its manually tuned, cache-conscious counterparts.
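The recursive binary partitioning described in the abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it omits their buffering structure, and `BASE_CASE_SIZE` is a hypothetical constant standing in for the base case size that the paper estimates with its cost model. The function names and the use of successive hash bits to split the relations are likewise assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical threshold; the paper derives the base case size from a cost model.
BASE_CASE_SIZE = 4


def simple_hash_join(r, s):
    """Plain hash join on (key, payload) tuples: build on r, probe with s."""
    table = defaultdict(list)
    for key, payload in r:
        table[key].append(payload)
    return [(key, rp, sp) for key, sp in s for rp in table.get(key, ())]


def split_on_bit(rel, mask):
    """Binary partition: route each tuple by one bit of its key's hash."""
    parts = ([], [])
    for t in rel:
        parts[1 if hash(t[0]) & mask else 0].append(t)
    return parts


def recursive_partition_join(r, s, bit=0, max_bits=16):
    """Join r and s by recursive binary partitioning.

    No cache size or line size appears here; recursion simply continues
    until a partition is small (or max_bits hash bits have been used,
    which guards against relations dominated by a single key).
    """
    if not r or not s:
        return []
    if min(len(r), len(s)) <= BASE_CASE_SIZE or bit >= max_bits:
        return simple_hash_join(r, s)
    mask = 1 << bit
    r0, r1 = split_on_bit(r, mask)
    s0, s1 = split_on_bit(s, mask)
    # Matching keys share a hash, so they always land in the same half.
    return (recursive_partition_join(r0, s0, bit + 1, max_bits)
            + recursive_partition_join(r1, s1, bit + 1, max_bits))
```

The result contains the same tuples as `simple_hash_join(r, s)`, though possibly in a different order.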
Low Depth Cache-Oblivious Algorithms
2009
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multi-core processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache. We generalize these mappings to a multi-level parallel tree-of-caches model that reflects current and future trends in multi-core cache hierarchies; these new mappings imply that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the

