## STXXL: Standard template library for XXL data sets (2005)

### Download Links

- [algo2.iti.uni-karlsruhe.de]
- [i10www.ira.uka.de]
- [algo2.iti.kit.edu]
- [www.mpi-sb.mpg.de]
- [www.mpi-inf.mpg.de]
- DBLP

### Other Repositories/Bibliography

Venue: | In: Proc. of ESA 2005. Volume 3669 of LNCS |

Citations: | 38 - 5 self |

### BibTeX

@INPROCEEDINGS{Dementiev05stxxl:standard,
    author = {R. Dementiev and L. Kettner and P. Sanders},
    title = {STXXL: Standard template library for XXL data sets},
    booktitle = {Proc. of ESA 2005. Volume 3669 of LNCS},
    year = {2005},
    pages = {640--651},
    publisher = {Springer}
}

### Abstract

We present the software library STXXL, an implementation of the C++ standard template library (STL) for processing huge data sets that can fit only on hard disks. It supports parallel disks and overlapping between disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization, and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL.

KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
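The pipelining claim in the abstract can be illustrated with a toy model. The sketch below is not STXXL code: `CountingDisk`, the two stages, and the block size are hypothetical names and values chosen only to show why streaming one stage's output directly into the next avoids writing and re-reading intermediate results.

```python
class CountingDisk:
    """Toy stand-in for external memory that counts block transfers."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.files = {}
        self.ios = 0

    def _blocks(self, n_items):
        return -(-n_items // self.block_size)  # ceiling division

    def write(self, name, items):
        data = list(items)
        self.files[name] = data
        self.ios += self._blocks(len(data))

    def read(self, name):
        data = self.files[name]
        self.ios += self._blocks(len(data))
        return iter(data)


def square(stream):
    return (x * x for x in stream)


def keep_even(stream):
    return (x for x in stream if x % 2 == 0)


def run_materialized(disk):
    # Every stage writes its full output to disk; the next stage reads it back.
    disk.write("squares", square(disk.read("input")))
    disk.write("evens", keep_even(disk.read("squares")))
    return sum(disk.read("evens"))


def run_pipelined(disk):
    # Stages are chained as generators, so only the input is read from disk.
    return sum(keep_even(square(disk.read("input"))))


def compare(n_items=16, block_size=4):
    results = {}
    for name, run in (("materialized", run_materialized),
                      ("pipelined", run_pipelined)):
        disk = CountingDisk(block_size)
        disk.write("input", range(n_items))
        disk.ios = 0  # do not charge for creating the input
        results[name] = (run(disk), disk.ios)
    return results
```

With the default arguments, `compare()` reports the same sum for both variants, with 16 block transfers for the materialized run and 4 for the pipelined one. The size of the saving in a real application depends on the algorithm and is not implied by this toy.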

### Citations

8523 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 2001
Citation Context ...ficient on other levels. The Copyright c○ 2007 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2007; 00:1–7 Prepared using speauth.clss4 R. DEMENTIEV , L. KETTNER AND P. SANDERS cache-oblivious model in =-=[8]-=- avoids this problem by not providing the knowledge of the block size B and main memory size M to the algorithm. The benefit of such an algorithm is that it is I/O-efficient on all levels of the memor... |

637 | LEDA: A platform for combinatorial and geometric computing
- Mehlhorn, Näher
- 1995
Citation Context ...work on the TPIE project is in progress. For our experiments we have used a TPIE version from September 2005. The LEDA-SM [27] external memory library was designed as an extension to the LEDA library =-=[31]-=- for handling large data sets. The library offers implementations of I/O-efficient sorting, external memory stack, queue, radix heap, array heap, buffer tree, array, B+-tree, string, suffix array, m... |

537 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...each other quickly, etc. We explain how to overlap I/O and computation despite this irregularity using the I/O model of Aggarwal and Vitter =-=[74]-=- that allows access to D arbitrary blocks from D disks within one parallel I/O step. To model overlapping of I/O and computation, we assume that an I/O step takes time L and can be done in parallel wi... |
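The assumption that an I/O step of time L can run in parallel with computation can be made concrete with a small timeline simulation. This is a simplified sketch, not the analysis from the cited text: it assumes a single stream of blocks, a fixed compute time per block (`compute_time`, a hypothetical parameter), and a fixed number of prefetch buffers.

```python
def serial_time(num_blocks, io_time, compute_time):
    """Total time when each block is fetched, then processed, with no overlap."""
    return num_blocks * (io_time + compute_time)


def overlapped_time(num_blocks, io_time, compute_time, buffers=2):
    """Total time when fetching block i+1 may overlap processing block i.

    A fetch can start once the disk is idle and a buffer is free; a buffer
    becomes free when the block that occupied it has been processed.
    """
    fetch_end = []
    compute_end = []
    for i in range(num_blocks):
        disk_free = fetch_end[i - 1] if i >= 1 else 0
        buffer_free = compute_end[i - buffers] if i >= buffers else 0
        fetch_end.append(max(disk_free, buffer_free) + io_time)
        cpu_free = compute_end[i - 1] if i >= 1 else 0
        compute_end.append(max(fetch_end[i], cpu_free) + compute_time)
    return compute_end[-1] if compute_end else 0
```

With two buffers, four blocks, `io_time=3` and `compute_time=2`, the overlapped schedule finishes at time 14 against 20 for the serial one; with a single buffer no overlap is possible and both give 20.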

319 | External memory algorithms and data structures: dealing with massive data
- Vitter
Citation Context ... data sets Theoretically, I/O-efficient algorithms and data structures have been developed for many problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys =-=[15, 16]-=-). Some of them have been implemented: sorting, matrix multiplication [17], search trees [18, 19, 20, 21], priority queues [22], text processing [23]. However only few of the existing I/O-efficient al... |

242 | The Standard Template Library
- Stepanov, Lee
- 1994
Citation Context ...an abstract way, and provide well-engineered and robust implementations of basic external memory algorithms and data structures. 1.4. C++ standard template library The Standard Template Library (STL) =-=[24]-=- is a C++ library which is included in every C++ compiler distribution. It provides basic data structures (called containers) and algorithms. STL containers are generic and can store any built-in or u... |

172 | External-memory graph algorithms - Chiang, Goodrich, et al. - 1995 |

171 | Generative programming: methods, tools, and applications
- Czarnecki, Eisenecker
- 2000
Citation Context ... maximal independent set if none of its visited neighbours is already in the MIS. The neighbour nodes of the MIS nodes are stored as events in a priority queue. In Lines 6–7, the template metaprogram =-=[12]-=- PRIORITY_QUEUE_GENERATOR computes the type of priority queue that will store events. The metaprogram finds the optimal values for numerous tuning parameters (the number and the maximum arity of exter... |
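The event-based rule quoted in this context (a node joins the set only if no visited neighbour is already in it, and neighbours of chosen nodes are recorded as events in a priority queue) can be sketched in a few lines. This is an in-memory illustration using Python's `heapq`, not the STXXL listing the context refers to; the function name and the edge-list input format are assumptions.

```python
import heapq


def maximal_independent_set(num_nodes, edges):
    """Greedy MIS over nodes 0..num_nodes-1, visited in increasing order."""
    # For each node, keep only neighbours with a larger id: these are the
    # nodes that still lie ahead in the scan.
    higher = [[] for _ in range(num_nodes)]
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        if lo != hi:
            higher[lo].append(hi)

    events = []  # min-heap of node ids that have a neighbour in the MIS
    mis = []
    for node in range(num_nodes):
        blocked = False
        while events and events[0] == node:
            heapq.heappop(events)
            blocked = True
        if not blocked:
            mis.append(node)
            for neighbour in higher[node]:
                heapq.heappush(events, neighbour)
    return mis
```

For a path on five nodes, `maximal_independent_set(5, [(0, 1), (1, 2), (2, 3), (3, 4)])` selects nodes 0, 2 and 4. In an external memory setting the heap would be replaced by an I/O-efficient priority queue and the edges would be scanned from disk.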

152 | Organization and maintenance of large ordered indices
- Bayer, McCreight
- 1970
Citation Context ...e data field is large and the comparison function is simple. 5.5. Map The map is an STL interface for search trees with unique keys. Our implementation of map is a variant of a B+-tree data structure =-=[58]-=- supporting the operations insert, erase, find, lower_bound and upper_bound in optimal O(log_B(n)) I/Os. Operations of map use iterators to refer to the elements stored in the container, e.g. find... |

149 | The buffer tree: A new technique for optimal I/O-algorithms
- Arge
- 1995
Citation Context ... Priority queue External memory priority queues are the central data structures for many I/O efficient graph algorithms [53, 51, 15]. The main technique in these algorithms is time-forward processing =-=[51, 54]-=-, easily realizable by an I/O efficient priority queue. This approach evaluates a DAG with labeled nodes. The... |
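Time-forward processing, as named in this context, can be sketched as follows. The example assumes nodes are numbered in topological order and that a node's value is its own label plus the values of its predecessors; that combining rule, the function name, and the use of the in-memory `heapq` module are illustrative assumptions, since the cited text is truncated before it gives details.

```python
import heapq


def time_forward_eval(labels, edges):
    """Evaluate a DAG whose nodes 0..n-1 are numbered in topological order.

    Each node's value is sent 'forward in time' to its successors by
    inserting (successor, value) into a priority queue keyed by successor.
    """
    successors = [[] for _ in labels]
    for u, v in edges:
        if not u < v:
            raise ValueError("edges must go from lower to higher node ids")
        successors[u].append(v)

    queue = []
    values = []
    for node, label in enumerate(labels):
        total = label
        # Collect all messages addressed to this node.
        while queue and queue[0][0] == node:
            _, incoming = heapq.heappop(queue)
            total += incoming
        values.append(total)
        for succ in successors[node]:
            heapq.heappush(queue, (succ, total))
    return values
```

For labels `[1, 2, 3, 4]` and edges `[(0, 1), (0, 2), (1, 3), (2, 3)]` the function returns `[1, 3, 4, 11]`. The point of the technique is that a node never looks up its predecessors: everything it needs arrives through the queue at the moment the node is processed.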

106 | Implementation and performance of integrated application-controlled caching, prefetching and disk scheduling
- Cao, Felten, et al.
Citation Context ...d three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and computation have been intensively studied for external memory merge sort =-=[64, 65, 66, 47, 60, 67]-=-. But we have not seen results that guarantee overlapping of I/O and computation during the parallel disk merging of arbitrary runs. There are many good practical implementations of sorting (e.g. [68,... |

103 | GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management
- Govindaraju, Gray, et al.
- 2006
Citation Context ...ort (Indy 2004, 2005) and Byte-Split-Index Sort (Daytona 2006) are published. The authors of the THSort, SheenkSort and Byte-Split-Index have used systems with four disks. The GpuTeraSort (Indy 2006) =-=[73]-=- uses a graphic processing unit (GPU) for internal sorting, mapping a bitonic sorting network to GPU rasterization operations and using the GPUs programmable hardware and high bandwidth memory interfa... |

100 | Texas: An efficient, portable persistent store
- Singhal, Kakkad, et al.
- 1992
Citation Context ...arching, scanning. We are currently working on using MCSTL to parallelize the internal work of STXXL algorithms and data structures. There is a number of libraries which provide persistent containers =-=[37, 38, 39, 40, 41, 42]-=-. Persistent STL-compatible containers are implemented in [43, 44]. These containers can keep (some of) the elements in external memory transparently to the user. In contrast to STXXL, these libraries... |

72 | A locality-preserving cache-oblivious dynamic dictionary
- Bender, Duan, et al.
- 2002
Citation Context ...t on all levels of the memory hierarchy across many systems without fine tuning for any particular real machine parameters. Many basic algorithms and data structures have been designed for this model =-=[8, 9, 10, 11]-=-. A drawback of cache-oblivious algorithms playing a role in practice is that they are only asymptotically I/O-optimal. The constants hidden in the O-notation of their I/O-complexity are significantly... |

69 | I/O complexity of graph algorithms
- Munagala
- 1999
Citation Context ...t machine. Suffix arrays for long strings up to 4 billion characters could be computed in hours. The project [81] has compared experimentally two external memory breadth-first search (BFS) algorithms =-=[82, 83]-=-. The pipelining technique of STXXL has helped to save a factor of 2–3 in I/O volume of the BFS implementations. Using STXXL, it became possible to compute BFS decomposition of node-set of large grid ... |

64 | Cache-oblivious priority queue and graph algorithm applications
- Arge, Bender, et al.
- 2002
Citation Context ...t on all levels of the memory hierarchy across many systems without fine tuning for any particular real machine parameters. Many basic algorithms and data structures have been designed for this model =-=[8, 9, 10, 11]-=-. A drawback of cache-oblivious algorithms playing a role in practice is that they are only asymptotically I/O-optimal. The constants hidden in the O-notation of their I/O-complexity are significantly... |

64 | Near-optimal parallel prefetching and caching
- Kimbrel, Karlin
- 1996
Citation Context ...d three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and computation have been intensively studied for external memory merge sort =-=[64, 65, 66, 47, 60, 67]-=-. But we have not seen results that guarantee overlapping of I/O and computation during the parallel disk merging of arbitrary runs. There are many good practical implementations of sorting (e.g. [68,... |

61 | Simple randomized mergesort on parallel disks - Barve, Grove, et al. - 1997 |

60 | AlphaSort: A RISC Machine Sort
- Nyberg, Barclay, et al.
- 1994
Citation Context ... 67]. But we have not seen results that guarantee overlapping of I/O and computation during the parallel disk merging of arbitrary runs. There are many good practical implementations of sorting (e.g. =-=[68, 69, 70, 71]-=-) that address parallel disks, overlapping of I/O and computation, and have a low internal overhead. However, we are not aware of fast implementations that give good performance guarantees for all inp... |

58 | First Draft of a Report on the EDVAC
- von Neumann
- 1945
Citation Context ...the virtual memory mechanism that extends the working space for applications, mapping an external memory file (page/swap file) to virtual addresses. This idea supports the Random Access Machine model =-=[5]-=- in which a program has an infinitely large main memory. With virtual memory the application does not know where its data is located: in the main memory or in the swap file. This abstraction does not ... |

47 | External-memory breadth-first search with sublinear I/O
- Mehlhorn, Meyer
- 2002
Citation Context ...t machine. Suffix arrays for long strings up to 4 billion characters could be computed in hours. The project [81] has compared experimentally two external memory breadth-first search (BFS) algorithms =-=[82, 83]-=-. The pipelining technique of STXXL has helped to save a factor of 2–3 in I/O volume of the BFS implementations. Using STXXL, it became possible to compute BFS decomposition of node-set of large grid ... |

45 | Fast priority queues for cached memory
- Sanders
- 1999
Citation Context ...s per vector element. EM priority queues are used for time-forward processing technique in external graph algorithms [10, 2] and online sorting. The Stxxl implementation of priority queue is based on =-=[11]-=-. This queue needs less than a third of I/Os used by other similar cache (I/O) efficient priority queues. The implementation supports parallel disks and overlaps I/O and computation. The current versi... |

40 | Efficient bulk operations on dynamic R-trees
- Arge, Hinrichs, et al.
- 1999
Citation Context ...ny problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication [17], search trees =-=[18, 19, 20, 21]-=-, priority queues [22], text processing [23]. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely on previous ones, researchers,... |

37 | A Super Scalar Sort Algorithm for RISC Processors
- Agarwal
- 1996
Citation Context ... 67]. But we have not seen results that guarantee overlapping of I/O and computation during the parallel disk merging of arbitrary runs. There are many good practical implementations of sorting (e.g. =-=[68, 69, 70, 71]-=-) that address parallel disks, overlapping of I/O and computation, and have a low internal overhead. However, we are not aware of fast implementations that give good performance guarantees for all inp... |

34 | I/O-efficient scientific computation using TPIE
- Vengroff, Vitter
- 1996
Citation Context ...en developed for many problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication =-=[17]-=-, search trees [18, 19, 20, 21], priority queues [22], text processing [23]. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely... |

34 | A transparent parallel I/O environment
- Vengroff
- 1994
Citation Context ...s. The library provides implementations of basic parallel disk algorithms. STXXL is the only external memory algorithm library supporting parallel disks. Such a feature was announced for TPIE in 1996 =-=[29, 28]-=-. • The library is able to handle problems of a very large size (up to dozens of terabytes). • Improved utilization of computer resources. STXXL explicit supports overlapping between I/O and computati... |

32 | Implementing I/O-efficient data structures using TPIE
- Arge, Procopiuc, et al.
- 2002
Citation Context ...algorithms, string processing, computational geometry, etc. (for a survey see [2]). Some of them have been implemented (⋆ Partially supported by DFG grant SA 933/1-2.): sorting, matrix multiplication =-=[3]-=-, (geometric) search trees [3], priority queues [4], suffix array construction [4]. However there is an increasing gap between theoretical achievements of external memory (EM) algorithms and their pra... |

31 | Minimizing Stall Time in Single and Parallel Disk Systems
- Albers, Garg, et al.
Citation Context ...d three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and computation have been intensively studied for external memory merge sort =-=[64, 65, 66, 47, 60, 67]-=-. But we have not seen results that guarantee overlapping of I/O and computation during the parallel disk merging of arbitrary runs. There are many good practical implementations of sorting (e.g. [68,... |

30 | Better external memory suffix array construction
- Dementiev, Kärkkäinen, et al.
- 2008
Citation Context ...tics) as their STL counterparts. The Streaming layer provides efficient support for pipelining EM algorithms. The algorithms for external memory suffix array construction implemented with this module =-=[8]-=- require only 1/3 of I/Os which must be performed by implementations that use conventional data structures and algorithms (either from Stxxl STL-user layer, or LEDA-SM, or TPIE). The rest of this sect... |

29 | Out-of-core rendering of large, unstructured grids
- Farias, Silva
Citation Context ...bytes of geographically-referenced information that includes the whole Earth. In computer graphics one has to visualize highly complex scenes using only a conventional workstation with limited memory =-=[1]-=-. Billing systems of telecommunication companies evaluate terabytes of phone call log files [2]. One is interested in analyzing huge network instances like a web graph [3] or a phone call graph. Searc... |

26 | Bkdtree: A dynamic scalable kd-tree
- Procopiuc, Agarwal, et al.
- 2003
Citation Context ...ny problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication [17], search trees =-=[18, 19, 20, 21]-=-, priority queues [22], text processing [23]. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely on previous ones, researchers,... |

25 | Cache-oblivious data structures and algorithms for undirected breadth-first search and shortest paths
- Brodal, Fagerberg, et al.
Citation Context ...t on all levels of the memory hierarchy across many systems without fine tuning for any particular real machine parameters. Many basic algorithms and data structures have been designed for this model =-=[8, 9, 10, 11]-=-. A drawback of cache-oblivious algorithms playing a role in practice is that they are only asymptotically I/O-optimal. The constants hidden in the O-notation of their I/O-complexity are significantly... |

24 | Latency lags bandwith
- Patterson
Citation Context ...crepancy between the speed of CPUs and the latency of the lower hierarchy levels grows very quickly: the speed of processors is improved by about 55 % yearly, the hard disk access latency only by 9 % =-=[7]-=-. Therefore, the algorithms which are aware of the memory hierarchy will continue to benefit in the future and the development of such algorithms is an important trend in computer science. The PDM mod... |

23 | Engineering a cache-oblivious sorting algorithm
- Brodal, Fagerberg, et al.
Citation Context ... tuned cache-oblivious funnel sort implementation [12] is 2.6–4.0 times slower than our I/O-efficient sorter from STXXL (Section 6) for out-of-memory inputs [13]. A similar funnel sort implementation =-=[14]-=- is up to two times slower than the I/O-efficient sorter from the TPIE library (Section 1.7) for large inputs. The reason for this is that these I/O-efficient sorters are highly optimized to minimize ... |

22 | Asynchronous parallel disk sorting
- Dementiev, Sanders
- 2003
Citation Context ...ation/deallocation allowing several block-to-disk assignment strategies: striping, randomized striping, randomized cycling, etc. The BM layer provides implementation of parallel disk buffered writing =-=[7]-=-, optimal prefetching [7], and block caching. The implementations are fully asynchronous and designed to explicitly support overlapping between I/O and computation. The top of Stxxl consists of two mo... |

21 | Theoretical and experimental study on the construction of suffix arrays in external memory
- Crauser, Ferragina
Citation Context ... computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication [17], search trees [18, 19, 20, 21], priority queues [22], text processing =-=[23]-=-. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely on previous ones, researchers, which would like to engineer practical impl... |

21 | External memory graph algorithms
- Chiang, Goodrich, et al.
- 1995
Citation Context ...removal, and inspection of the element at the top of the stack. Due to the restricted set of operations a stack can be implemented I/O-efficiently and applied in many external memory algorithms (e.g. =-=[51, 52]-=-). Four implementations of a stack are available in STXXL, which are optimized for different access patterns (long or short random insert/remove sequences) and manage their memory space differently (o... |

20 | Algorithms for Memory Hierarchies
- Meyer, Sanders, Sibeyn (editors)
- 2003
Citation Context ... data sets Theoretically, I/O-efficient algorithms and data structures have been developed for many problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys =-=[15, 16]-=-). Some of them have been implemented: sorting, matrix multiplication [17], search trees [18, 19, 20, 21], priority queues [22], text processing [23]. However only few of the existing I/O-efficient al... |

20 | Columnsort lives! an efficient out-of-core sorting program
- Chaudhry, Wisniewski, et al.
- 2001
Citation Context ...ory size artificially to obtain a nontrivial number of runs. Additionally, our implementation is not a prototype, it has a generic interface and is a part of the software library STXXL. Algorithms in =-=[61, 62, 63]-=- have the theoretical advantage of being deterministic. However, they need three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and ... |

18 | Dynamic and I/O-efficient algorithms for computational geometry and graph problems: theoretical and experimental results
- Chiang
- 1995
Citation Context ...ny problem domains: graph algorithms, string processing, computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication [17], search trees =-=[18, 19, 20, 21]-=-, priority queues [22], text processing [23]. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely on previous ones, researchers,... |

17 | Algorithms and experiments for the webgraph
- Laura, Leonardi, et al.
- 2002
Citation Context ...kstation with limited memory [1]. Billing systems of telecommunication companies evaluate terabytes of phone call log files [2]. One is interested in analyzing huge network instances like a web graph =-=[3]-=- or a phone call graph. Search engines like Google and Yahoo provide fast text search in their data bases indexing billions of web pages. A precise simulation of the Earth’s climate needs to manipulat... |

17 | Duality between prefetching and queued writing with parallel disks - Hutchinson, Sanders, et al. - 2001 |

17 | Optimal prefetching and caching for parallel I/O systems
- Kallahalla, Varman
- 2001
Citation Context ...k algorithm [49] that can be viewed as the immediate ancestor of our algorithm. Innovations with respect to our sorting are: a different allocation strategy that enables better theoretical I/O bounds =-=[47, 60]-=-; a prefetching algorithm that optimizes the number of I/O steps and never evicts data previously fetched; overlapping of I/O and computation; a completely asynchronous implementation that reacts flex... |

16 | A framework for simple sorting algorithms on parallel disk systems
- Rajasekaran
- 1998
Citation Context ...ory size artificially to obtain a nontrivial number of runs. Additionally, our implementation is not a prototype, it has a generic interface and is a part of the software library STXXL. Algorithms in =-=[61, 62, 63]-=- have the theoretical advantage of being deterministic. However, they need three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and ... |

15 | An Experimental Study of Priority Queues in External Memory
- Brengel, Crauser, et al.
- 1999
Citation Context ...ms, string processing, computational geometry, etc. (see the surveys [15, 16]). Some of them have been implemented: sorting, matrix multiplication [17], search trees [18, 19, 20, 21], priority queues =-=[22]-=-, text processing [23]. However only few of the existing I/O-efficient algorithms have been studied experimentally. As new algorithmic results rely on previous ones, researchers, which would like to e... |

14 | Engineering an external memory minimum spanning tree algorithm
- Dementiev, Sanders, et al.
- 2004
Citation Context ...llion edges in less than a day, and for random sparse graphs within an hour. Simple algorithms for computing minimum spanning trees (MST), connected components, and spanning forests were developed in =-=[14]-=-. Their implementations were built using STL-user-level algorithms and data structures of Stxxl. The largest solved MST problem had 2^32 nodes, the input graph edges occupied 96 GBytes. The computatio... |

13 | I/O-Efficient Algorithms for Shortest Path Related Problems
- Zeh
- 2002
Citation Context ...Os in the worst case. Sequential scanning of the vector costs O(1/DB) amortized I/Os per vector element. EM priority queues are used for time-forward processing technique in external graph algorithms =-=[10, 2]-=- and online sorting. The Stxxl implementation of priority queue is based on [11]. This queue needs less than a third of I/Os used by other similar cache (I/O) efficient priority queues. The implementa... |

13 | Distribution sort with randomized cycling
- Vitter, Hutchinson
- 2001
Citation Context ...ating and deallocating external memory space on disks. The manager supports four parallel disk allocation strategies: simple striping, fully randomized, simple randomized [49], and randomized cycling =-=[50]-=-. The BM layer also delivers a set of helper classes that efficiently implement frequently used sequential patterns of interaction with parallel disks. The optimal parallel disk queued writing [47] is... |
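The four allocation strategies named in this context differ only in how consecutive blocks are mapped to disks. The sketch below gives one common reading of each strategy for a single run of blocks; the function names and signatures are hypothetical and are not STXXL's block manager API.

```python
import random


def striping(num_blocks, num_disks):
    """Block i goes to disk i mod D."""
    return [i % num_disks for i in range(num_blocks)]


def simple_randomized(num_blocks, num_disks, rng):
    """Striping that starts at a randomly chosen disk."""
    offset = rng.randrange(num_disks)
    return [(offset + i) % num_disks for i in range(num_blocks)]


def fully_randomized(num_blocks, num_disks, rng):
    """Every block is placed on an independently chosen random disk."""
    return [rng.randrange(num_disks) for _ in range(num_blocks)]


def randomized_cycling(num_blocks, num_disks, rng):
    """Blocks cycle through the disks in a randomly permuted order."""
    order = list(range(num_disks))
    rng.shuffle(order)
    return [order[i % num_disks] for i in range(num_blocks)]
```

Passing an explicit `random.Random` instance keeps the assignments reproducible, for example `randomized_cycling(8, 4, random.Random(0))`. Striping, simple randomized and randomized cycling all touch each disk exactly once per D consecutive blocks; fully randomized gives no such guarantee.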

13 | Getting more from out-of-core columnsort
- Chaudhry, Cormen
Citation Context ...ory size artificially to obtain a nontrivial number of runs. Additionally, our implementation is not a prototype, it has a generic interface and is a part of the software library STXXL. Algorithms in =-=[61, 62, 63]-=- have the theoretical advantage of being deterministic. However, they need three passes over data even for relatively small inputs. Prefetch buffers for disk load balancing and overlapping of I/O and ... |

12 | CRB-tree: An efficient indexing scheme for range aggregate queries
- Govindarajan, Agarwal, et al.
- 2003

11 | Algorithms for parallel memory
- Vitter, Shriver
- 1994
Citation Context ...tep the algorithms try to transfer D blocks between the main memory of size M and D disks (one block from each disk). This model has been formalized by Vitter and Shriver as Parallel Disk Model (PDM) =-=[1]-=- and is the standard theoretical model for designing and analyzing I/O-efficient algorithms. In this model, N is the input size and B is the block size measured in bytes. Theoretically I/O-efficient a... |
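For orientation, the two bounds most often quoted in the Parallel Disk Model are the cost of scanning and of sorting N items, using the symbols N, M, B and D from this context. They are stated here from the general external memory literature, since the quoted context is truncated before it reaches them; the numeric values in the worked line are assumed for illustration only.

```latex
\[
\mathrm{scan}(N) = \Theta\!\left(\frac{N}{DB}\right),
\qquad
\mathrm{sort}(N) = \Theta\!\left(\frac{N}{DB}\,\log_{M/B}\frac{N}{B}\right)
\]
% Worked example with assumed values N = 2^{40}, B = 2^{21}, D = 4:
\[
\frac{N}{DB} = \frac{2^{40}}{2^{2}\cdot 2^{21}} = 2^{17} = 131072
\]
```

Under these assumed values, one pass over the data takes about 131072 parallel I/O steps, and the logarithmic factor in the sorting bound counts how many such passes a merge-based sort needs.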