STXXL: Standard template library for XXL data sets
In: Proc. of ESA 2005, Volume 3669 of LNCS, 2005
Cited by 38 (5 self)
… for processing huge data sets that can fit only on hard disks. It supports parallel disks and the overlapping of disk I/O with computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments to a range of problems, including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization, analysis of microscopic images, and differential cryptographic analysis. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library and show how its performance features are supported and how the library integrates with STL.
KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
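The pipelining idea can be illustrated with a small Python sketch. It is not STXXL's C++ interface: disk I/O is only simulated by counting items copied into intermediate lists, and the names `DiskCounter`, `sort_stage`, `pipelined` and `materialized` are hypothetical. In the pipelined version, stages pass items to each other through generators, so the only data written to "disk" are the sorted runs of the sorting stage; the materialized version stores the complete output of every stage.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List


class DiskCounter:
    """Hypothetical stand-in for external storage: counts items 'written to disk'."""

    def __init__(self) -> None:
        self.items_written = 0

    def materialize(self, items: Iterable[int]) -> List[int]:
        out = list(items)
        self.items_written += len(out)
        return out


def sort_stage(stream: Iterable[int], run_size: int, disk: DiskCounter) -> Iterator[int]:
    """External-sort-style stage: sorted runs go to 'disk', the merge is streamed lazily."""
    runs = []
    it = iter(stream)
    while True:
        chunk = sorted(islice(it, run_size))  # run formation within a bounded buffer
        if not chunk:
            break
        runs.append(disk.materialize(chunk))
    return heapq.merge(*runs)  # merged output is yielded item by item, never stored


def dedupe(sorted_stream: Iterable[int]) -> Iterator[int]:
    """Drop adjacent duplicates from a sorted stream."""
    prev = object()
    for x in sorted_stream:
        if x != prev:
            yield x
            prev = x


def pipelined(data: List[int], run_size: int, disk: DiskCounter) -> int:
    """Scan -> transform -> sort -> count distinct, with generators between stages."""
    scanned = (x for x in data)
    transformed = (x * x % 97 for x in scanned)
    ordered = sort_stage(transformed, run_size, disk)
    return sum(1 for _ in dedupe(ordered))


def materialized(data: List[int], run_size: int, disk: DiskCounter) -> int:
    """Same computation, but every stage writes its complete output to 'disk'."""
    scanned = disk.materialize(data)
    transformed = disk.materialize(x * x % 97 for x in scanned)
    ordered = disk.materialize(sort_stage(transformed, run_size, disk))
    return len(disk.materialize(dedupe(ordered)))
```

With `data = list(range(1000))` and `run_size = 100`, both versions report 49 distinct values, but the pipelined chain writes 1000 items while the materialized chain writes 4049; the saving in this toy setup comes entirely from not storing intermediate results between stages.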
Tight lower bounds for query processing on streaming and external memory data
ICALP, 2005
Cited by 26 (12 self)
Abstract. We study a clean machine model for external memory and stream processing. We show that the number of scans of the external data induces a strict hierarchy (as long as the work space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting is feasible if the product of the number r(n) of scans of the external memory and the size s(n) of the internal memory buffers is sufficiently small, e.g., of size $o(\sqrt[5]{n})$. We also establish tight bounds for the complexity of XPath evaluation and filtering.
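To make the two resources in this bound concrete, here is a folklore multi-pass hash join in Python. It is an upper-bound illustration, not taken from the paper, and `multipass_hash_join` is a hypothetical name. It touches the inputs only through sequential scans: with `passes` partitions it performs 2·`passes` scans and buffers only about |R|/`passes` tuples of R at a time, so the number of scans and the buffer size trade off against each other.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str]


def multipass_hash_join(r: List[Row], s: List[Row],
                        passes: int) -> Tuple[List[Tuple[Hashable, str, str]], int, int]:
    """Hypothetical sketch: equi-join r and s on their first field using sequential scans only.

    In pass i, only the r-tuples whose key hashes to partition i are buffered in memory.
    Returns (joined rows, number of scans, peak number of buffered r-tuples).
    """
    result: List[Tuple[Hashable, str, str]] = []
    scans = 0
    peak_buffer = 0
    for i in range(passes):
        table: Dict[Hashable, List[str]] = defaultdict(list)
        for key, val in r:  # sequential scan of r, keeping partition i only
            if hash(key) % passes == i:
                table[key].append(val)
        scans += 1
        peak_buffer = max(peak_buffer, sum(len(v) for v in table.values()))
        for key, val in s:  # sequential scan of s, probing the in-memory buffer
            for rval in table.get(key, ()):
                result.append((key, rval, val))
        scans += 1
    return result, scans, peak_buffer
```

Doubling `passes` roughly halves the buffer while doubling the scans, which is the kind of r(n)·s(n) product the lower bound constrains.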
Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths
In Proceedings of the 9th Scandinavian Workshop on Algorithm Theory, 2004
Cited by 25 (9 self)
We present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with nonnegative edge weights. Our results close the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts. Our shortest-path algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation.
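The semantics of a weak DecreaseKey operation (often called Update) can be shown with an ordinary in-memory binary heap. The sketch below illustrates only the interface: it uses Python's `heapq` with lazily discarded stale entries and is not cache-oblivious, unlike the bucket heap. The class and method names are hypothetical. `update(x, k)` inserts x with key k if x is absent and otherwise lowers its key only when k is smaller, so Dijkstra's algorithm can relax every edge to an unsettled vertex with a single `update` call.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple


class WeakDecreaseKeyPQ:
    """Hypothetical in-memory illustration of weak DecreaseKey (NOT cache-oblivious)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Hashable]] = []
        self._key: Dict[Hashable, float] = {}
        self._counter = 0  # tie-breaker so elements themselves are never compared

    def update(self, x: Hashable, k: float) -> None:
        """Insert x with key k, or lower its key to k; ignored if the key is already <= k."""
        if x in self._key and self._key[x] <= k:
            return
        self._key[x] = k
        heapq.heappush(self._heap, (k, self._counter, x))
        self._counter += 1

    def delete_min(self) -> Optional[Tuple[Hashable, float]]:
        """Remove and return (element, key) with the smallest key, skipping stale entries."""
        while self._heap:
            k, _, x = heapq.heappop(self._heap)
            if self._key.get(x) == k:
                del self._key[x]
                return x, k
        return None


def dijkstra(graph: Dict[Hashable, List[Tuple[Hashable, float]]],
             source: Hashable) -> Dict[Hashable, float]:
    """Single-source shortest paths with nonnegative weights using update/delete_min."""
    dist: Dict[Hashable, float] = {}
    pq = WeakDecreaseKeyPQ()
    pq.update(source, 0)
    while (item := pq.delete_min()) is not None:
        v, d = item
        dist[v] = d  # v is settled
        for w, weight in graph.get(v, []):
            if w not in dist:
                pq.update(w, d + weight)
    return dist
```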
Lower bounds for sorting with few random accesses to external memory
PODS, 2005
Cited by 20 (9 self)
We consider a scenario where we want to query a large dataset that is stored in external memory and does not fit into main memory. The most constrained resources in such a situation are the size of the main memory and the number of random accesses to external memory. We note that sequentially streaming data from external memory through main memory is much less prohibitive. We propose an abstract model of this scenario in which we restrict the size of the main memory and the number of random accesses to external memory, but do not restrict sequential reads. A distinguishing feature of our model is that it admits the usage of unlimited external memory for storing intermediate results, such as several hard disks that can be accessed in parallel. In practice, such auxiliary external memory can be crucial. For example, in a first sequential pass the data can be annotated, and in a second pass this annotation can be used to answer the query. Koch’s [9] ARB system for answering XPath queries is based on such a strategy. In this model, we prove lower bounds for sorting the input data. As opposed to related results for models without auxiliary external memory for intermediate results, we cannot rely on communication complexity to establish these lower bounds. Instead, we simulate our model by a nonuniform computation model for which we can establish the lower bounds by combinatorial means.
Engineering an External Memory Minimum Spanning Tree Algorithm
In Proc. 3rd IFIP Intl. Conf. on Theoretical Computer Science, 2004
Cited by 14 (3 self)
We develop an external-memory algorithm for computing minimum spanning trees. The algorithm is considerably simpler than previously known external-memory algorithms for this problem and needs at least a factor of four fewer I/Os for realistic inputs. Our implementation indicates that this algorithm processes graphs limited only by the disk capacity of most current machines in time no more than a factor of 2–5 of that of a good internal algorithm with sufficient memory space.
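A standard building block for external MST computation is semi-external Kruskal: when the node set fits in internal memory, the edges are sorted externally and then scanned once while a union-find structure over the nodes is kept in memory. The Python sketch below shows only that building block, with Python's `sorted` standing in for an external sort; it is not the algorithm of the paper, and the function names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def semi_external_kruskal(num_nodes: int, edges: List[Edge]) -> List[Edge]:
    """Hypothetical sketch: only the node array lives in memory; edges are streamed once."""
    parent = list(range(num_nodes))  # O(num_nodes) internal memory
    mst: List[Edge] = []
    for w, u, v in sorted(edges):  # stands in for an external sort followed by one scan
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # edge joins two components, so it belongs to the forest
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```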
Cache-oblivious algorithms and data structures
In SWAT, 2004
Cited by 10 (1 self)
Frigo, Leiserson, Prokop and Ramachandran introduced the ideal-cache model in 1999 as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the term cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e., without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal offline cache replacement strategy. The results are algorithms that automatically apply to multilevel memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
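A classic cache-oblivious example from the work of Frigo et al. is matrix transposition by recursively halving the larger dimension. The Python sketch below (hypothetical function names) mirrors that recursion: no cache or block size appears in the code, and the small constant `base` only limits recursion overhead. Python lists of lists do not store numbers contiguously, so the sketch shows the access pattern rather than real cache behavior.

```python
from typing import List

Matrix = List[List[float]]


def co_transpose(a: Matrix, b: Matrix, r0: int, r1: int, c0: int, c1: int,
                 base: int = 8) -> None:
    """Write the transpose of a[r0:r1][c0:c1] into b (hypothetical sketch).

    The longer side is split in half until the block is tiny; eventually every
    subproblem fits in every cache level, whatever its size.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        co_transpose(a, b, r0, mid, c0, c1, base)
        co_transpose(a, b, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        co_transpose(a, b, r0, r1, c0, mid, base)
        co_transpose(a, b, r0, r1, mid, c1, base)


def transpose(a: Matrix) -> Matrix:
    """Return a new matrix holding the transpose of a."""
    n = len(a)
    m = len(a[0]) if a else 0
    b = [[0.0] * n for _ in range(m)]
    co_transpose(a, b, 0, n, 0, m)
    return b
```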
The complexity of querying external memory and streaming data
In Proceedings of FCT’05, Springer LNCS volume 3623, 2005
Cited by 8 (4 self)
Abstract. We review a recently introduced computation model for streaming and external memory data. An important feature of this model is that it distinguishes between sequentially reading (streaming) data from external memory (through main memory) and randomly accessing external memory data at specific memory locations; it is well known that the latter is much more expensive in practice. We explain how a number of lower bound results are obtained in this model and how they can be applied to prove lower bounds for XML query processing.
Lower Bounds for Processing Data with Few Random Accesses to External Memory
Cited by 5 (1 self)
We consider a scenario where we want to query a large dataset that is stored in external memory and does not fit into main memory. The most constrained resources in such a situation are the size of the main memory and the number of random accesses to external memory. We note that sequentially streaming data from external memory through main memory is much less prohibitive. We propose an abstract model of this scenario in which we restrict the size of the main memory and the number of random accesses to external memory, but admit arbitrary sequential access. A distinguishing feature of our model is that it allows the usage of unlimited external memory for storing intermediate results, such as several hard disks that can be accessed in parallel. In this model, we prove lower bounds for the problem of sorting a sequence of strings (or numbers), the problem of deciding whether two given sets of strings are equal, and two closely related decision problems. Intuitively, our results say that there is no algorithm for these problems that uses internal memory space bounded by $N^{1-\varepsilon}$ and at most o(log N) random accesses to external memory, but unlimited “streaming access”, both for writing to and reading from external memory. (Here N denotes the size of the input and $\varepsilon$ is an arbitrary constant greater than 0.) We even permit randomized algorithms with one-sided bounded error. We also consider the problem
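For contrast, sorting with purely sequential access and about log N passes is classical: balanced two-tape merge sort doubles the run length in every pass, so it needs ⌈log₂ N⌉ passes, each consisting of a constant number of sequential sweeps. The Python sketch below simulates tapes as lists of runs that are only read front to back; the function names are hypothetical, and the sketch is meant only to show where the log N comes from, not to reproduce the paper's model exactly.

```python
from typing import List, Tuple


def merge_runs(x: List[int], y: List[int]) -> List[int]:
    """Merge two sorted runs, reading each strictly front to back."""
    out: List[int] = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out


def tape_merge_sort(data: List[int]) -> Tuple[List[int], int]:
    """Hypothetical balanced two-tape merge sort; returns (sorted data, number of passes)."""
    tape = list(data)
    n = len(tape)
    run, passes = 1, 0
    while run < n:
        # Distribution: copy runs of length `run` alternately onto tapes a and b.
        tape_a: List[List[int]] = []
        tape_b: List[List[int]] = []
        for k, start in enumerate(range(0, n, run)):
            (tape_a if k % 2 == 0 else tape_b).append(tape[start:start + run])
        # Merge: sweep both tapes once, merging the i-th run of each.
        merged: List[int] = []
        for i, run_a in enumerate(tape_a):
            run_b = tape_b[i] if i < len(tape_b) else []
            merged.extend(merge_runs(run_a, run_b))
        tape = merged
        run *= 2  # run length doubles, so ceil(log2 n) passes suffice
        passes += 1
    return tape, passes
```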
Reversal complexity revisited
Theor. Comput. Sci., 2008
Cited by 4 (2 self)
Abstract. We study a generalized version of reversal-bounded Turing machines where, apart from several tapes on which the number of head reversals is bounded by r(n), there are several further tapes on which head reversals remain unrestricted but whose size is bounded by s(n) (where n denotes the input length). Recently [9,10], such machines were introduced as a formalization of a computation model that restricts random access to external memory and internal memory space. Here, each of the tapes with a restriction on the head reversals corresponds to an external memory device, and the tapes of restricted size model internal memory. We use ST(r(n), s(n), O(1)) to denote the class of all problems that can be solved by deterministic Turing machines that comply with the above resource bounds. Similarly, NST(…) and RST(…) denote the corresponding nondeterministic and randomized classes. While previous papers focused on lower bounds for particular problems, including sorting, the set equality problem, and several query evaluation problems, the present paper addresses the relations between the (R,N)ST(…) classes and classical complexity classes and investigates the structural complexity of the (R,N)ST(…) classes. Our main results are (1) a tradeoff between internal memory space and external memory head reversals, (2) correspondences between the (R,N)ST(…) classes and “classical” time-bounded, space-bounded, reversal-bounded, and circuit complexity classes, and (3) hierarchies of (R)ST(…) classes in terms of increasing numbers of head reversals on external memory tapes.
A Parallel External-Memory Frontier Breadth-First Traversal Algorithm for Clusters of Workstations
Cited by 2 (0 self)
Abstract. This paper presents a parallel external-memory algorithm for performing a breadth-first traversal of an implicit graph on a cluster of workstations. The algorithm is a parallel version of the sorting-based external-memory frontier breadth-first traversal algorithm with delayed duplicate detection. The algorithm distributes the workload according to intervals that are computed at runtime via a sampling-based process. We present an experimental evaluation of the algorithm in which we compare its performance to that of its sequential counterpart on the implicit graphs of two classic planning problems. The speedups attained by the algorithm over its sequential counterpart are consistently near linear and frequently superlinear. Analysis reveals that the algorithm is proficient at distributing the workload and that increasing the number of samples obtained by the sampling-based process improves workload distribution. Analysis also reveals that the algorithm benefits from the operating system's caching of external memory in internal memory.
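The two ingredients named in the abstract, delayed duplicate detection by sorting and interval-based workload distribution computed from a sample, can be sketched sequentially in Python. This is an in-memory simulation under stated assumptions, not the authors' implementation: the graph is assumed undirected, so a successor of layer d can only lie in layer d−1, d or d+1, and checking the previous and current layers removes all duplicates; each bucket stands in for one workstation's share; all names are hypothetical. A disk-based version would merge the sorted buckets against the sorted layers instead of using Python sets.

```python
import bisect
import random
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Node = Tuple[int, int]


def splitters_from_sample(items: Sequence[Node], workers: int, sample_size: int,
                          rng: random.Random) -> List[Node]:
    """Choose workers-1 interval boundaries from a random sample (hypothetical helper)."""
    if not items or workers <= 1:
        return []
    sample = sorted(rng.choice(items) for _ in range(sample_size))
    return [sample[(k * len(sample)) // workers] for k in range(1, workers)]


def frontier_bfs(start: Node, neighbors: Callable[[Node], Iterable[Node]],
                 workers: int = 4, sample_size: int = 64, seed: int = 0) -> List[int]:
    """Return BFS layer sizes of an undirected implicit graph, keeping only two layers."""
    rng = random.Random(seed)
    prev: List[Node] = []
    cur: List[Node] = [start]
    sizes: List[int] = []
    while cur:
        sizes.append(len(cur))
        successors = [w for v in cur for w in neighbors(v)]
        # Sampling-based intervals: equal nodes always fall into the same bucket.
        bounds = splitters_from_sample(successors, workers, sample_size, rng)
        buckets: List[List[Node]] = [[] for _ in range(len(bounds) + 1)]
        for w in successors:
            buckets[bisect.bisect_right(bounds, w)].append(w)
        old = set(prev) | set(cur)  # a disk-based version would merge sorted layers instead
        nxt: List[Node] = []
        for bucket in buckets:  # each bucket stands in for one workstation's share
            last = None
            for w in sorted(bucket):  # delayed duplicate detection by sorting
                if w != last and w not in old:
                    nxt.append(w)
                last = w
        prev, cur = cur, nxt
    return sizes


def grid_neighbors(width: int, height: int) -> Callable[[Node], Iterator[Node]]:
    """Implicit undirected grid graph used as a small example input."""
    def neighbors(v: Node) -> Iterator[Node]:
        x, y = v
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)
    return neighbors
```

On a 5 × 3 grid started from a corner, `frontier_bfs((0, 0), grid_neighbors(5, 3))` returns the layer sizes `[1, 2, 3, 3, 3, 2, 1]`.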