Results 1-10 of 15
External Memory Data Structures
2001
Cited by 81 (36 self)
Abstract:
In many massive dataset applications the data must be stored in space- and query-efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
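The worst-case efficiency such structures target comes down to I/O bounds like the B-tree's O(log_B N) search cost. A back-of-the-envelope illustration (block and key sizes below are our assumptions, not figures from the chapter):

```python
import math

# Back-of-the-envelope I/O bound for a B-tree search: O(log_B N) I/Os.
# Block and key sizes are illustrative assumptions only.
BLOCK_BYTES = 4096
KEY_BYTES = 16
B = BLOCK_BYTES // KEY_BYTES   # keys per block, i.e. the fan-out: 256
N = 10**9                      # one billion stored items

height = math.ceil(math.log(N, B))  # levels (blocks) touched per search
```

With these numbers a search over a billion items touches only about four blocks, which is why external structures aim for logarithms base B rather than base 2.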
STXXL: Standard template library for XXL data sets
In: Proc. of ESA 2005, Volume 3669 of LNCS, 2005
Cited by 41 (5 self)
Abstract:
STXXL is a library for processing huge data sets that can fit only on hard disks. It supports parallel disks and the overlapping of disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, show how its performance features are supported, and demonstrate how the library integrates with the STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
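The pipelining idea credited above with saving over half the I/Os can be sketched language-agnostically: stages stream records to one another instead of materializing each intermediate result on disk. A minimal Python analogue (not STXXL's actual C++ interface; stage names are illustrative):

```python
# Pipelining sketch (Python stand-in, not STXXL's C++ interface): each
# stage is a generator, so records flow from producer to consumer
# without writing an intermediate result to disk between stages.

def scan(records):
    """Producer stage: streams records one at a time."""
    for r in records:
        yield r

def transform(stream):
    """Middle stage: per-record computation, overlapped with the scan."""
    for r in stream:
        yield r * 2

def reduce_sum(stream):
    """Consumer stage: folds the stream without materializing it."""
    return sum(stream)

data = [3, 1, 4, 1, 5]
result = reduce_sum(transform(scan(data)))  # one logical pass over data
```

In the external-memory setting each avoided intermediate file is a full write pass plus a full read pass, which is where the factor-of-two saving comes from.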
I/O-Efficient Algorithms for Problems on Grid-based Terrains (Extended Abstract)
In: Proc. Workshop on Algorithm Engineering and Experimentation, 2000
Cited by 32 (14 self)
Abstract:
Lars Arge, Laura Toma, Jeffrey Scott Vitter. Center for Geometric Computing, Department of Computer Science, Duke University, Durham, NC 27708-0129.
The potential and use of Geographic Information Systems (GIS) is rapidly increasing due to the growing availability of massive amounts of geospatial data from projects like NASA's Mission to Planet Earth. However, the use of these massive datasets also exposes scalability problems with existing GIS algorithms. These scalability problems are mainly due to the fact that most GIS algorithms have been designed to minimize internal computation time, while I/O communication is often the bottleneck when processing massive amounts of data.
On sorting, heaps, and minimum spanning trees
 Algorithmica
Cited by 1 (1 self)
Abstract:
Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m + k log k) time. We present Incremental Quicksort (IQS), an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal expected time for any k. Based on IQS, we present the Quickheap (QH), a simple and efficient priority queue for main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, yet are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the expected amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B) log(m/M)) I/O cost with main memory size M and block size B, in a cache-oblivious fashion. As a direct application, we use our techniques to implement classical Minimum Spanning Tree (MST) algorithms. We use IQS to implement Kruskal’s MST algorithm and QHs to implement Prim’s. Experimental results show that IQS, QHs, external QHs, and our Kruskal’s and Prim’s MST variants are competitive, and in many cases better in practice than current state-of-the-art (and much more sophisticated) alternative implementations.
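The IQS idea described above can be sketched directly: a stack of pivot positions persists between extractions, so the partitioning work done for earlier elements is reused for later ones, giving the first k elements in O(m + k log k) expected time overall. A minimal Python sketch (variable names are ours, not the paper's):

```python
import random

def iqs(A, k, stack):
    """Incremental Quicksort sketch: return the k-th smallest element
    of A (0-based). `stack` persists between calls and holds positions
    of pivots already in their final place, so successive extractions
    reuse earlier partitioning work."""
    while stack[-1] != k:
        # Partition A[k .. stack[-1]-1] around a random pivot (Lomuto).
        lo, hi = k, stack[-1] - 1
        p = random.randint(lo, hi)
        A[p], A[hi] = A[hi], A[p]
        pivot, i = A[hi], lo
        for j in range(lo, hi):
            if A[j] < pivot:
                A[i], A[j] = A[j], A[i]
                i += 1
        A[i], A[hi] = A[hi], A[i]   # pivot now final at position i
        stack.append(i)
    stack.pop()
    return A[k]

data = [7, 2, 9, 4, 1, 6]
S = [len(data)]                     # sentinel: one past the last index
first3 = [iqs(data, i, S) for i in range(3)]
```

Each call partitions only the leftmost unresolved segment, stopping as soon as position k holds a pivot; the remaining pivot positions on the stack bound the work of future calls.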
Locally Compressed Suffix Arrays
Cited by 1 (1 self)
Abstract:
Compressed text (self-)indexes have matured to the point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text passages, supports indexed text searches. At this point those indexes are competitive with traditional text indexes (which are very large) for counting the number of occurrences of a pattern in the text. Yet, they are still hundreds to thousands of times slower when it comes to locating those occurrences in the text. In this paper we introduce a new, local, compression scheme for suffix arrays which permits locating the occurrences extremely fast, while still being much smaller than classical indexes. The core of our contribution is the identification of the regularities exploited by the compression based on the function Ψ, used for a long time in compressed text indexing, with those exploited by Re-Pair on the differential suffix array. The latter enjoys the locality properties that the former methods lack. As another consequence of this locality, we show that our index can be implemented in secondary memory, where its access time improves thanks to compression, instead of worsening as is the norm in other self-indexes. Finally, some byproducts of our work, such as a compressed dictionary representation for Re-Pair, can be of independent interest.
Categories and Subject Descriptors: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems: pattern matching, computations on discrete structures.
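The regularity the paper exploits is easy to see on a toy input: a repetitive text produces repeated runs in the differential suffix array, which is exactly what a grammar compressor like Re-Pair can factor out. A small illustration (naive suffix array construction, for exposition only):

```python
def suffix_array(text):
    """Naive O(n^2 log n) construction; fine for a toy example."""
    return sorted(range(len(text)), key=lambda i: text[i:])

t = "abracadabra$"
sa = suffix_array(t)
# Differential suffix array: first entry plus consecutive differences.
diff = [sa[0]] + [sa[i] - sa[i - 1] for i in range(1, len(sa))]
# The run [3, 2, 3, -7] appears twice, mirroring the text's repetition.
```

Repeated substrings of the text induce nearly identical neighborhoods in the suffix array, and taking differences turns those into literal repeats that a dictionary or grammar compressor can capture locally.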
Fuzzycast: Media Broadcasting for Multiple Asynchronous Receivers
2001
Cited by 1 (0 self)
Abstract:
When using an on-demand media streaming system on top of a network with multicast support, it is sometimes more efficient to use broadcast to distribute popular content, especially when client demand is high. There has been a lot of research on broadcasting on-demand content to multiple, asynchronous receivers. In this paper, we propose a family of novel, practical techniques for broadcasting on-demand media which achieve the lowest known server/network bandwidth usage and I/O-efficient client buffer management, while retaining the simplicity of a frame-based single-channel scheme. We also propose playout scheduling strategies that make it practical to serve both constant bit-rate (CBR) and variable bit-rate (VBR) media.
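A common form of frame-based periodic broadcast (an assumption for illustration; the paper's exact schedule may differ) transmits frame f once every f slots, so a client arriving at any slot receives each frame in time for sequential playout while server bandwidth grows only harmonically in the number of frames:

```python
# Toy frame-based periodic broadcast schedule (illustrative assumption,
# not necessarily Fuzzycast's exact scheme): frame f is transmitted in
# every slot t with t % f == 0, so a client arriving at slot a has
# frame f by slot a + f at the latest, in time for playout.

def slots_transmitting(frame, horizon):
    """Slots in [1, horizon) during which `frame` is on the channel."""
    return [t for t in range(1, horizon) if t % frame == 0]

n_frames = 8
# Server bandwidth (frames per slot) is the harmonic number H_n,
# which grows only logarithmically with the number of frames.
bandwidth = sum(1.0 / f for f in range(1, n_frames + 1))
```

The appeal of such schedules is that bandwidth is independent of the number of clients: every receiver, whenever it tunes in, decodes the same periodic stream.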
Design and Analysis of Sequential and Parallel . . .
2002
Abstract:
We study the performance of algorithms for the Single-Source Shortest-Paths (SSSP) problem on graphs with n nodes and m edges with nonnegative random weights. All previously known SSSP algorithms for directed graphs required superlinear time. We give the first SSSP algorithms that provably achieve linear O(n + m) average-case execution time on arbitrary directed graphs with random edge weights. For independent edge weights, the linear-time bound holds with high probability, too. Additionally, our result implies improved average-case bounds for the All-Pairs Shortest-Paths (APSP) problem on sparse graphs, and it yields the first theoretical average-case analysis for the “Approximate Bucket Implementation” of Dijkstra’s SSSP algorithm (ABI-Dijkstra). Furthermore, we give constructive proofs for the existence of graph classes with random edge weights on which ABI-Dijkstra and several other well-known SSSP algorithms require superlinear average-case time. Besides the classical sequential (single-processor) model of computation we also consider parallel computing: we give the currently fastest average-case linear-work parallel SSSP algorithms for large graph classes with random edge weights, e.g., sparse random graphs and graphs modeling the WWW, telephone calls, or social networks.
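ABI-Dijkstra, analyzed above, replaces the exact priority queue with buckets of fixed width, at the cost of occasionally rescanning nodes. A simplified Python sketch of the bucket structure (the width handling and tie-breaking here are our simplification, not the paper's exact algorithm):

```python
import math

def abi_dijkstra(adj, src, delta=1.0):
    """Sketch of a bucket-based ("approximate bucket") Dijkstra variant.

    Tentative distances live in buckets of width `delta`; a node whose
    distance later drops is moved between buckets and may be scanned
    more than once, the price paid for cheap bucket operations.
    """
    dist = [math.inf] * len(adj)
    dist[src] = 0.0
    buckets = {0: {src}}            # bucket index -> set of nodes
    b = 0
    while buckets:
        # Advance to the next non-empty bucket.
        while b not in buckets or not buckets[b]:
            buckets.pop(b, None)
            if not buckets:
                return dist
            b += 1
        u = buckets[b].pop()
        for v, w in adj[u]:         # relax outgoing edges of u
            nd = dist[u] + w
            if nd < dist[v]:
                if dist[v] < math.inf:
                    buckets.get(int(dist[v] / delta), set()).discard(v)
                dist[v] = nd
                buckets.setdefault(int(nd / delta), set()).add(v)
    return dist
```

Since edge weights are nonnegative, a relaxation never targets a bucket below the current one, so the sweep over bucket indices only moves forward.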
Combining the Sweep-Line Method with the Use of an External-Memory Priority Queue
Abstract:
The sweep-line method is an explicit-state model checking technique that uses a notion of progress to delete states from internal memory during state space exploration and thereby reduce peak memory usage. The sweep-line algorithm relies on a priority queue in which the progress value assigned to a state determines the priority of the state. In earlier implementations of the sweep-line method the progress priority queue is kept in internal memory together with the current layer of states being explored. In this paper we investigate a scheme where the current layer is stored in internal memory while the priority queue is stored in external memory. From the perspective of the sweep-line method, we show that this combination can yield a significant reduction in peak memory usage compared to a pure internal-memory implementation. Averaged over 60 example instances, this combination reduced peak memory usage by more than 25% at the cost of increasing execution time by a factor of 2.5. From the perspective of external-memory state space exploration, we demonstrate experimentally that the state deletion performed by the sweep-line method may reduce the I/O overhead induced by duplicate detection compared to a pure external-memory state space exploration method.
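The interplay of a progress-ordered priority queue and a deletable current layer can be sketched compactly; here Python's heapq stands in for the external-memory priority queue, and progress is assumed monotone (never decreasing along transitions) so swept layers are never revisited:

```python
import heapq

def sweep_line(initial, successors, progress):
    """Sweep-line exploration sketch: states are expanded in order of a
    monotone progress value, so the previous layer can be deleted from
    internal memory when the sweep moves on. heapq stands in for the
    external-memory priority queue."""
    pq = [(progress(s), s) for s in initial]
    heapq.heapify(pq)
    current_layer = set()   # in-memory: states sharing the current value
    current_value = None
    explored = 0
    while pq:
        p, s = heapq.heappop(pq)
        if p != current_value:
            current_layer.clear()       # sweep: drop the finished layer
            current_value = p
        if s in current_layer:
            continue                    # duplicate within this layer
        current_layer.add(s)
        explored += 1
        for t in successors(s):
            heapq.heappush(pq, (progress(t), t))
    return explored
```

Only the current layer needs fast in-memory duplicate detection; everything ahead of the sweep can sit in the external queue, which is the division of labor the paper investigates.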