External Memory Data Structures
, 2001
Cited by 78 (34 self)
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
STXXL: Standard template library for XXL data sets
- In: Proc. of ESA 2005. Volume 3669 of LNCS
, 2005
Cited by 30 (4 self)
for processing huge data sets that can fit only on hard disks. It supports parallel disks and overlapping between disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique, which can save more than half of the I/Os. STXXL has been applied in both academic and industrial environments to a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
I/O-Efficient Algorithms for Problems on Grid-based Terrains (Extended Abstract)
- In Proc. Workshop on Algorithm Engineering and Experimentation
, 2000
Cited by 28 (13 self)
Lars Arge, Laura Toma, Jeffrey Scott Vitter (Center for Geometric Computing, Department of Computer Science, Duke University, Durham, NC 27708-0129)
The potential and use of Geographic Information Systems (GIS) is rapidly increasing due to the increasing availability of massive amounts of geospatial data from projects like NASA's Mission to Planet Earth. However, the use of these massive datasets also exposes scalability problems with existing GIS algorithms. These scalability problems are mainly due to the fact that most GIS algorithms have been designed to minimize internal computation time, while I/O communication often is the bottleneck when processing massive amounts of data.
Fuzzycast: Media Broadcasting for Multiple Asynchronous Receivers
, 2001
Cited by 1 (0 self)
When using an on-demand media streaming system on top of a network with multicast support, it is sometimes more efficient to use broadcast to distribute popular content, especially when client demand is high. There has been a lot of research in broadcasting on-demand content to multiple, asynchronous receivers. In this paper, we propose a family of novel, practical techniques for broadcasting on-demand media, which achieve the lowest known server/network bandwidth usage and I/O-efficient client buffer management, while retaining the simplicity of a frame-based single-channel scheme. We also propose playout scheduling strategies that make it practicable to serve both constant bitrate (CBR) and variable bitrate (VBR) media.
Locally Compressed Suffix Arrays
Cited by 1 (1 self)
Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text passages, support indexed text searches. At this point those indexes are competitive with traditional text indexes (which are very large) for counting the number of occurrences of a pattern in the text. Yet, they are still hundreds to thousands of times slower when it comes to locating those occurrences in the text. In this paper we introduce a new, local, compression scheme for suffix arrays which permits locating the occurrences extremely fast, while still being much smaller than classical indexes. The core of our contribution is the identification of the regularities exploited by the compression based on function Ψ, used for a long time in compressed text indexing, with those exploited by Re-Pair on the differential suffix array. The latter enjoys the locality properties that the former methods lack. As another consequence of this locality, we show that our index can be implemented in secondary memory, where its access time improves thanks to compression, instead of worsening as is the norm in other self-indexes. Finally, some byproducts of our work, such as a compressed dictionary representation for Re-Pair, can be of independent interest. Categories and Subject Descriptors: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems—Pattern matching, Computations on discrete structures,
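To make the term "differential suffix array" in the abstract above more concrete, the following is a minimal sketch that assumes it means the array of differences between consecutive suffix array entries. The function names are hypothetical, the suffix array is built by naive sorting (fine only for tiny inputs), and nothing here reproduces the paper's index or its Re-Pair dictionary; it only shows that the differences contain repeated adjacent pairs, the kind of regularity a pair-replacement compressor could exploit.

```python
from collections import Counter


def suffix_array(text):
    """Naive suffix array: start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def differential(sa):
    """Keep the first entry, then store each entry minus its predecessor."""
    return sa[:1] + [sa[i] - sa[i - 1] for i in range(1, len(sa))]


def restore(diff):
    """Invert differential() with a running sum."""
    out, total = [], 0
    for d in diff:
        total += d
        out.append(total)
    return out


def repeated_pairs(diff):
    """Adjacent value pairs that occur more than once in the differential array."""
    counts = Counter(zip(diff, diff[1:]))
    return {pair: n for pair, n in counts.items() if n > 1}


if __name__ == "__main__":
    sa = suffix_array("abracadabra")
    diff = differential(sa)
    print(sa)
    print(diff)
    print(repeated_pairs(diff))
```

The round trip through `restore` is what keeps the representation lossless: any entry can be recovered from a sampled absolute value plus a short run of differences, which is the locality the abstract refers to.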
On sorting, heaps, and minimum spanning trees
- Algorithmica
Cited by 1 (1 self)
Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m + k log k) time. We present Incremental Quicksort (IQS), an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal expected time for any k. Based on IQS, we present the Quickheap (QH), a simple and efficient priority queue for main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, yet are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the expected amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B)log(m/M)) I/O cost with main memory size M and block size B, in a cache-oblivious fashion. As a direct application, we use our techniques to implement classical Minimum Spanning Tree (MST) algorithms. We use IQS to implement Kruskal’s MST algorithm and QHs to implement Prim’s. Experimental results show that IQS, QHs, external QHs, and our Kruskal’s and Prim’s MST variants are competitive, and in many cases better in practice than current state-of-the-art alternative (and much more sophisticated) implementations.
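The incremental sorting idea in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a stack of pivot positions and random pivot selection, and the names `partition` and `incremental_quicksort` are hypothetical. Each request for the next smallest element partitions only the leftmost unresolved segment, so stopping after k elements avoids sorting the rest.

```python
import random


def partition(a, lo, hi, pivot_index):
    """Partition a[lo..hi] around a[pivot_index]; return the pivot's final position."""
    a[pivot_index], a[hi] = a[hi], a[pivot_index]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def incremental_quicksort(a):
    """Yield the elements of list a in ascending order, reordering a in place."""
    # The stack holds positions of pivots already in their final place.
    # len(a) acts as a sentinel pivot just past the end of the list.
    stack = [len(a)]
    for idx in range(len(a)):
        # Partition the segment [idx, top - 1] until idx itself is a pivot.
        while stack[-1] != idx:
            hi = stack[-1] - 1
            stack.append(partition(a, idx, hi, random.randint(idx, hi)))
        stack.pop()
        yield a[idx]


if __name__ == "__main__":
    from itertools import islice

    data = [5, 3, 8, 1, 9, 2, 7]
    print(list(islice(incremental_quicksort(data), 3)))
```

Because the function is a generator, the caller decides how many elements to pull, which matches the "online on k" wording in the abstract. A Kruskal-style use would pull edges in weight order and stop once the spanning tree is complete.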
Fuzzycast: Design and Implementation of a Scalable Media-on-Demand System
- PCF: I, II and III, Information and Computation 163
, 2002
Ramaprabhu Janakiraman. Advisor: Professor M. Waldvogel. December 2002, Saint Louis, Missouri.
Server bandwidth has been identified as a major bottleneck in large Video-on-Demand (VoD) systems. Using multicast delivery to serve popular content helps increase scalability by making efficient use of server bandwidth. In addition, recent research has focused on proactive schemes in which the server periodically multicasts popular content without explicit requests from clients. Proactive schemes are attractive because they consume bounded server bandwidth irrespective of client arrival rate.
Efficient Buffer Management for Scalable Multimedia-on-Demand
, 2003
Widespread availability of high-speed networks and fast, cheap computation have rendered high-quality Media-on-Demand (MoD) feasible. Research on scalable MoD has resulted in many efficient schemes that involve segmentation and asynchronous broadcast of media data, requiring clients to buffer and reorder out-of-order segments efficiently for serial playout. In such schemes, buffer space requirements run to several hundred megabytes and hence require efficient buffer management techniques involving both primary memory and secondary storage: while disk sizes have increased exponentially, access speeds have not kept pace at all. The conversion of out-of-order arrival to in-order playout suggests the use of external memory priority queues, but their content-agnostic nature prevents them from performing well under MoD loads. In this paper, we propose and evaluate a series of simple heuristic schemes which, in simulation studies and in combination with our scalable MoD scheme, achieve significant improvements in storage performance over existing schemes.
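The priority-queue view of reordering mentioned in the abstract above can be illustrated with a small in-memory sketch. It is only the baseline idea (a min-heap keyed by segment number), not the paper's heuristic schemes and not an external memory structure; the class name `ReorderBuffer` and its methods are hypothetical, and segments are assumed to be numbered consecutively from a known start.

```python
import heapq


class ReorderBuffer:
    """Buffers out-of-order segments and releases them in playout order."""

    def __init__(self, first_segment=0):
        self._heap = []             # min-heap of (segment number, payload)
        self._next = first_segment  # next segment number due for playout

    def push(self, seq, payload):
        """Store a segment that arrived in arbitrary order."""
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self):
        """Return payloads that can be played now, in order."""
        ready = []
        while self._heap and self._heap[0][0] <= self._next:
            seq, payload = heapq.heappop(self._heap)
            if seq == self._next:
                ready.append(payload)
                self._next += 1
            # seq < self._next means a late duplicate; it is dropped.
        return ready


if __name__ == "__main__":
    buf = ReorderBuffer()
    for seq, payload in [(2, "c"), (0, "a"), (1, "b")]:
        buf.push(seq, payload)
        print(seq, buf.pop_ready())
```

A heap like this treats every segment the same way and knows nothing about the broadcast schedule, which is the "content-agnostic" limitation the abstract points to.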
Quickheaps: Simple, Efficient, and Cache-Oblivious
We present the Quickheap, a simple and efficient data structure for implementing priority queues in main and secondary memory. Quickheaps are comparable with classical binary heaps in simplicity, but are more cache-friendly. This makes them an excellent alternative for a secondary memory implementation. We show that the average amortized CPU cost per operation over a Quickheap of m elements is O(log m), and this translates into O((1/B) log(m/M)) I/O cost with block size B, in a cache-oblivious fashion. Our experimental results show that Quickheaps are very competitive with the best alternative external memory heaps.

