Results 1 
7 of
7
STXXL: Standard template library for XXL data sets
 In: Proc. of ESA 2005. Volume 3669 of LNCS
, 2005
"... for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/Oefficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in ..."
Abstract

Cited by 38 (5 self)
 Add to MetaCart
for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/Oefficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, gaussian elimination, visualization, and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and realworld inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering 1.
The filterkruskal minimum spanning tree algorithm
, 2009
"... We present FilterKruskal – a simple modification of Kruskal’s algorithm that avoids sorting edges that are “obviously” not in the MST. For arbitrary graphs with random edge weights FilterKruskal runs in time O (m + n lognlog m n, i.e. in linear time for not too sparse graphs. Experiments indicate ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
We present FilterKruskal – a simple modification of Kruskal’s algorithm that avoids sorting edges that are “obviously” not in the MST. For arbitrary graphs with random edge weights FilterKruskal runs in time O (m + n lognlog m n, i.e. in linear time for not too sparse graphs. Experiments indicate that the algorithm has very good practical performance over the entire range of edge densities. An equally simple parallelization seems to be the currently best practical algorithm on multicore machines.
Design and Implementation of a Practical I/Oefficient Shortest Paths Algorithm
"... We report on initial experimental results for a practical I/Oefficient SingleSource ShortestPaths (SSSP) algorithm on general undirected sparse graphs where the ratio between the largest and the smallest edge weight is reasonably bounded (for example integer weights in {1,...,2 32}) and the reali ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
We report on initial experimental results for a practical I/Oefficient SingleSource ShortestPaths (SSSP) algorithm on general undirected sparse graphs where the ratio between the largest and the smallest edge weight is reasonably bounded (for example integer weights in {1,...,2 32}) and the realistic assumption holds that main memory is big enough to keep one bit per vertex. While our implementation only guarantees averagecase efficiency, i.e., assuming randomly chosen edgeweights, it turns out that its performance on realworld instances with nonrandom edge weights is actually even better than on the respective inputs with random weights. Furthermore, compared to the currently best implementation for externalmemory BFS [6], which in a sense constitutes a lower bound for SSSP, the running time of our approach always stayed within a factor of five, for the most difficult graph classes the difference was even less than a factor of two. We are not aware of any previous I/Oefficient implementation for the classic general SSSP in a (semi) external setting: in two recent projects [10, 23], Kumar/Schwabelike SSSP approaches on graphs of at most 6 million vertices have been tested, forcing the authors to artificially restrict the main memory size, M, to rather unrealistic 4 to 16 MBytes in order not to leave the semiexternal setting or produce huge running times for larger graphs: for random graphs of 2 20 vertices, the best previous approach needed over six hours. In contrast, for a similar ratio of input size vs. M, but on a 128 times larger and even sparser random graph, our approach was less than seven times slower, a relative gain of nearly 20. On a realworld 24 million node street graph, our implementation was over 40 times faster. Even larger gains of over 500 can be estimated for ran
I/OEfficient Batched UnionFind and Its . . .
"... Despite extensive study over the last four decades and numerous applications, no I/Oefficient algorithm is known for the unionfind problem. In this paper we present an I/Oefficient algorithm for the batched (offline) version of the unionfind problem. Given any sequence of N mixed union andfin ..."
Abstract
 Add to MetaCart
Despite extensive study over the last four decades and numerous applications, no I/Oefficient algorithm is known for the unionfind problem. In this paper we present an I/Oefficient algorithm for the batched (offline) version of the unionfind problem. Given any sequence of N mixed union andfind operations, where each union operation joins two distinct sets, our algorithm uses O(SORT(N)) = O ( NB logM/B NB) I/Os, where M is the memory size and B is the disk block size. This bound isasymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(SORT(N) + MST(N)) I/Os, where MST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. We also describe a simple and practical O(SORT(N) log ( NM))I/O algorithm, which we have implemented.The main motivation for our study of the unionfind problem arises from problems in terrain analysis. A terrain can be abstracted as a height function defined over R2, and many problems that deal with suchfunctions require a unionfind data structure. With the emergence of modern mapping technologies, huge amount of data is being generated that is too large to fit in memory, thus I/Oefficient algorithmsare needed to process this data efficiently. In this paper, we study two terrain analysis problems that benefit from a unionfind data structure: (i) computing topological persistence and (ii) constructing thecontour tree. We give the first O(SORT(N))I/O algorithms for these two problems, assuming that theinput terrain is represented as a triangular mesh with N vertices.Finally, we report some preliminary experimental results, showing that our algorithms give orderofmagnitude improvement over previous methods on large data sets that do not fit in memory.
Intersection in Integer Inverted Indices
"... Inverted index data structures are the key to fast search engines. The predominant operation on inverted indices asks for intersecting two sorted lists of document IDs which might have vastly varying lengths. We compare previous theoretical approaches, methods used in practice, and one new algorithm ..."
Abstract
 Add to MetaCart
Inverted index data structures are the key to fast search engines. The predominant operation on inverted indices asks for intersecting two sorted lists of document IDs which might have vastly varying lengths. We compare previous theoretical approaches, methods used in practice, and one new algorithm which exploits that the intersection uses small integer keys. We also take different data compression techniques into account. The new algorithm is very fast, simple, has good space efficiency, and is the only algorithm that performs well over the entire spectrum of relative list length ratios. 1
PROTOTYPE TO DESIGN A LEASED LINE TELEPHONE NETWORK CONNECTING LOCATIONS TO MINIMIZE THE INSTALLATION COST Abstract
"... Network flows have many reallife applications and leasedline installation for telephone network is one of them. In the leased line a major concern is to provide connection of telephone to all the locations. It is required to install the leased line network that reaches all the locations at the min ..."
Abstract
 Add to MetaCart
Network flows have many reallife applications and leasedline installation for telephone network is one of them. In the leased line a major concern is to provide connection of telephone to all the locations. It is required to install the leased line network that reaches all the locations at the minimum cost. This chapter deals with this situation with the help of a network diagram in which each node represents the locations and the edge between each node represents leased line. Each edge has a number attached to it which represents the cost associated with installing that link. Aim of the paper is to determine the leased line network connecting all the locations at minimum cost of installation.