Results 1  10
of
24
Improved algorithms and data structures for solving graph problems in external memory
 In Proc. IEEE Symposium on Parallel and Distributed Processing
, 1996
"... ..."
Fast Concurrent Access to Parallel Disks
"... High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is ..."
Abstract

Cited by 62 (12 self)
 Add to MetaCart
High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of singledisk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks. We show that a shared buffer of O(D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general. If two randomly allocated copies of each block exist, N arbitrary blocks can be read within dN=De + 1 I/O steps with high probability. The redundancy can be further reduced from 2 to 1 + 1=r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's "singledisk multihead" model [1] that allows access to D arbitrary blocks in each I/O step. This powerful model can be emulated on the physically more realistic independent disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multihead model first. The emulation result can then be applied directly or further refinements can be added.
Externalmemory breadthfirst search with sublinear I/O
 IN PROCEEDINGS OF THE 10TH ANNUAL EUROPEAN SYMPOSIUM ON ALGORITHMS
, 2002
"... Breadthfirst search (BFS) is a basic graph exploration technique. We give the first external memory algorithm for sparse undirected graphs with sublinear I/O. The best previous algorithm requires \Theta (n + n+mD\Delta B \Delta logM=B n+mB) I/Os on a graph with n nodes and m edges and a machine w ..."
Abstract

Cited by 57 (14 self)
 Add to MetaCart
(Show Context)
Breadthfirst search (BFS) is a basic graph exploration technique. We give the first external memory algorithm for sparse undirected graphs with sublinear I/O. The best previous algorithm requires \Theta (n + n+mD\Delta B \Delta logM=B n+mB) I/Os on a graph with n nodes and m edges and a machine with mainmemory of size M, D parallel disks, and block size B. We present two versions of a new algorithm which requires only O i (p 1D\Delta B + p nm) \Delta n+mpD\Delta B \Delta logM=B n+mB
STXXL: Standard template library for XXL data sets
 In: Proc. of ESA 2005. Volume 3669 of LNCS
, 2005
"... for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/Oefficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in ..."
Abstract

Cited by 56 (5 self)
 Add to MetaCart
(Show Context)
for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/Oefficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, gaussian elimination, visualization, and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and realworld inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering 1.
Towards a theory of cacheefficient algorithms
 PROCEEDINGS OF THE SYMPOSIUM ON DISCRETE
, 2000
"... We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships betw ..."
Abstract

Cited by 56 (3 self)
 Add to MetaCart
We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cacheefficient algorithms in the singlelevel cache model for fundamental problems like sorting, FFT, and an important subclass of permutations. We also analyze the averagecase cache behavior of mergesort, show that ignoring associativity concerns could lead to inferior performance, and present supporting experimental evidence. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic
Efficient External Memory Algorithms by Simulating CoarseGrained Parallel Algorithms
, 2003
"... External memory (EM) algorithms are designed for largescale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to ..."
Abstract

Cited by 46 (11 self)
 Add to MetaCart
(Show Context)
External memory (EM) algorithms are designed for largescale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACMWorking Group on Storage I/O for LargeScale Computing.
A computational study of externalmemory BFS algorithms
 In SODA
, 2006
"... Breadth First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was considered nonviable so far, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent exte ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
(Show Context)
Breadth First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was considered nonviable so far, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent externalmemory BFS algorithms for general graphs. With our STXXL based implementations exploiting pipelining and diskparallelism, we were able to compute the BFS level decomposition of a webcrawl based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple externalmemory algorithms perform significantly better (minutes as compared to hours) than internalmemory BFS, even if more than half of the input resides internally. 1
CacheEfficient Matrix Transposition
"... We investigate the memory system performance of several algorithms for transposing an N N matrix inplace, where N is large. Specifically, we investigate the relative contributions of the data cache, the translation lookaside buffer, register tiling, and the array layout function to the overall runn ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
We investigate the memory system performance of several algorithms for transposing an N N matrix inplace, where N is large. Specifically, we investigate the relative contributions of the data cache, the translation lookaside buffer, register tiling, and the array layout function to the overall running time of the algorithms. We use various memory models to capture and analyze the effect of various facets of cache memory architecture that guide the choice of a particular algorithm, and attempt to experimentally validate the predictions of the model. Our major conclusions are as follows: limited associativity in the mapping from main memory addresses to cache sets can significantly degrade running time; the limited number of TLB entries can easily lead to thrashing; the fanciest optimal algorithms are not competitive on real machines even at fairly large problem sizes unless cache miss penalties are quite high; lowlevel performance tuning “hacks”, such as register tiling and array alignment, can significantly distort the effects of improved algorithms; and hierarchical nonlinear layouts are inherently superior to the standard canonical layouts (such as row or columnmajor) for
this problem.
Largescale directed model checking LTL
 In Model Checking Software (SPIN
, 2006
"... Abstract. To analyze larger models for explicitstate model checking, directed model checking applies errorguided search, external model checking uses secondary storage media, and distributed model checking exploits parallel exploration on multiple processors. In this paper we propose an external, ..."
Abstract

Cited by 25 (8 self)
 Add to MetaCart
(Show Context)
Abstract. To analyze larger models for explicitstate model checking, directed model checking applies errorguided search, external model checking uses secondary storage media, and distributed model checking exploits parallel exploration on multiple processors. In this paper we propose an external, distributed and directed onthefly model checking algorithm to check general LTL properties in the model checker SPIN. Previous attempts restricted to checking safety properties. The worstcase I/O complexity is bounded by O(sort(FR)/p + l · scan(FS)), where S and R are the sets of visited states and transitions in the synchronized product of the Büchi automata for the model and the property specification, F is the number of accepting states, l is the length of the shortest counterexample, and p is the number of processors. The algorithm we propose returns minimal lassoshaped counterexamples and includes refinements for propertydriven exploration. 1
External A*
 IN GERMAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (KI
, 2004
"... In this paper we study External A*, a variant of the conventional (internal) A* algorithm that makes use of external memory, e.g., a hard disk. The approach applies to implicit, undirected, unweighted state space problem graphs with consistent estimates. It combines all three aspects of bestfirs ..."
Abstract

Cited by 22 (9 self)
 Add to MetaCart
In this paper we study External A*, a variant of the conventional (internal) A* algorithm that makes use of external memory, e.g., a hard disk. The approach applies to implicit, undirected, unweighted state space problem graphs with consistent estimates. It combines all three aspects of bestfirst search, frontier search and delayed duplicate detection and can still operate on very small internal memory. The complexity of the external algorithm is almost linear in external sorting time and accumulates to O(sort(E) + scan(V I/O operations, where V and E are the set of nodes and edges in the explored portion of the state space graph. Given that delayed duplicate elimination has to be performed, the established bound is I/O optimal. In contrast to the internal algorithm, we exploit memory locality to allow blockwise rather than random access. The algorithmic design refers to external shortest path search in explicit graphs and extends the strategy of delayed duplicate detection recently suggested for breadthfirst search to bestfirst search. We conduct experiments with slidingtile puzzle instances.