Results 1-10 of 14
Improved Algorithms and Data Structures for Solving Graph Problems in External Memory
In Proc. IEEE Symp. on Parallel and Distributed Processing, 1996
Cited by 75 (0 self)
Recently, the study of I/O-efficient algorithms has moved beyond fundamental problems of sorting and permuting and into wider areas such as computational geometry and graph algorithms. With this expansion has come a need for new algorithmic techniques and data structures. In this paper, we present I/O-efficient analogues of well-known data structures that we show to be useful for obtaining simpler and improved algorithms for several graph problems. Our results include improved algorithms for minimum spanning trees, breadth-first search, and single-source shortest paths. The descriptions of these algorithms are greatly simplified by their use of well-defined I/O-efficient data structures with good amortized performance bounds. We expect that I/O-efficient data structures such as these will be a useful tool for the design of I/O-efficient algorithms.
Towards a theory of cache-efficient algorithms
PROCEEDINGS OF THE SYMPOSIUM ON DISCRETE, 2000
Cited by 47 (3 self)
We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter’s I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache model for fundamental problems like sorting, FFT, and an important subclass of permutations. We also analyze the average-case cache behavior of mergesort, show that ignoring associativity concerns could lead to inferior performance, and present supporting experimental evidence. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic
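The remark that ignoring associativity can hurt performance can be illustrated with a toy simulation. The sketch below is not the model from the paper: it is a minimal set-associative cache with LRU replacement, and the function name `count_misses` and its parameters are assumptions introduced here for illustration only.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_size):
    """Count misses of a set-associative LRU cache on an address trace.

    Hypothetical helper for illustration: the cache holds
    num_sets * ways lines of line_size addresses each.
    """
    # One LRU-ordered dictionary per cache set (oldest entry first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size          # memory line holding this address
        cache_set = sets[line % num_sets]  # set the line is mapped to
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return misses
```

With the same total capacity of 8 lines, a trace that alternates between two addresses mapping to the same set misses on every access when the cache is direct-mapped (`num_sets=8, ways=1`), but only on the first two accesses when it is fully associative (`num_sets=1, ways=8`).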
External-memory breadth-first search with sublinear I/O
IN PROCEEDINGS OF THE 10TH ANNUAL EUROPEAN SYMPOSIUM ON ALGORITHMS, 2002
Cited by 47 (13 self)
Breadth-first search (BFS) is a basic graph exploration technique. We give the first external memory algorithm for sparse undirected graphs with sublinear I/O. The best previous algorithm requires $\Theta\left(n + \frac{n+m}{D \cdot B} \cdot \log_{M/B} \frac{n+m}{B}\right)$ I/Os on a graph with n nodes and m edges and a machine with main memory of size M, D parallel disks, and block size B. We present two versions of a new algorithm which requires only $O\left(\left(\sqrt{\frac{1}{D \cdot B}} + \sqrt{\frac{n}{m}}\right) \cdot \frac{n+m}{\sqrt{D \cdot B}} \cdot \log_{M/B} \frac{n+m}{B}\right)$
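To get a feel for the gap between the two bounds quoted above, they can be evaluated numerically. This is a rough sketch under stated assumptions: the formulas are one reading of the extraction-damaged abstract, constant factors hidden by the asymptotic notation are ignored, and all function names are hypothetical.

```python
import math


def sort_ios(N, M, B, D):
    """I/O-model sorting bound: N/(D*B) * log_{M/B}(N/B), constants dropped."""
    return N / (D * B) * math.log(N / B, M / B)


def previous_bfs_ios(n, m, M, B, D):
    """Previous bound as read from the abstract: n + sort(n + m)."""
    return n + sort_ios(n + m, M, B, D)


def new_bfs_ios(n, m, M, B, D):
    """New bound as read from the abstract (an assumption, see lead-in):
    (sqrt(1/(D*B)) + sqrt(n/m)) * (n+m)/sqrt(D*B) * log_{M/B}((n+m)/B).
    """
    log_term = math.log((n + m) / B, M / B)
    return ((math.sqrt(1 / (D * B)) + math.sqrt(n / m))
            * (n + m) / math.sqrt(D * B) * log_term)
```

For a sparse graph with, say, `n = 10**8`, `m = 4 * 10**8`, `M = 10**8`, `B = 10**5` and `D = 1`, the previous bound is dominated by the additive `n` term (one I/O per node), while the new expression is smaller by roughly two orders of magnitude. The parameter values are illustrative and are not taken from the paper.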
Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms
2003
Cited by 41 (10 self)
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posted as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
STXXL: Standard template library for XXL data sets
In: Proc. of ESA 2005. Volume 3669 of LNCS, 2005
Cited by 38 (5 self)
for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization, analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL.
KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
Cache-Efficient Matrix Transposition
Cited by 23 (1 self)
We investigate the memory system performance of several algorithms for transposing an N × N matrix in-place, where N is large. Specifically, we investigate the relative contributions of the data cache, the translation lookaside buffer, register tiling, and the array layout function to the overall running time of the algorithms. We use various memory models to capture and analyze the effect of various facets of cache memory architecture that guide the choice of a particular algorithm, and attempt to experimentally validate the predictions of the model. Our major conclusions are as follows: limited associativity in the mapping from main memory addresses to cache sets can significantly degrade running time; the limited number of TLB entries can easily lead to thrashing; the fanciest optimal algorithms are not competitive on real machines even at fairly large problem sizes unless cache miss penalties are quite high; low-level performance tuning “hacks”, such as register tiling and array alignment, can significantly distort the effects of improved algorithms; and hierarchical nonlinear layouts are inherently superior to the standard canonical layouts (such as row- or column-major) for this problem.
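As a point of reference for the kind of algorithm being compared, here is a minimal sketch of a tiled in-place transpose of a row-major N × N matrix. It is not one of the paper's implementations: the function name, the flat-list storage, and the default tile size are assumptions made for illustration, and Python is used only to show the access pattern, not to measure cache behavior.

```python
def transpose_in_place_tiled(a, n, tile=32):
    """Transpose the n x n row-major matrix stored in the flat list `a`.

    The matrix is processed tile by tile so that each pair of mirrored
    tiles is touched together, which is the idea behind cache-aware
    transposition. Hypothetical sketch, not a tuned implementation.
    """
    for bi in range(0, n, tile):
        # Only tiles on or above the diagonal; each swap also fixes the
        # mirrored element below the diagonal.
        for bj in range(bi, n, tile):
            for i in range(bi, min(bi + tile, n)):
                # Inside a diagonal tile, visit the strict upper triangle
                # only, so that no pair is swapped twice.
                j_start = i + 1 if bi == bj else bj
                for j in range(j_start, min(bj + tile, n)):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
```

Calling `transpose_in_place_tiled(a, n, tile)` on a list of length `n * n` rearranges it so that the entry formerly at row `i`, column `j` ends up at row `j`, column `i`. On real hardware the tile size would be chosen relative to the cache line and cache size, which is exactly where the associativity and TLB effects discussed above come into play.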
Large-scale directed model checking LTL
In Model Checking Software (SPIN, 2006
Cited by 22 (8 self)
To analyze larger models for explicit-state model checking, directed model checking applies error-guided search, external model checking uses secondary storage media, and distributed model checking exploits parallel exploration on multiple processors. In this paper we propose an external, distributed and directed on-the-fly model checking algorithm to check general LTL properties in the model checker SPIN. Previous attempts were restricted to checking safety properties. The worst-case I/O complexity is bounded by O(sort(FR)/p + l · scan(FS)), where S and R are the sets of visited states and transitions in the synchronized product of the Büchi automata for the model and the property specification, F is the number of accepting states, l is the length of the shortest counterexample, and p is the number of processors. The algorithm we propose returns minimal lasso-shaped counterexamples and includes refinements for property-driven exploration.
External A*
IN GERMAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (KI, 2004
Cited by 19 (8 self)
In this paper we study External A*, a variant of the conventional (internal) A* algorithm that makes use of external memory, e.g., a hard disk. The approach applies to implicit, undirected, unweighted state space problem graphs with consistent estimates. It combines all three aspects of best-first search, frontier search and delayed duplicate detection and can still operate on very small internal memory. The complexity of the external algorithm is almost linear in external sorting time and accumulates to O(sort(|E|) + scan(|V|)) I/O operations, where V and E are the set of nodes and edges in the explored portion of the state space graph. Given that delayed duplicate elimination has to be performed, the established bound is I/O optimal. In contrast to the internal algorithm, we exploit memory locality to allow blockwise rather than random access. The algorithmic design refers to external shortest path search in explicit graphs and extends the strategy of delayed duplicate detection recently suggested for breadth-first search to best-first search. We conduct experiments with sliding-tile puzzle instances.
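The delayed duplicate detection strategy for breadth-first search mentioned above can be sketched in a few lines. This is an in-memory illustration only, not External A* and not the authors' code: Python lists stand in for files on disk, the function name is hypothetical, and the graph is assumed to be undirected so that subtracting the two preceding layers is enough.

```python
def bfs_levels_delayed_dedup(adj, source):
    """Layered BFS on an undirected graph with delayed duplicate detection.

    adj maps each node to a list of its neighbors. Successors are first
    generated without any duplicate check, then duplicates are removed in
    bulk by sorting, and finally the two preceding layers are subtracted.
    """
    prev, cur = [], [source]
    levels = []
    while cur:
        levels.append(cur)
        # Generate all successors of the current layer, duplicates included
        # (on disk this would be a sequential write).
        candidates = [v for u in cur for v in adj[u]]
        # Delayed duplicate detection: sort once, then drop repeats in a
        # single pass over the sorted sequence.
        candidates.sort()
        unique = [v for i, v in enumerate(candidates)
                  if i == 0 or v != candidates[i - 1]]
        # In an undirected graph a neighbor of layer t lies in layer
        # t-1, t or t+1, so removing the previous and current layers
        # leaves exactly the next layer. The set is a stand-in for a
        # merge-style scan over sorted files.
        seen = set(prev) | set(cur)
        nxt = [v for v in unique if v not in seen]
        prev, cur = cur, nxt
    return levels
```

The design choice to illustrate is that no per-node "visited" lookup is made while expanding a layer; all duplicate removal is deferred to one sort and one scan per layer, which is what makes the access pattern blockwise rather than random.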
A computational study of external-memory BFS algorithms
In SODA, 2006
Cited by 18 (4 self)
Breadth First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was considered nonviable so far, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent external-memory BFS algorithms for general graphs. With our STXXL-based implementations exploiting pipelining and disk parallelism, we were able to compute the BFS level decomposition of a web-crawl-based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using a single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple external-memory algorithms perform significantly better (minutes as compared to hours) than internal-memory BFS, even if more than half of the input resides internally.