Results 1-10 of 32
External Memory Algorithms and Data Structures
, 1998
Cited by 360 (23 self)
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for solving batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
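To make the I/O measure concrete, here is a minimal Python sketch of external multiway merge sort, the textbook batched EM algorithm. The simplifications are ours, not the survey's: a single disk (D = 1), one CPU, an internal memory holding M items, and disk blocks of B items. The function name `external_merge_sort` is hypothetical; the sketch sorts ordinary Python lists and only *counts* the block transfers a real implementation would perform.

```python
import heapq
import math


def external_merge_sort(data, M, B):
    """Sort `data` while charging block I/Os as in a single-disk (D = 1) model.

    M: items that fit in internal memory; B: items per disk block.
    Returns (sorted_items, io_count), where io_count counts block transfers.
    """
    if M // B < 3:
        raise ValueError("need M >= 3B so that at least two runs can be merged")

    def scan(k):
        return math.ceil(k / B)  # blocks needed to move k contiguous items

    io = 0
    # Run formation: read M items, sort them in memory, write one sorted run.
    runs = []
    for start in range(0, len(data), M):
        run = sorted(data[start:start + M])
        io += 2 * scan(len(run))
        runs.append(run)

    # Merge passes: one input buffer block per run plus one output buffer
    # block, so at most M/B - 1 runs are merged together in each group.
    fan_in = M // B - 1
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), fan_in):
            group = runs[i:i + fan_in]
            merged = list(heapq.merge(*group))
            io += sum(scan(len(r)) for r in group) + scan(len(merged))
            next_runs.append(merged)
        runs = next_runs

    return (runs[0] if runs else []), io


items = list(range(1000))[::-1]
result, ios = external_merge_sort(items, M=100, B=10)
print(result == sorted(items), ios)  # prints: True 600
```

With N = 1000 items, M = 100 and B = 10, the sketch makes one run-formation pass and two 9-way merge passes, each moving all 100 blocks in and out once. That agrees with the familiar count $2\frac{N}{B}\left(1 + \lceil \log_{M/B-1}(N/M) \rceil\right) = 600$ block transfers.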
GraphChi: Large-Scale Graph Computation on Just a PC
 In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
Cited by 109 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis of social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially for non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
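The sharding and parallel-sliding-windows ideas can be sketched in a few lines. This is an in-memory Python illustration under our own assumptions, not GraphChi's code: vertices are split into P intervals, shard p stores every edge whose destination falls in interval p, sorted by source, and the helper names (`build_shards`, `window`, `load_interval`) are hypothetical. Because every shard is sorted by source, the out-edges of interval p occupy one contiguous block in each shard; as p advances, those blocks slide forward, so loading one interval costs roughly P sequential reads.

```python
import bisect


def interval_of(v, intervals):
    """Index of the inclusive vertex interval (lo, hi) that contains v."""
    for p, (lo, hi) in enumerate(intervals):
        if lo <= v <= hi:
            return p
    raise ValueError(f"vertex {v} is not covered by any interval")


def build_shards(edges, intervals):
    """Shard p holds every edge whose destination lies in interval p, sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        shards[interval_of(dst, intervals)].append((src, dst))
    for shard in shards:
        shard.sort()
    return shards


def window(shard, lo, hi):
    """Edges of `shard` whose source is in [lo, hi]. This is a single slice
    because the shard is sorted by source (one sequential read on disk)."""
    sources = [src for src, _ in shard]
    return shard[bisect.bisect_left(sources, lo):bisect.bisect_right(sources, hi)]


def load_interval(shards, intervals, p):
    """Subgraph needed to update interval p: all in-edges (the whole of
    shard p) plus all out-edges (one window from every shard)."""
    lo, hi = intervals[p]
    in_edges = list(shards[p])
    out_edges = [e for shard in shards for e in window(shard, lo, hi)]
    return in_edges, out_edges


edges = [(0, 3), (1, 4), (2, 0), (3, 5), (4, 1), (5, 2), (0, 1), (3, 4)]
intervals = [(0, 1), (2, 3), (4, 5)]
shards = build_shards(edges, intervals)
for p in range(len(intervals)):
    print(p, load_interval(shards, intervals, p))
```

Edges whose endpoints both lie in interval p appear in both lists, which mirrors the fact that such an edge is an in-edge and an out-edge of the interval at the same time.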
External-memory breadth-first search with sublinear I/O
 In Proceedings of the 10th Annual European Symposium on Algorithms
, 2002
Cited by 57 (14 self)
Breadth-first search (BFS) is a basic graph exploration technique. We give the first external memory algorithm for sparse undirected graphs with sublinear I/O. The best previous algorithm requires $\Theta\left(n + \frac{n+m}{D\cdot B}\cdot\log_{M/B}\frac{n+m}{B}\right)$ I/Os on a graph with n nodes and m edges and a machine with main memory of size M, D parallel disks, and block size B. We present two versions of a new algorithm which requires only $O\left(\left(\sqrt{\frac{1}{D\cdot B}} + \sqrt{\frac{n}{m}}\right)\cdot\frac{n+m}{\sqrt{D\cdot B}}\cdot\log_{M/B}\frac{n+m}{B}\right)$ …
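For intuition about the n term in bounds of this shape, here is a minimal in-memory Python sketch of the level-by-level external BFS scheme usually attributed to Munagala and Ranade. It is not the new sublinear algorithm of this paper, and the function names are hypothetical. Each level is built from the neighbours of the previous level by sorting, removing duplicates, and subtracting the two previous levels with merge-like scans; in external memory, fetching each vertex's adjacency list costs at least one unstructured I/O, which is where a per-vertex term comes from.

```python
def subtract_sorted(a, b):
    """Items of sorted list `a` that are not in sorted list `b`, via one merge-like scan."""
    out, j = [], 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            out.append(x)
    return out


def bfs_levels(adj, source):
    """Level-by-level BFS on an undirected graph given as {vertex: [neighbours]}.

    Uses L(t) = N(L(t-1)) minus L(t-1) minus L(t-2), which is valid for
    undirected graphs: every neighbour of a level-(t-1) vertex lies in
    level t-2, t-1, or t.
    """
    two_back, one_back = [], [source]
    levels = [one_back]
    while True:
        # External memory: fetch the adjacency lists of the current level,
        # then sort the neighbour multiset and remove duplicates.
        candidates = sorted({w for v in one_back for w in adj[v]})
        # External memory: two sequential scans against sorted level lists.
        nxt = subtract_sorted(subtract_sorted(candidates, one_back), two_back)
        if not nxt:
            return levels
        levels.append(nxt)
        two_back, one_back = one_back, nxt


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))  # prints: [[0], [1, 2], [3], [4]]
```

Only two previous levels are ever consulted, so no global "visited" structure with random access is needed; that is what makes the scheme suitable for disks.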
A computational study of external-memory BFS algorithms
 In SODA
, 2006
Cited by 26 (4 self)
Breadth-first search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was so far considered non-viable, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent external-memory BFS algorithms for general graphs. With our STXXL-based implementations exploiting pipelining and disk parallelism, we were able to compute the BFS level decomposition of a web-crawl-based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using a single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple external-memory algorithms perform significantly better (minutes as compared to hours) than internal-memory BFS, even if more than half of the input resides internally.
Neighborhood based fast graph search in large networks
 In SIGMOD
, 2011
Cited by 25 (1 self)
Complex social and information network search has become important in a variety of applications. At the core of these applications lies a common and critical problem: given a labeled network and a query graph, how can the query graph be searched efficiently in the target network? The presence of noise and incomplete knowledge about the structure and content of the target network make it unrealistic to find an exact match. Rather, it is more appealing to find the top-k approximate matches. In this paper, we propose a neighborhood-based similarity measure that can avoid costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that is able to convert a large net …
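The abstract is cut off before the measure is defined, so the following is only a hypothetical illustration of what a neighborhood-based similarity with information propagation can look like, not the paper's definition. We assume each node gets a vector that, for every label, sums `alpha**d` over the nodes carrying that label at hop distance d (1 <= d <= h), and that a query node matches a target node well when the target's vector covers the query's. The names `neighborhood_vector`, `mismatch_cost`, `alpha` and `h` are ours.

```python
from collections import deque


def neighborhood_vector(adj, labels, u, h, alpha):
    """HYPOTHETICAL measure: for each label l, the sum of alpha**d over nodes
    with label l at hop distance d from u, for 1 <= d <= h (distances via BFS)."""
    dist = {u: 0}
    queue = deque([u])
    vec = {}
    while queue:
        v = queue.popleft()
        if dist[v] == h:
            continue  # do not expand beyond h hops
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                vec[labels[w]] = vec.get(labels[w], 0.0) + alpha ** dist[w]
                queue.append(w)
    return vec


def mismatch_cost(query_vec, target_vec):
    """Label mass present around the query node but missing around the target node."""
    return sum(max(0.0, x - target_vec.get(l, 0.0)) for l, x in query_vec.items())


adj = {1: [2], 2: [1, 3], 3: [2]}
labels = {1: "A", 2: "B", 3: "C"}
vq = neighborhood_vector(adj, labels, 1, h=2, alpha=0.5)
print(vq)                              # prints: {'B': 0.5, 'C': 0.25}
print(mismatch_cost(vq, {"B": 0.5}))   # prints: 0.25
```

Comparing such vectors is linear in the number of labels, which hints at why a measure of this kind can sidestep isomorphism and edit-distance computations.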
Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths
 In Proceedings of the 9th Scandinavian Workshop on Algorithm Theory
, 2004
Cited by 22 (8 self)
We present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights. Our results close the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts. Our shortest-path algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation.
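A "weak" DecreaseKey is easiest to see through its interface: Update(x, k) inserts x with key k if x is absent, lowers its key if k is smaller, and otherwise does nothing. The Python sketch below provides that interface with a plain binary heap and lazy deletion of stale entries, then uses it for Dijkstra-style SSSP on an undirected graph with non-negative weights. It is a swapped-in technique: it is not the bucket heap and it is not cache-oblivious, and the names `LazyUpdateHeap` and `sssp` are hypothetical.

```python
import heapq


class LazyUpdateHeap:
    """Binary heap with Update(x, k) = insert-or-decrease (a weak DecreaseKey).
    Stale (key, x) entries are left in the heap and skipped on deletion."""

    def __init__(self):
        self._heap = []
        self._best = {}  # current key of every element still in the queue

    def update(self, x, key):
        if x not in self._best or key < self._best[x]:
            self._best[x] = key
            heapq.heappush(self._heap, (key, x))

    def delete_min(self):
        while self._heap:
            key, x = heapq.heappop(self._heap)
            if self._best.get(x) == key:  # ignore entries superseded by later updates
                del self._best[x]
                return x, key
        raise IndexError("delete_min from an empty queue")

    def __bool__(self):
        return bool(self._best)


def sssp(adj, s):
    """adj: {vertex: [(neighbour, weight >= 0), ...]}, both directions listed."""
    dist = {}
    pq = LazyUpdateHeap()
    pq.update(s, 0)
    while pq:
        v, d = pq.delete_min()
        dist[v] = d  # settled: its key can no longer decrease
        for w, wt in adj[v]:
            if w not in dist:
                pq.update(w, d + wt)
    return dist


graph = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2), (3, 8)],
    3: [(1, 5), (2, 8)],
}
print(sssp(graph, 0))
```

Dijkstra only ever needs this weaker operation because it never asks for the current key of an element; it just offers a possibly better tentative distance.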
External memory BFS on undirected graphs with bounded degree
 In Proceedings of SODA’2001
, 2001
Cited by 15 (5 self)
We give the first external memory algorithm for breadth-first search (BFS) which achieves o(n) I/Os on arbitrary undirected graphs with n nodes and maximum node degree d. Let M and B > d denote the main memory size and block size, respectively. Using $\mathrm{Sort}(x) = O\left(\frac{x}{B}\cdot\log_{M/B}\frac{x}{B}\right)$, our algorithm needs $O\left(\frac{n}{\gamma\cdot\log_d B} + \mathrm{Sort}(n\cdot B^{\gamma})\right)$ I/Os and $O(n\cdot B^{\gamma})$ external space for an arbitrary parameter $0 < \gamma \le 1/2$. The result carries over to BFS, depth-first search (DFS) and single-source shortest paths (SSSP) on undirected planar graphs with arbitrary node degrees.

1 Introduction

We use the standard I/O model of [1], which counts accesses to a disk of potentially infinite size using the parameters M for the memory size and B for the block size, where B < M/2. Let $\mathrm{Sort}(x) = O\left(\frac{x}{B}\cdot\log_{M/B}\frac{x}{B}\right)$ denote the number of I/Os needed to sort x items, and $\mathrm{Scan}(x) = \lceil x/B \rceil$ the number of I/Os required to transfer x items between contiguous disk positions and internal memory. Given a graph G with n nodes and m edges, the model applies when M < n < m. The best known external memory algorithms for BFS, DFS and SSSP on general undirected graphs still require $\Omega(n)$ I/Os, even if the graphs are planar and/or have bounded node degrees: $O\left(n + \frac{m}{n}\cdot\mathrm{Sort}(n)\right)$ I/Os for BFS [4], $O\left(n + \frac{m}{B}\log\frac{m}{B}\right)$ I/Os for SSSP [3], and $O\left(\min\left\{\frac{n}{M}\cdot\frac{m}{B} + n,\ \left(n + \frac{m}{B}\right)\cdot\log\frac{n}{B}\right\}\right)$ I/Os for DFS [6]. Better algorithms are known for special graph classes; see [6] for an overview. Furthermore, there is an O(Sort(n)) I/O algorithm for SSSP on undirected planar graphs G with bounded degree [2]. However, it requires a BFS tree for G as part of the input.

New Results. We show how a modification of the BFS algorithm of Munagala and Ranade [4] can take advantage of a redundant graph representation. For arbitrary undirected graphs with maximum node degree d < B we obtain an $O\left(\frac{n}{\gamma\cdot\log_d B} + \mathrm{Sort}(n\cdot B^{\gamma})\right)$ I/O …

* Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, …
I/O-efficient undirected shortest paths
 In Proc. 11th Annual European Symposium on Algorithms, volume 2832 of LNCS
, 2003
Cited by 15 (5 self)
Abstract. We show how to compute single-source shortest paths in undirected graphs with non-negative edge lengths in $O\left(\sqrt{nm/B}\,\log n + \mathrm{MST}(n, m)\right)$ I/Os, where n is the number of vertices, m is the number of edges, B is the disk block size, and MST(n, m) is the I/O cost of computing a minimum spanning tree. For sparse graphs, the new algorithm performs $O\left((n/\sqrt{B})\log n\right)$ I/Os. This result removes our previous algorithm's dependence on the edge lengths in the graph.
Cache-oblivious algorithms and data structures
 In SWAT
, 2004
Cited by 11 (1 self)
In 1999, Frigo, Leiserson, Prokop and Ramachandran introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the term cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The results are algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
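A standard example of the cache-oblivious style is recursive matrix transposition in the spirit of Frigo et al.: the routine repeatedly halves the longer dimension and never refers to the memory size or block size, yet at every level of the hierarchy the recursion eventually reaches subproblems that fit in that level's cache. This Python version only illustrates the recursion pattern. Python lists of lists are not laid out contiguously, so it will not exhibit real cache effects, and the function names and the base-case cutoff of 16 elements are our own choices, unrelated to any cache parameter.

```python
def co_transpose(A, T, r0, r1, c0, c1):
    """Write the transpose of the block A[r0:r1][c0:c1] into T, recursing on
    the longer side. No cache or block size appears anywhere."""
    rows, cols = r1 - r0, c1 - c0
    if rows * cols <= 16:  # tiny base case, only to limit Python call overhead
        for i in range(r0, r1):
            for j in range(c0, c1):
                T[j][i] = A[i][j]
    elif rows >= cols:
        mid = (r0 + r1) // 2
        co_transpose(A, T, r0, mid, c0, c1)
        co_transpose(A, T, mid, r1, c0, c1)
    else:
        mid = (c0 + c1) // 2
        co_transpose(A, T, r0, r1, c0, mid)
        co_transpose(A, T, r0, r1, mid, c1)


def transpose(A):
    """Transpose a non-empty rectangular matrix given as a list of rows."""
    n, m = len(A), len(A[0])
    T = [[None] * n for _ in range(m)]
    co_transpose(A, T, 0, n, 0, m)
    return T


print(transpose([[1, 2, 3], [4, 5, 6]]))  # prints: [[1, 4], [2, 5], [3, 6]]
```

The same halving pattern underlies many cache-oblivious algorithms: because the analysis holds for every block size simultaneously, one program adapts to all levels of a multi-level hierarchy.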