
## GraphChi: Large-Scale Graph Computation on Just a PC (2012)



Venue: | In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12 |

Citations: | 115 - 6 self |

### Citations

3438 | MapReduce: Simplified data processing on large clusters
- Dean, Ghemawat
- 2004
Citation Context ...eraction graphs are particularly challenging to handle, because they cannot be readily decomposed into small parts that could be processed in parallel. This lack of data-parallelism renders MapReduce =-=[20]-=- inefficient for computing on such graphs, as has been argued by many researchers (for example, [14, 31, 33]). Consequently, in recent years several graph-based abstractions have been proposed, most n... |

3325 | Collective dynamics of ’small-world’ networks
- Watts, Strogatz
- 1998
Citation Context ...tively. The goal of Triangle Counting is to count the number of edge triangles incident to each vertex. This problem is used in social network analysis for analyzing the graph connectivity properties =-=[46]-=-. Triangle Counting requires computing intersections of the adjacency lists of neighboring vertices. To do this efficiently, we first created a graph with vertices sorted by their degree (using a modi... |

3269 | The pagerank citation ranking: Bringing order to the web
- Page, Brin, et al.
- 1999
Citation Context ...ore all vertex values in memory. We now discuss the programming model in detail, with a running example. Running Example: As a running example, we use a simple GraphChi implementation of the PageRank =-=[36]-=- algorithm. The vertex update-function is simple: at each update, compute a weighted sum of the ranks of in-neighbors (vertices with an edge directed to the vertex). Incomplete pseudo-code is shown in... |

2387 | A note on two problems in connexion with graphs
- Dijkstra
- 1959
Citation Context ...a single source vertex and all the other vertices by computing a shortest-path tree rooted at the source vertex. For in-memory graphs, Breadth-First Search (BFS) or the famous Dijkstra’s algorithm =-=[45]-=- can be used to compute the shortest-path tree efficiently. BFS can be naturally implemented in the vertex-centric model as well. Approximate algorithms for SSSP have also been recently proposed by ma... |

1668 | On power-law relationships of the internet topology.
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ... is a vertex in the graph and there is an edge (u, v) from web page u to v if there is a hyperlink from page u to v. The in-degree distribution is highly skewed and follows the power-law distribution =-=[49]-=-. Search engines such as Google [58] analyze this graph structure to compute a ranking of web pages [53, 112]. 3. Communication relations, such as internet traffic between hosts or phone calls between... |

1373 | A bridging model for parallel computation
- Valiant
- 1990
Citation Context ...ains of graph mining, data mining, machine learning, and sparse linear algebra. Most existing frameworks execute update functions in lock-step, and implement the Bulk-Synchronous Parallel (BSP) model =-=[45]-=-, which defines that update-functions can only observe values from the previous iteration. BSP is often preferred in distributed systems as it is simple to implement, and allows maximum level of paral... |

1000 | Parallel and Distributed Computation
- Bertsekas, Tsitsiklis
- 1989
Citation Context ...prisingly, unlike most distributed frameworks, PSW naturally implements the asynchronous model of computation, which has been shown to be more efficient than synchronous computation for many purposes =-=[7, 32]-=-. We further extend our method to graphs that are continuously evolving. This setting was recently studied by Cheng et al., who proposed Kineograph [16], a distributed system for processing a conti... |

989 | What is Twitter, a social network or a news media?
- Kwak, Lee, et al.
- 2010
Citation Context ...ocessors, 64GB of RAM, running Linux (“AMD Server”).

| Graph name | Vertices | Edges | P | Preproc. |
|---|---|---|---|---|
| live-journal [3] | 4.8M | 69M | 3 | 0.5 min |
| netflix [6] | 0.5M | 99M | 20 | 1 min |
| domain [47] | 26M | 0.37B | 20 | 2 min |
| twitter-2010 =-=[28]-=- | 42M | 1.5B | 20 | 10 min |
| uk-2007-05 [12] | 106M | 3.7B | 40 | 31 min |
| uk-union [12] | 133M | 5.4B | 50 | 33 min |
| yahoo-web [47] | 1.4B | 6.6B | 50 | 37 min |

Table 1: Experiment graphs. Preprocessing (conversion to shards) was done o... |

915 | Probabilistic Graphical Models: Principles and Techniques
- Koller, Friedman
- 2009
Citation Context ...hen to find the most likely values of the pixels, thus improving the picture quality. Exact inference in graphical models is generally intractable (with the exception of models with a tree structure) =-=[77]-=-. Instead, approximate iterative methods are used: the most popular algorithms are Loopy Belief Propagation (BP) and Gibbs Sampling. Efficient parallel and distributed Belief Propagation for very larg... |

852 | A Case for Redundant Arrays of Inexpensive Disks (RAID)
- Patterson, Gibson, et al.
- 1988
Citation Context ...s test, we modified the I/O-layer of GraphChi to stripe files across disks. We installed three 2TB disks into the AMD server and used a stripe size of 10 MB. Our solution is similar to the RAID level 0 =-=[37]-=-. At best, we could get a total of 2x speedup with three drives. Figure 8b shows the effect of block size on performance of GraphChi on SSDs and HDs. With very small blocks, we observed that OS overh... |

762 | Dryad: distributed data-parallel programs from sequential building blocks
- Isard, Budiu, et al.
- 2007
Citation Context ...Stream can be faster for some types of computation. It is worth noting that similar local computation models have been proposed much earlier in the context of parallel computation. Systems like Dryad =-=[68]-=- define a Directed Acyclic Graph (DAG) that configures the flow of computation. The systolic abstraction [80] defines an iterative computation on a directed graph where each vertex represents a proces... |

653 | The ubiquitous B-tree
- Comer
- 1979
Citation Context ...: Indices created over the edges often take more space than the edges themselves: for example, using an InnoDB table in MySQL [106], storing the edge tuples takes just 9 bytes per edge but the B-tree =-=[39]-=- primary key index over src or dst IDs takes 20B / edge (on a graph with 1.5B edges). Updating the indices when edges are inserted or removed can be costly. Using double linked lists avoids the proble... |

632 | Multilevel k-way partitioning scheme for irregular graphs
- Karypis, Kumar, et al.
- 1998
Citation Context ...mpossible, if a graph is supplied without metadata required to efficiently cluster it. General graph partitioners are not currently an option, since even the state-of-the-art graph partitioner, METIS =-=[27]-=-, requires hundreds of gigabytes of memory to work with graphs of billions of edges. Graph compression. Compact representation of real-world graphs is a well-studied problem; the best algorithms can st... |

496 | Group formation in large social networks: Membership, growth, and evolution.
- Backstrom, Huttenlocher, et al.
- 2006
Citation Context ...xperiments with multiple hard drives we used an older 8-core server with four AMD Opteron 8384 processors, 64GB of RAM, running Linux (“AMD Server”). Graph name Vertices Edges P Preproc. live-journal =-=[3]-=- 4.8M 69M 3 0.5 min netflix [6] 0.5M 99M 20 1 min domain [47] 26M 0.37B 20 2 min twitter-2010 [28] 42M 1.5B 20 10 min uk-2007-05 [12] 106M 3.7B 40 31 min uk-union [12] 133M 5.4B 50 33 min yahoo-web [4... |

496 | Pregel: A system for large-scale graph processing
- Malewicz, Austern, et al.
- 2010
Citation Context ...to small parts that could be processed in parallel. This lack of data-parallelism renders MapReduce [20] inefficient for computing on such graphs, as has been argued by many researchers (for example, =-=[14, 31, 33]-=-). Consequently, in recent years several graph-based abstractions have been proposed, most notably Pregel [33] and GraphLab [31]. Both use a vertex-centric computation model, in which the user defines... |

435 | A Distributed Algorithm for Minimum Weight Spanning Trees
- Gallager, Humblet, et al.
- 1983
Citation Context ...ion (Borůvka’s algorithm). We will discuss our implementation of Borůvka’s algorithm in Chapter 6. A distributed algorithm to compute the MSR, based on message passing, was already proposed in 1983 =-=[51]-=-. Graph Analytics Graph analytics, such as the analysis of social networks, is a very active field of research. • Community detection algorithms attempt to detect groups of nodes, called “communities”... |

423 | Universal Codeword Sets and Representations of the Integers
- Elias
- 1975
Citation Context ...er sequences. We can then encode the differences between subsequent values (a technique common in index compression), which are small on average. In our implementation, we used the Elias-Gamma coding =-=[46]-=-, which typically compresses the pointer-array to only a fraction of the original size, allowing us to permanently pin the index to memory and so avoid disk access completely. The second approach is t... |

415 | Inferring Web Communities from Link Topology
- Gibson, Kleinberg, et al.
- 1998
Citation Context ...page u to v. The in-degree distribution is highly skewed and follows the power-law distribution [49]. Search engines such as Google [58] analyze this graph structure to compute a ranking of web pages =-=[53, 112]-=-. 3. Communication relations, such as internet traffic between hosts or phone calls between people can also be represented as graphs. This kind of data is of particular interest to intelligence agenci... |

409 | Scaling personalized web search
- Jeh, Widom
- 2003
Citation Context ...multiplications of an initial ranking vector by the transition matrix (the power method). However, for computing all PPR vectors [112], the direct methods are too expensive, with complexity of O(V²) =-=[70]-=-. Fortunately, Fogaras et al. [50] proved that by simulating a modest number of short random walk segments from each node, we can efficiently obtain a good approximation of the PPR vector for each us... |

272 | The netflix prize.
- Bennett, Lanning
- 2007
Citation Context ...matrices: R ≈ U × V′. We implemented the Alternating Least Squares (ALS) algorithm [49], by adapting a GraphLab implementation [31]. We used ALS to solve the Netflix movie rating prediction problem =-=[6]-=-: in this model, the graph is bipartite, with each user and movie represented by a vertex, connected by an edge storing the rating (edges correspond to the non-zeros of matrix R). The algorithm comput... |

268 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...ry to work with graphs of billions of edges. Graph compression. Compact representation of real-world graphs is a well-studied problem; the best algorithms can store web-graphs in only 4 bits/edge (see =-=[9, 13, 18, 25]-=-). Unfortunately, while the graph structure can often be compressed and stored in memory, we also associate data with each of the edges and vertices, which can take significantly more space than the g... |

211 | Spark: cluster computing with working sets
- Zaharia, Chowdhury, et al.
- 2010
Citation Context ...del, in which the user defines a program that is executed locally for each vertex in parallel. In addition, high-performance systems that are based on key-value tables, such as Piccolo [40] and Spark =-=[48]-=-, can efficiently represent many graph-parallel algorithms. Current graph systems are able to scale to graphs of billions of edges by distributing the computation. However, while distributed computation... |

208 | Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.
- Leskovec, Lang, et al.
- 2009
Citation Context ...works, one is faced with the challenge of partitioning the graph across cluster nodes. Finding efficient graph cuts that minimize communication between nodes, and are also balanced, is a hard problem =-=[29]-=-. More generally, distributed systems and their users must deal with managing a cluster, fault tolerance, and often unpredictable performance. From the perspective of programmers, debugging and optimi... |

195 | Learning from labeled and unlabeled data with label propagation. Carnegie Mellon Univ.,
- Zhu, Ghahramani
- 2002
Citation Context ... Graph Mining: We implemented three algorithms for analyzing graph structure: Connected Components, Community Detection, and Triangle Counting. The first two algorithms are based on label propagation =-=[50]-=-. On first iteration, each vertex writes its id (“label”) to its edges. On subsequent iterations, vertex chooses a new label based on the labels of its neighbors. For Connected Components, vertex choo... |

186 | Data-Oblivious Graph Algorithms in
- Goodrich, Simons
- 2014
Citation Context ...d for general large-scale graph computation and has lower memory requirements. A collection of I/O efficient fundamental graph algorithms in the external memory setting was proposed by Chiang et al. =-=[17]-=-. Their method is based on simulating parallel PRAM algorithms, and requires a series of disk sorts, and would not be efficient for the types of algorithms we consider. For example, the solution to co... |

170 | Reverend bayes on inference engines: A distributed hierarchical approach
- Pearl
- 1982
Citation Context .... Edges connect related variables and store a factor encoding the dependencies. Exact inference on such models is intractable, so approximate methods are required in practice. Belief Propagation (BP) =-=[39]-=-, is a powerful method based on iterative message passing between vertices. The goal here is to estimate the probabilities of variables (“beliefs”). For this work, we adapted a special BP algorithm pr... |

153 | Efficient Computation of PageRank.
- Haveliwala
- 1999
Citation Context ...which stores all edges that have destination vertex in that interval. into memory. Similar data layout for sparse graphs was used previously, for example, to implement I/O efficient Pagerank and SpMV =-=[5, 22]-=-. PSW does graph computation in execution intervals, by processing vertices one interval at a time. To create the subgraph for the vertices in interval p, their edges (with their associated values) mu... |

149 | The Buffer Tree: A New Technique for Optimal I/O-Algorithms
- Arge
- 1995
Citation Context ...We now modify the PSW model to support changes in the graph structure. Particularly, we allow adding edges to the graph efficiently, by implementing a simplified version of I/O efficient buffer trees =-=[2]-=-. Because a shard stores edges sorted by the source, we can divide the shard into P logical parts: part j contains edges with source in the interval j. We associate an in-memory edge-buffer(p, j) for e... |

147 | Supervised Random Walks: Predicting and Recommending Links in Social Networks
- Backstrom, Leskovec
- 2011
Citation Context .... Recently [60] describe an extension to this technique used at the microblogging service Twitter (http://www.twitter.com). We implement their method in our case study (Section 5.5). The authors of =-=[10]-=- propose a method to learn the edge weights in a graph to bias random walks so that the random walker is more likely to visit nodes that are more likely to receive new links in the future. Our works a... |

128 | Powergraph: Distributed graph-parallel computation on natural graphs
- Gonzalez, Low, et al.
- 2012
Citation Context ...a graph with 106 mil. vertices and 1.9B edges in 40 minutes. Unfortunately, we were unable to repeat their experiment due to unavailability of the graph. Finally, we include comparisons to PowerGraph =-=[21]-=-, which was published simultaneously with this work (PowerGraph and GraphChi are projects of the same research team). PowerGraph is a distributed version of GraphLab [32], which employs a novel vertex... |

127 | Pegasus: A peta-scale graph mining system - implementation and observations
- Kang, Tsourakakis, et al.
Citation Context ...gerank synchronously, while GraphChi uses asynchronous computation, with relatively faster convergence [7]. GraphChi is able to solve the WebGraph-BP on yahoo-web in 25 mins, almost as fast as Pegasus =-=[26]-=-, a Hadoop-based graph mining library, distributed over 100 nodes (Yahoo M-45). GraphChi counts the triangles of the twitter-2010 graph in less than 90 minutes, while a Hadoop-based algorithm uses ove... |

112 | Survey of graph database models
- Angles, Gutierrez
Citation Context ...iguration parameters. Thus, for any specific workload, the results might vary significantly. 4.8 Additional Related work Graph databases have been studied for at least three decades: for a survey see =-=[6]-=-. Early work on graph databases discussed mostly the modeling questions of graph database design, and used a relational or key-value database to implement graph storage. Perhaps the best examples of m... |

110 | Residual belief propagation: Informed scheduling for asynchronous message passing
- Elidan, McGraw, et al.
- 2006
Citation Context ...mance. This could also enable faster execution of computations where the computation focuses on some parts of the graph more than others, as in priority-scheduled loopy belief propagation computation =-=[47, 54]-=-. 4. Can we improve the partitioning of the edges (in contrast to the simple ID-based intervals) if we can store more information of the graph in-memory? 8.1.2 Distributed Setting The processing capac... |

109 | GraphLab: A new parallel framework for machine learning
- Low, Gonzalez, et al.
- 2010
Citation Context ...aphs as GraphChi on a single computer (with reasonable performance). To get a flavor of the performance of GraphChi, we compare it to several existing distributed systems and the shared-memory GraphLab =-=[32]-=-, based mostly on results we found from recent literature. Our comparisons are listed in Table 2. Although disk-based, GraphChi runs three iterations of Pagerank on the domain graph in 132 seconds,... |

104 | A functional approach to external graph algorithms
- Abello, Buchsbaum, et al.
- 1998
Citation Context ...sponse to a change in a few of its neighbors. The gather operation is then repeatedly invoked on all neighbors, many of which remain unchanged, thereby wasting computation cycles. For many algorithms =-=[2]-=- it is possible to dynamically maintain the result of the gather phase a_u and skip the gather on subsequent iterations. The PowerGraph engine maintains a cache of the accumulator a_u from the previous ... |

104 | Towards scaling fully personalized pagerank: Algorithms, lower bounds, and experiments.
- Fogaras, Racz, et al.
- 2005
Citation Context ...cted fraction of short walks longer than k is (1 − d)^k. For example, with the usual choice of d = 0.15, the expected length is approximately 6.7 hops. For a more detailed analysis, see Fogaras et al. =-=[50]-=-. It therefore appears that a large number of short walks could approximate the PageRank and related models well. In the context of recommender systems, it may make sense to bias towards shorter walks... |

96 | Large-scale parallel collaborative filtering for the netflix prize
- Zhou, Wilkinson, et al.
Citation Context ...-rank matrix factorization. The basic idea is to approximate a large sparse matrix R by the product of two smaller matrices: R ≈ U × V′. We implemented the Alternating Least Squares (ALS) algorithm =-=[46]-=-, by adapting a GraphLab implementation [30]. We used ALS to solve the Netflix movie rating prediction problem [6]: in this model, the graph is bipartite, with each user and movie represented by a ver... |

86 | Piccolo: Building fast, distributed programs with partitioned tables.
- Power, Li
- 2010
Citation Context ... computation model, in which the user defines a program that is executed locally for each vertex in parallel. In addition, high-performance systems that are based on key-value tables, such as Piccolo =-=[40]-=- and Spark [48], can efficiently represent many graph-parallel algorithms. Current graph systems are able to scale to graphs of billions of edges by distributing the computation. However, while distri... |

83 | On compressing social networks.
- Chierichetti, Kumar, et al.
- 2009
Citation Context ...ry to work with graphs of billions of edges. Graph compression. Compact representation of real-world graphs is a well-studied problem; the best algorithms can store web-graphs in only 4 bits/edge (see =-=[9, 13, 18, 25]-=-). Unfortunately, while the graph structure can often be compressed and stored in memory, we also associate data with each of the edges and vertices, which can take significantly more space than the g... |

74 | Graph-Based Technologies for Intelligence Analysis
- Coffman, Greenblatt, et al.
- 2004
Citation Context ... Social networks can also be inferred from analysis of phone logs or patterns of internet connectivity. Not surprisingly, governmental intelligence agencies have invested heavily into graph analytics =-=[38, 59]-=-, which has recently provoked an intense public debate. In addition to data that is explicitly represented as a graph, many problems in machine learning and data mining can be represented as graph pro... |

72 | Counting triangles and the curse of the last reducer.
- Suri, Vassilvitskii
- 2011
Citation Context ...odes (Yahoo M-45). GraphChi counts the triangles of the twitter-2010 graph in less than 90 minutes, while a Hadoop-based algorithm uses over 1,600 workers to solve the same problem in over 400 minutes =-=[43]-=-. These results highlight the inefficiency of MapReduce for graph problems. Recently, Chu et al. proposed an I/O efficient algorithm for triangle counting [19]. Their method can list the triangles of ... |

69 | Efficient semi-streaming algorithms for local triangle counting in massive graphs
- Becchetti, Boldi, et al.
Citation Context ...d to 60 minutes taken by the Mac Mini. Note that the relative performance per machine and per CPU of GraphChi is better. Approximate triangle counting has been studied by many researchers, including =-=[17, 142]-=-. Approximate counting is often studied in a streaming graph setting where only part of the graph is available. Our algorithm is exact. B.4.3 Remarks on Semi-External Algorithms The algorithms we have... |

68 | Gps: A graph processing system.
- Salihoglu, Widom
- 2012
Citation Context ...me (790 seconds), with only 2 CPUs. Note that Spark is implemented in Scala, while GraphChi is native C++ (an early Scala/Java-version of GraphChi runs 2-3x slower than the C++ version). Stanford GPS =-=[41]-=- is a new implementation of Pregel, with compelling performance. On a cluster of 30 machines, GPS can run 100 iterations of Pagerank (using random partitioning) in 144 minutes, approximately four time... |

68 | Residual splash for optimally parallelizing belief propagation
- Gonzalez, Low, et al.
- 2009
Citation Context ...quires twice the amount of memory to store the computational state. Although the asynchronous, or Gauss-Seidel model, has been shown to accelerate convergence of iterative computation (for example, see =-=[54]-=-, [21], and Chapter 6), it is much more complex to implement and execute in parallel, as evidenced, for example, by the impressive distributed locking mechanisms engineered for the Distributed GraphLa... |

66 | Hadoop: a framework for running applications on large clusters built of commodity hardware, - Bialecki, Cafarella, et al. - 2005 |

66 | TrustWalker: A random walk model for combining trust-based and item-based recommendation.
- Jamali, Ester
- 2009
Citation Context ... their work. Random walk-based models are also used in other contexts: FolkRank [66] is an adapted version of PageRank for ranking in folksonomies (graphs of users, tags, and resources). TrustWalker =-=[69]-=- improves item-based recommendation by modeling trust between users in a social network, approximated by simulating random walks on the graph. Finally, [86] proposes a ranking of entities in an entity... |

65 | Hyracks: A flexible and extensible foundation for data-intensive computing.
- Borkar, Carey, et al.
- 2011
Citation Context ... been explored by several researchers. In [30], the authors implement Pregel [97] and iterative MapReduce computation using Datalog and demonstrate promising performance on Hyracks computation engine =-=[28]-=-. They do not discuss the data representation of the graph and their solution is for distributed in-memory computation, and is thus complementary to our work. Recently, compiling Datalog queries for p... |

62 | A survey of out-of-core algorithms in numerical linear algebra, in: External Memory Algorithms,
- Toledo
- 1999
Citation Context ...n et al. [15]. GraphChi and the PSW method extend this work by allowing asynchronous computation and mutation of the underlying matrix (graph), thus representing a larger set of applications. Toledo =-=[44]-=- contains a comprehensive survey of (mostly historical) algorithms for out-of-core numerical linear algebra, and also discusses methods for sparse matrices. For most external memory algorithms in lite... |

62 | Scalable inference in latent variable models.
- Ahmed, Aly, et al.
- 2012
Citation Context ...lementation of the Parallel Sliding Windows algorithm, we suggest an architecture based on multiple independent GraphChi instances that synchronize their state asynchronously using a Parameter Server =-=[4, 65, 135]-=-. Conceptually, a Parameter Server is a distributed key-value service that allows nodes to asynchronously send and receive updates to values of keys. Each node caches all or a subset of the keys. Conf... |

58 | The Combinatorial BLAS: Design, implementation, and applications.
- Buluç, Gilbert
- 2011
Citation Context ...d model for expressing graph computation. Unlike PEGASUS, it is not implemented on top of MapReduce, but implements its own distributed runtime for the R statistics platform [140]. Combinatorial BLAS =-=[31]-=- provides a similar programming abstraction as PEGASUS, but is designed for high-performance computing environments. Abstractions based on linear algebra are attractive because the programs are writte... |

57 | External memory algorithms
- Vitter
- 1998
Citation Context ...ltaneously running advanced graph mining algorithms. We implement the same functionality, but using only a single computer, by applying techniques developed by the I/O-efficient algorithm researchers =-=[42]-=-. We further present a complete system, GraphChi, which we used to solve a wide variety of computational problems on extremely large graphs, efficiently on a single consumer-grade computer. In the evol... |

57 | STXXL: standard template library for XXL data sets. - Dementiev, Kettner, et al. - 2008 |

56 | Four degrees of separation.
- Backstrom, Boldi, et al.
- 2011
Citation Context ...rds shorter walks: on typical social networks and other natural graphs, the number of nodes in a k-hop radius increases extremely quickly as the expected distance between any two nodes is very small =-=[12]-=-, even less than the famous “six degrees of separation” result by Milgram [101]. For computing sensible recommendations, it thus makes sense to concentrate the graph exploration using random walks to ... |

52 | Streaming graph partitioning for large distributed graphs.
- Stanton, Kliot
- 2012
Citation Context ...rformance was obtained for ALS matrix factorization, if vertex values are stored in-memory. Replicating the latent factors to edges increases the running time by five-fold. A recently published paper =-=[42]-=- reports that Spark [48], running on a cluster of 50 machines (100 CPUs) [48] runs 5 The results we found do not consider the time it takes to load the graph from disk, or to transfer it over a networ... |

51 | Distributed GraphLab: A Framework for
- Low
- 2012
Citation Context ...s to approximate a large sparse matrix R by the product of two smaller matrices: R ≈ U × V′. We implemented the Alternating Least Squares (ALS) algorithm [46], by adapting a GraphLab implementation =-=[30]-=-. We used ALS to solve the Netflix movie rating prediction problem [6]: in this model, the graph is bipartite, with each user and movie represented by a vertex, connected by an edge storing the rating... |

50 | Folkrank: A ranking algorithm for folksonomies.
- Hotho, Jaschke, et al.
- 2006
Citation Context ...new links in the future. Our work allows for the biasing of nodes for random walks explicitly, and is complementary to their work. Random walk-based models are also used in other contexts: FolkRank =-=[66]-=- is an adapted version of PageRank for ranking in folksonomies (graphs of users, tags, and resources). TrustWalker [69] improves item-based recommendation by modeling trust between users in a social n... |

44 | Estimating PageRank on graph streams.
- Sarma, Gollapudi, et al.
- 2008
Citation Context ...ber of walks. To our knowledge, there is no available implementation of the algorithm of [50]. A related algorithm to compute PageRank on graphs streamed from a disk was proposed by Das Sarma et al. =-=[41]-=-, but their approach is to sample the vertices and edges of the graph and simulate a small number of hops on the sampled graph to produce a large number of short walks. Their method requires a rather ... |

42 | Compact Representations of Separable Graphs
- Blandford, Blelloch, et al.
- 2003
Citation Context ...ry to work with graphs of billions of edges. Graph compression. Compact representation of real-world graphs is a well-studied problem; the best algorithms can store web-graphs in only 4 bits/edge (see =-=[9, 13, 18, 25]-=-). Unfortunately, while the graph structure can often be compressed and stored in memory, we also associate data with each of the edges and vertices, which can take significantly more space than the g... |

40 | Parallel Gibbs sampling: From colored fields to thin junction trees
- Gonzalez, Low, et al.
- 2011
Citation Context ...on converges significantly more slowly than in the asynchronous model (in some cases, the synchronous version even fails to converge) [54, 55]. Parallel Gibbs Sampling on GraphLab was investigated in =-=[56]-=-. These algorithms map naturally to the vertex-centric computation model. A specialized dynamic scheduling policy of the vertex updates, as proposed in [54, 55], can improve convergence significantly.... |

36 | Fast incremental and personalized PageRank
- Bahmani, Chowdhury, et al.
- 2010
Citation Context ...ated scheme to stitch small walk segments and handle special cases, while our method and that of Fogaras can simulate the walks on the full graph and thus are not limited to PageRank. Bahmani et al. =-=[14]-=- study how to efficiently update a database of random walk segments when new edges are inserted into the graph. Their method could be combined with DrunkardMob. Finally, [15] proposes a method to comp... |

35 | Direction-Optimizing Breadth-First Search
- Beamer, Asanovic, et al.
- 2012
(Show Context)
Citation Context ...l operator traverseOut visits all the out-edges of the vertices in the current frontier (set of vertices) and adds their destination IDs to the next frontier. We use a simple optimization proposed in =-=[16]-=-: if the current frontier is very large, to compute the next frontier, instead of issuing a (top-down) out-edge query for each of the vertices in the frontier, it can be more efficient to (bottom-up) ... |

34 | Multithreaded asynchronous graph traversal for in-memory and semi-external memory
- Pearce, Gokhale, et al.
- 2010
(Show Context)
Citation Context ...012). While we expect the memory capacity of personal computers to grow in the future, the datasets are expected to grow quickly as well. 2.2 Standard Sparse Graph Formats The system by Pearce et al. =-=[38]-=- uses compressed sparse row (CSR) storage format to store the graph on disk, which is equivalent to storing the graph as adjacency sets: the outedges of each vertex are stored consecutively in the fil... |

34 | Learning to Extract Entities from Labeled and Unlabeled Text
- Jones
- 2005
(Show Context)
Citation Context ... his ”, “ finished”. An edge is added between a named entity vertex and a context vertex if they appear together in the corpus. The edge weight stores the number of co-occurrences. The co-EM algorithm =-=[71]-=- can then be used to cluster the named entities and contexts under concept categories such as “city” and “person”. This algorithm can be naturally implemented in the vertex-centric model, as described... |

31 | WTF: The Who to Follow Service at Twitter
- Gupta, Goel, et al.
- 2013
(Show Context)
Citation Context ...ial network, called the link prediction problem [92]. Perhaps the most common approach is to return the top k ranked nodes of the Personalized Pagerank [112] vector for the user in question. Recently =-=[60]-=- describe an extension to this technique used at the microblogging service Twitter (http://www.twitter.com). We implement their method in our case study (Section 5.5). The authors of [10] propose a ... |

30 | I/O-efficient techniques for computing PageRank
- Chen, Gan, et al.
- 2002
(Show Context)
Citation Context ...hich is then sorted (using disk-sort), and used to generate input graph for next iteration. For algorithms that modify only the vertices, not edges, such as Pagerank, a similar solution has been used =-=[15]-=-. However, it cannot be efficiently used to perform asynchronous computation. 3 Parallel Sliding Windows This section describes the Parallel Sliding Windows (PSW) method (Algorithm 2). PSW can process... |

30 | Kineograph: Taking the pulse of a fast-changing and connected world
- Cheng, Hong, et al.
- 2012
(Show Context)
Citation Context ...synchronous computation for many purposes [7, 32]. We further extend our method to graphs that are continuously evolving. This setting was recently studied by Cheng et al., who proposed Kineograph =-=[16]-=-, a distributed system for processing a continuous in-flow of graph updates, while simultaneously running advanced graph mining algorithms. We implement the same functionality, but using only a single... |

30 | On external-memory MST, SSSP and multi-way planar graph separation
- Arge, Brodal, et al.
- 2004
(Show Context)
Citation Context ... and Schwabe [79] give an improved deterministic algorithm with a bound of O(sort(E) log(B) + log(V) scan(E)). Arge et al. =-=[8]-=- give a deterministic algorithm requiring O(sort(E) log log(B)). The best I/O bound is for a randomized algorithm by Abello et al. [2] using O(sort(E)) I/O’s with high probability. As MSF can be used ... |

30 | More effective distributed ML via a stale synchronous parallel parameter server
- Ho, Cipar, et al.
- 2013
(Show Context)
Citation Context ...lementation of the Parallel Sliding Windows algorithm, we suggest an architecture based on multiple independent GraphChi instances that synchronize their state asynchronously using a Parameter Server =-=[4, 65, 135]-=-. Conceptually, a Parameter Server is a distributed key-value service that allows nodes to asynchronously send and receive updates to values of keys. Each node caches all or a subset of the keys. Conf... |

29 | NSA Prism program taps in to user data of Apple, Google and others. The Guardian
- Greenwald, MacAskill
- 2013
(Show Context)
Citation Context ... Social networks can also be inferred from analysis of phone logs or patterns of internet connectivity. Not surprisingly, governmental intelligence agencies have invested heavily into graph analytics =-=[38, 59]-=-, which has recently provoked an intense public debate. In addition to data that is explicitly represented as a graph, many problems in machine learning and data mining can be represented as graph pro... |

28 | Triangle listing in massive networks and its applications
- Chu
- 2011
(Show Context)
Citation Context ...lve the same problem in over 400 minutes [43]. These results highlight the inefficiency of MapReduce for graph problems. Recently, Chu et al. proposed an I/O efficient algorithm for triangle counting =-=[19]-=-. Their method can list the triangles of a graph with 106 mil. vertices and 1.9B edges in 40 minutes. Unfortunately, we were unable to repeat their experiment due to unavailability of the graph. Final... |

26 | Storing RDF as a Graph
- Bonstrom, Hinze, et al.
- 2003
(Show Context)
Citation Context ...efficiently it can handle graphs with edge and vertex attributes. Graph storage has been also studied by the Semantic Web community, for storing RDF data. Storing RDF as a graph was first proposed in =-=[27]-=-. Our focus has been on graphs such as social networks and the Web, but we suggest GraphChi-DB could also be used as a backend for storing RDF triples. The idea of in-database analytics has been explo... |

25 | Optimal sparse matrix dense vector multiplication in the I/O-model
- Bender, Brodal, et al.
- 2007
(Show Context)
Citation Context ...which stores all edges that have destination vertex in that interval. into memory. Similar data layout for sparse graphs was used previously, for example, to implement I/O efficient Pagerank and SpMV =-=[5, 22]-=-. PSW does graph computation in execution intervals, by processing vertices one interval at a time. To create the subgraph for the vertices in interval p, their edges (with their associated values) mu... |

24 | Beyond ‘Caveman Communities’: Hubs and Spokes for Graph Compression and Mining
- Kang, Faloutsos
- 2011
(Show Context)
Citation Context |

24 | TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC
- Han, Lee, et al.
- 2013
(Show Context)
Citation Context ...ndows[83], several papers have been published, including in the top conferences, that have proposed alternative approaches for disk-based graph computation (for example, X-Stream [125] and TurboGraph =-=[62]-=-). All these papers cite our work as their inspiration and use GraphChi as the primary comparative system in their benchmarks. When writing this thesis, [83] has received over 85 citations in just 18 ... |

23 | Engineering an external memory minimum spanning tree algorithm
- Dementiev, Sanders, et al.
- 2004
(Show Context)
Citation Context ...y have a competitive I/O bound of O(sort(E) log(V/M)) and also work well in practice. We also show that our MSF implementation is competitive with a specialized algorithm proposed by Dementiev et al. =-=[43]-=- while being much simpler. We analyze theoretically the acceleration of label propagation using the Gauss-Seidel execution, in comparison to synchronous execution.1 6.1 Introduction Research on extern... |

22 | SSDAlloc: hybrid SSD/RAM memory management made easy
- Badam, Pai
- 2011
(Show Context)
Citation Context ...tion on just a personal computer? Handling graphs with billions of edges in memory would require tens or hundreds of gigabytes of DRAM, currently only available to high-end servers, with steep prices =-=[4]-=-. This leaves us with only one option: to use persistent storage as memory extension. Unfortunately, processing large graphs efficiently from disk is a hard problem, and generic solutions, such as sys... |

22 | LinkBench: a database benchmark based on the Facebook social graph
- Armstrong, Ponnekanti, et al.
- 2013
(Show Context)
Citation Context ...odel further in Section 4.7.6. 69 Graph Edges GraphChi-DB Neo4J MySQL data + indices live-journal [11] 69M 0.8 GB 2.3 GB 2.1 GB twitter-2010 [81] 1.5B 17 GB 52 GB 62 GB LinkBench 5B ∼ 350GB – 1400 GB =-=[9]-=- Table 4.1: Comparison of database disk space for graphs stored in GraphChi-DB, Neo4j and MySQL. For MySQL we created a table for 4-byte source and destination IDs, using (source, destination) as the ... |

21 | O jistem problemu minimalnim (about a certain minimal problem). In: Prace, Moravske Prirodovedecke Spolecnosti
- Boruvka
- 1926
(Show Context)
Citation Context ... for computing the minimum spanning forest (MSF) in the external-memory setting use different variations of graph contraction to recursively solve the problem. We implement a variation of the Boruvka =-=[29]-=- algorithm on PSW, based on the MLP algorithm. On each iteration, Boruvka’s algorithm selects the minimum weight edge of each vertex. These minimum edges are surely part of the minimum spanning forest... |

20 | Space-optimal heavy hitters with strong error bounds
- Berinde, Cormode, et al.
(Show Context)
Citation Context .... For each source we maintain a vertex × visits mapping for some maximum of K elements. To limit the number of vertices in the map to K in principled manner we use the FREQUENT algorithm described in =-=[20]-=-, which was originally proposed by [103]. The idea is simple: when we record a new visit to vertex v, as long as the number of distinct vertices in the map is less than K, we either insert (v, 1) into... |

19 | Fast Personalized PageRank on MapReduce
- Bahmani, Xin
- 2011
(Show Context)
Citation Context ...PageRank. Bahmani et al. [14] study how to efficiently update a database of random walk segments when new edges are inserted into the graph. Their method could be combined with DrunkardMob. Finally, =-=[15]-=- proposes a method to compute PPR efficiently on the popular MapReduce [42] parallel data processing framework. Their method is a generalization of the method by Das Sarma et al. [41]. While MapReduc... |

19 | Scaling Datalog for machine learning on Big Data
- Bu, Borkar, et al.
- 2012
(Show Context)
Citation Context ...ies to execute analytical queries. However, MADlib does not target graph computation. Use of Datalog as a declarative query language for graph computation has been explored by several researchers. In =-=[30]-=-, the authors implement Pregel [97] and iterative MapReduce computation using Datalog and demonstrate promising performance on Hyracks computation engine [28]. They do not discuss the data representat... |

19 | Distributed parallel inference on large factor graphs - Gonzalez, Low, et al. - 2009 |

15 | Advanced modularity-specialized label propagation algorithm for detecting communities in networks
- Liu, Murata
(Show Context)
Citation Context ...ent iterations, vertex chooses a new label based on the labels of its neighbors. For Connected Components, vertex chooses the minimum label; for Community Detection, the most frequent label is chosen =-=[30]-=-. A neighbor is scheduled only if a label in a connecting edge changes, which we implement by using selective scheduling. Finally, sets of vertices with equal labels are interpreted as connected compo... |

15 | Prefix sums and their applications, in Synthesis of Parallel Algorithms
- Blelloch
- 1993
(Show Context)
Citation Context ... (number of in-edges) for each of the vertices, requiring one pass over the input file. The degrees for consecutive vertices can be combined to save memory. To finish, Sharder computes the prefix sum =-=[10]-=- over the degree array, and divides vertices into P intervals with approximately the same number of in-edges. 2. On the second pass, Sharder writes each edge to a temporary scratch file of the owning ... |

15 | Elementary graph algorithms in external memory
- Katriel, Meyer
- 2003
(Show Context)
Citation Context ...in-memory graphs, SCCs can be efficiently found using Depth-First Search (DFS). Unfortunately, in the external memory setting, as well as in the distributed setting, executing DFS is very inefficient =-=[76]-=-. A relatively complex local algorithm based on message passing was proposed in [127]. We also implemented a variation of this algorithm for GraphChi, which we describe in Section B.1. • The single-so... |

14 | Improved external memory BFS implementation
- Ajwani, Meyer, et al.
- 2007
(Show Context)
Citation Context ...nds are further improved by Mehlhorn and Meyer [99] (MM). These results were theoretical, but the algorithms were implemented (with various optimizations) and experimentally studied by Ajwani et al. =-=[5]-=-. In their study, they find that in practice the MR algorithm beats MM on short-diameter sparse graphs, while MM is better on grids and certain other types of graphs. We note that these algorithms are... |

13 | Large graph processing in the cloud
- Chen, Weng, et al.
- 2010
(Show Context)
Citation Context ...to small parts that could be processed in parallel. This lack of data-parallelism renders MapReduce [20] inefficient for computing on such graphs, as has been argued by many researchers (for example, =-=[14, 31, 33]-=-). Consequently, in recent years several graph-based abstractions have been proposed, most notably Pregel [33] and GraphLab [31]. Both use a vertex-centric computation model, in which the user defines... |

12 | A large time-aware graph
- Boldi, Santini, et al.
- 2008
(Show Context)
Citation Context ...x (“AMD Server”). Graph name Vertices Edges P Preproc. live-journal [3] 4.8M 69M 3 0.5 min netflix [6] 0.5M 99M 20 1 min domain [47] 26M 0.37B 20 2 min twitter-2010 [28] 42M 1.5B 20 10 min uk-2007-05 =-=[12]-=- 106M 3.7B 40 31 min uk-union [12] 133M 5.4B 50 33 min yahoo-web [47] 1.4B 6.6B 50 37 min Table 1: Experiment graphs. Preprocessing (conversion to shards) was done on Mac Mini. 7.2 Comparison to Other... |

6 | Inference of beliefs on billion-scale graphs. The 2nd Workshop on Large-scale Data Mining: Theory and Applications
- Kang, Chau, et al.
- 2010
(Show Context)
Citation Context ...based on iterative message passing between vertices. The goal here is to estimate the probabilities of variables (“beliefs”). For this work, we adapted a special BP algorithm proposed by Kang et al. =-=[24]-=-, which we call WebGraph-BP. The purpose of this application is to execute BP on a graph of webpages to determine whether a page is “good” or “bad”. For example, phishing sites are regarded as bad and... |

6 | The MADlib analytics library
- Hellerstein, Ré, et al.
(Show Context)
Citation Context ...social networks and the Web, but we suggest GraphChi-DB could also be used as a backend for storing RDF triples. The idea of in-database analytics has been explored by other authors. Recently, MADlib =-=[64]-=- was proposed as a library for implementing machine learning and data analytics inside a relational database. It is based on user-defined functions that can be invoked using SQL queries to execute ana... |

5 | GraphLab: A distributed framework for machine learning in the cloud
- Low, Gonzalez, et al.
- 2011
(Show Context)
Citation Context ...to small parts that could be processed in parallel. This lack of data-parallelism renders MapReduce [20] inefficient for computing on such graphs, as has been argued by many researchers (for example, =-=[14, 31, 33]-=-). Consequently, in recent years several graph-based abstractions have been proposed, most notably Pregel [33] and GraphLab [31]. Both use a vertex-centric computation model, in which the user defines... |

5 | Inference of beliefs on billion-scale graphs
- Kang, Chau, et al.
(Show Context)
Citation Context ...based on iterative message passing between vertices. The goal here is to estimate the probabilities of variables (“beliefs”). For this work, we adapted a special BP algorithm proposed by Kang et al. =-=[74]-=-, which we call WebGraph-BP. The purpose of this application is to execute BP on a graph of webpages to determine whether a page is “good” or “bad”. For example, phishing sites are regarded as bad and... |

4 | Parallel and I/O efficient set covering algorithms
- Blelloch, Simhadri, et al.
- 2012
(Show Context)
Citation Context ... has O(|E|). Many real-world graphs are sparse, and it is unclear which bound is better in practice. A similar approach was recently used by Blelloch et al. for I/O efficient Set Covering algorithms =-=[10]-=-. Optimal bounds for I/O efficient SpMV algorithms were derived recently by Bender [5]. Similar methods were earlier used by Haveliwala [22] and Chen et al. [15]. GraphChi and the PSW method extend th... |

4 | Prefix sums and their applications. Technical report, Synthesis of Parallel Algorithms
- Blelloch
- 1990
(Show Context)
Citation Context ... (number of in-edges) for each of the vertices, requiring one pass over the input file. The degrees for consecutive vertices can be combined to save memory. To finish, Sharder computes the prefix sum =-=[11]-=- over the degree array, and divides vertices into P intervals with approximately the same number of in-edges. 2. On the second pass, Sharder writes each edge to a temporary scratch file of the owning ... |

4 | Yahoo! AltaVista web page hyperlink connectivity graph, circa 2002
- WebScope
- 2012
(Show Context)
Citation Context ...their associated values of any single vertex in the graph. To illustrate that it is often infeasible to even store just vertex values in memory, consider the yahoo-web graph with 1.7 billion vertices =-=[47]-=-. Associating a floating point value for each vertex would require almost 7 GB of memory, too much for many current PCs (spring 2012). While we expect the memory capacity of personal computers to grow... |

3 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
(Show Context)
Citation Context ...ecution interval, it is added to the graph only after the execution interval has finished. 3.6 Analysis of the I/O Costs We analyze the I/O efficiency of PSW in the I/O model by Aggarwal and Vitter =-=[1]-=-. In this model, cost of an algorithm is the number of block transfers from disk to main memory. The complexity is parametrized by the size of block transfer, B, stated in the unit of the edge object ... |

2 | Neo4j: The graph database, 2011. http://neo4j.org. Accessed - Neo4j - 2011 |

2 | A heuristic strong connectivity algorithm for large graphs
- Cosgaya-Lozano, Zeh
- 2009
(Show Context)
Citation Context ...+V/M)scan(E)+V ). Previous literature on external memory algorithms has considered the computation of SCCs an open problem [76]. However, a heuristic algorithm for external memory SCC was proposed in =-=[40]-=-. It is based on a graph contraction step that attempts to reduce the graph so that a semi-external algorithm can be used. The algorithm is demonstrated to work on many practical graphs reasonably we... |

1 | Oolong: Programming asynchronous distributed applications with triggers - Mitchell, Power, et al. |

1 | The energy case for graph processing on hybrid CPU and GPU systems
- Gharaibeh, Santos-Neto, et al.
- 2013
(Show Context)
Citation Context .... In addition to the scalability, an important factor in the economics of large computational systems is the cost of electricity. The energy consumption of graph computation frameworks was studied in =-=[52]-=-. Although they do not run experiments with GraphChi, the authors hypothesize that the relative energy consumption, normalized by computational throughput, of a system based on GraphChi (or a similar ... |