Results 1–10 of 46
GraphChi: Large-scale Graph Computation on Just a PC
 In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI ’12, 2012
Abstract

Cited by 109 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
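The parallel-sliding-windows idea mentioned in this abstract can be illustrated with a toy sketch (the sharding layout and names below are our simplification, not GraphChi's actual on-disk format): vertices are split into P intervals, shard p stores the in-edges of interval p sorted by source, so one full shard plus one sequential window per other shard covers all edges of an interval.

```python
# Toy sketch of parallel sliding windows (PSW): shard p holds the in-edges
# of vertex interval p, sorted by source, so each other shard contributes a
# contiguous, sequentially readable window when interval p is processed.

def make_shards(edges, intervals):
    """Assign each (src, dst) edge to the shard of its destination interval,
    then sort by source so other intervals read the shard as a window."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()  # sorted by source vertex
    return shards

def edges_for_interval(shards, intervals, p):
    """Edges touching interval p: all of shard p (in-edges) plus, from every
    other shard, the window of edges whose sources fall in interval p."""
    lo, hi = intervals[p]
    in_edges = list(shards[p])
    out_edges = [(s, d) for q, shard in enumerate(shards) if q != p
                 for s, d in shard if lo <= s <= hi]
    return in_edges, out_edges

edges = [(0, 1), (1, 2), (2, 0), (3, 1), (0, 3)]
intervals = [(0, 1), (2, 3)]        # two vertex intervals -> two shards
shards = make_shards(edges, intervals)
in_e, out_e = edges_for_interval(shards, intervals, 0)
```

Because each shard is sorted by source, the `out_edges` filter above corresponds to one sequential scan per shard on real storage, which is what makes the method friendly to both SSDs and rotational disks.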
From "Think Like a Vertex " to "Think Like a Graph"
Abstract

Cited by 24 (0 self)
To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a “think like a vertex” programming model to support iterative graph computation. This vertex-centric model is easy to program and has proved useful for many graph algorithms. However, this model hides the partitioning information from the users, thus preventing many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new “think like a graph” programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data. For example, on a web graph with 118 million vertices and 855 million edges, the graph-centric version of the connected component detection algorithm runs 63X faster and uses 204X fewer network messages than its vertex-centric counterpart.
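The graph-centric advantage on connected components can be sketched in a few lines (our own simplification, not the Giraph++ API): instead of propagating a label one hop per superstep, each partition runs label propagation to a local fixpoint, so only boundary-edge updates play the role of cross-partition messages.

```python
# Toy graph-centric connected components: run min-label propagation to a
# fixpoint inside each partition, then exchange labels only across the
# edges that cross partitions, standing in for network messages.

def local_fixpoint(vertices, edges, label):
    """Propagate minimum labels inside one partition until stable."""
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            m = min(label[u], label[v])
            if label[u] != m or label[v] != m:
                label[u] = label[v] = m
                changed = True

def graph_centric_cc(partitions, edges):
    label = {v: v for part in partitions for v in part}
    cross = [(u, v) for u, v in edges
             if not any(u in p and v in p for p in partitions)]
    while True:
        for part in partitions:
            part_edges = [(u, v) for u, v in edges
                          if u in part and v in part]
            local_fixpoint(part, part_edges, label)
        stable = True
        for u, v in cross:            # boundary exchange ("messages")
            m = min(label[u], label[v])
            if label[u] != m or label[v] != m:
                label[u] = label[v] = m
                stable = False
        if stable:
            return label

partitions = [{0, 1, 2}, {3, 4, 5}]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = graph_centric_cc(partitions, edges)
```

On this chain of six vertices split into two partitions, the labels converge after a single boundary exchange, whereas one-hop-per-superstep vertex-centric propagation would need a number of supersteps proportional to the graph diameter.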
Mizan: A system for dynamic load balancing in large-scale graph processing
 In EuroSys ’13, 2013
Abstract

Cited by 19 (0 self)
Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient fine-grained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that, especially for highly dynamic workloads, Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning.
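A minimal sketch of runtime vertex migration in this spirit (our own heuristic, not Mizan's actual policy): after a superstep, workers whose measured load exceeds the average hand their heaviest movable vertices to the lightest worker, using measured per-vertex cost rather than any a priori graph structure.

```python
# Toy fine-grained vertex migration: each worker is a dict mapping vertex
# -> measured cost from the last superstep. Overloaded workers shed their
# largest vertices that fit within the excess over the average load.

def balance(workers, tolerance=1.25, max_moves=100):
    """Migrate vertices until every worker is within `tolerance` of the
    average load (or no single vertex fits). Returns the migration log."""
    migrations = []
    for _ in range(max_moves):       # guard against pathological cycling
        loads = [sum(w.values()) for w in workers]
        avg = sum(loads) / len(loads)
        heavy = max(range(len(workers)), key=lambda i: loads[i])
        light = min(range(len(workers)), key=lambda i: loads[i])
        excess = loads[heavy] - avg
        movable = [v for v, c in workers[heavy].items() if c <= excess]
        if loads[heavy] <= tolerance * avg or not movable:
            break
        v = max(movable, key=workers[heavy].get)  # largest vertex that fits
        workers[light][v] = workers[heavy].pop(v)
        migrations.append((v, heavy, light))
    return migrations

workers = [{'a': 10, 'b': 4, 'c': 1}, {'d': 1}]
moves = balance(workers)
```

Moving only vertices whose cost fits within the excess keeps the heavy worker at or above the average, so migrations strictly reduce imbalance instead of ping-ponging vertices between workers.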
Communication complexity of approximate maximum matching in distributed graph data
, 2013
Abstract

Cited by 9 (4 self)
We consider the problem of computing an approximate maximum matching in a graph that consists of n vertices whose edges are stored across k distributed sites in a data center. We are interested in characterizing the communication complexity of this problem, which is of primary concern in data centers where communication bandwidth is a scarce resource. Our main result is that any algorithm that finds an α-approximate maximum matching has a communication complexity of Ω(α²kn). Perhaps surprisingly, we show that this lower bound matches the upper bound of a simple sequential algorithm, showing that no benefits can be obtained with respect to the communication cost despite the full flexibility allowed by the underlying computation model. Our lower bound for matching also implies lower bounds for other important graph problems in the distributed computation setting, including max-flow and graph sparsification. The other main contribution of this paper is a new approach to multi-party randomized communication complexity for graph problems that is of wide applicability.
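The "simple sequential algorithm" referenced above can be sketched as follows (a toy version under our own assumptions, not the paper's exact protocol): site i greedily augments the matching received from site i-1 with its own edges and forwards the result, so each hop ships at most n/2 edges and greedy augmentation yields a maximal, hence 1/2-approximate, matching.

```python
# Toy sequential matching protocol across k sites: forward a greedily
# maintained matching from site to site, counting the edges communicated.

def greedy_augment(matching, edges):
    """Add edges greedily while both endpoints are still unmatched."""
    matched = {v for e in matching for v in e}
    out = list(matching)
    for u, v in edges:
        if u not in matched and v not in matched:
            out.append((u, v))
            matched.update((u, v))
    return out

def sequential_matching(sites):
    """sites: list of per-site edge lists, visited in order. Returns the
    final matching and the number of edges shipped between sites."""
    matching, shipped = [], 0
    for i, edges in enumerate(sites):
        if i > 0:                     # matching forwarded from previous site
            shipped += len(matching)
        matching = greedy_augment(matching, edges)
    return matching, shipped

sites = [[(0, 1), (1, 2)], [(2, 3), (0, 4)]]
matching, comm = sequential_matching(sites)
```

Since a matching on n vertices has at most n/2 edges and there are k-1 hops, the shipped total is O(kn) words, which is the shape of the upper bound the lower bound is matched against.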
TriAD: A distributed shared-nothing RDF engine based on asynchronous message passing
 In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, 2014
Abstract

Cited by 8 (0 self)
We investigate a new approach to the design of distributed, shared-nothing RDF engines. Our engine, coined “TriAD”, combines join-ahead pruning via a novel form of RDF graph summarization with a locality-based, horizontal partitioning of RDF triples into a grid-like, distributed index structure. The multi-threaded and distributed execution of joins in TriAD is facilitated by an asynchronous message passing protocol which allows us to run multiple join operators along a query plan in a fully parallel, asynchronous fashion. We believe that our architecture provides a so far unique approach to join-ahead pruning in a distributed environment, as the more classical form of sideways information passing would not permit executing distributed joins in an asynchronous way. Our experiments over the LUBM, BTC and WSDTS benchmarks demonstrate that TriAD consistently outperforms centralized RDF engines by up to two orders of magnitude, while gaining a factor of more than three compared to the currently fastest distributed engines. To our knowledge, we are thus able to report the so far fastest query response times for the above benchmarks using a mid-range server and regular Ethernet setup.
LFGraph: Simple and Fast Distributed Graph Analytics
Abstract

Cited by 5 (1 self)
Distributed graph analytics frameworks must offer low and balanced communication and computation, low preprocessing overhead, low memory footprint, and scalability. We present LFGraph, a fast, scalable, distributed, in-memory graph analytics engine intended primarily for directed graphs. LFGraph is the first system to satisfy all of the above requirements. It does so by relying on cheap hash-based graph partitioning, while making iterations faster by using publish-subscribe information flow along directed edges, fetch-once communication, single-pass computation, and in-neighbor storage. Our analytical and experimental results show that when applied to real-life graphs, LFGraph is faster than the best graph analytics frameworks by factors of 1x–5x when ignoring partitioning time and by 1x–560x when including partitioning time.
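The fetch-once, publish-subscribe flow described above can be sketched like this (our own simplification, not LFGraph's interfaces): vertices are hash-partitioned, and each server subscribes once to the remote in-neighbors its local vertices read, so each remote value crosses the wire once per iteration no matter how many local vertices consume it.

```python
# Toy subscription table for fetch-once communication: subs[s][t] is the
# set of vertices hosted on server t whose values server s must fetch.

def partition(v, k):
    return v % k  # cheap hash-based partitioning

def build_subscriptions(in_neighbors, k):
    """Each server subscribes once per remote in-neighbor, deduplicating
    across all of its local vertices (a set, hence fetch-once)."""
    subs = [[set() for _ in range(k)] for _ in range(k)]
    for v, nbrs in in_neighbors.items():
        s = partition(v, k)
        for u in nbrs:
            t = partition(u, k)
            if t != s:
                subs[s][t].add(u)
    return subs

# Vertices 1 and 5 are read by several vertices on server 0, yet each is
# subscribed to (and so fetched) only once per iteration.
in_neighbors = {0: [1, 3, 5], 2: [1, 5], 1: [0], 3: [], 4: [1], 5: []}
subs = build_subscriptions(in_neighbors, k=2)
```

Storing in-neighbors (rather than out-neighbors) is what lets each server compute these subscription sets locally, without any coordination during partitioning.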
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
Abstract

Cited by 5 (0 self)
There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15× speedup compared to Apache Giraph and up to 35× speedup compared to distributed GraphLab), and more effective use of available machine resources to support Big(ger) Graph Analytics.
Fast Iterative Graph Computation: A Path Centric Approach
 In SC, 2014
Abstract

Cited by 4 (1 self)
Large-scale graph processing represents an interesting systems challenge due to the lack of locality. This paper presents PathGraph, a system for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features. First, we model a large graph using a collection of tree-based partitions and use path-centric computation rather than vertex-centric or edge-centric computation. Our path-centric graph-parallel computation model significantly improves the memory and disk locality for iterative computation algorithms on large graphs. Second, we design a compact storage that is optimized for iterative graph-parallel computation. Concretely, we use delta-compression, partition a large graph into tree-based partitions, and store trees in a DFS order. By clustering highly correlated paths together, we further maximize sequential access and minimize random access on storage media. Third but not least, we implement the path-centric computation model by using a scatter/gather programming model, which parallelizes the iterative computation at the partition-tree level and performs sequential local updates for vertices in each tree partition to improve the convergence speed. We compare PathGraph to recent alternative graph processing systems such as GraphChi and X-Stream, and show that the path-centric approach outperforms vertex-centric and edge-centric systems on a number of graph algorithms for both in-memory and out-of-core graphs.
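The DFS-order storage property mentioned above can be demonstrated in a few lines (a toy layout of ours, not PathGraph's actual format): laying out a tree partition in DFS order makes every root-to-leaf path occupy a contiguous, sequentially readable range.

```python
# Toy path-centric layout: store a tree partition's vertices in DFS order,
# so the vertices along any root-to-leaf path sit in consecutive slots.

def dfs_order(tree, root):
    """tree: dict parent -> children list. Returns the DFS layout and each
    vertex's position within it."""
    layout, stack = [], [root]
    while stack:
        v = stack.pop()
        layout.append(v)
        stack.extend(reversed(tree.get(v, [])))
    return layout, {v: i for i, v in enumerate(layout)}

tree = {0: [1, 4], 1: [2, 3]}
layout, pos = dfs_order(tree, 0)
# The path 0 -> 1 -> 2 maps to slots 0, 1, 2: one sequential read.
```

Clustering correlated paths this way is what converts the random access pattern of vertex-centric iteration into mostly sequential I/O on the storage medium.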
PrefEdge: SSD prefetcher for large-scale graph traversal
 In Proceedings of the 7th International Systems and Storage Conference, SYSTOR ’14, 2014
Abstract

Cited by 3 (0 self)
Mining large graphs has now become an important aspect of multiple diverse applications, and a number of computer systems have been proposed to provide runtime support. Recent interest in this area has led to the construction of single-machine graph computation systems that use solid state drives (SSDs) to store the graph. This approach reduces the cost and simplifies the implementation of graph algorithms, making computations on large graphs available to the average user. However, SSDs are slower than main memory, and making full use of their bandwidth is crucial for executing graph algorithms in a reasonable amount of time. In this paper, we present PrefEdge, a prefetcher for graph algorithms that parallelises requests to derive maximum throughput from SSDs. PrefEdge combines a judicious distribution of graph state between main memory and SSDs with an innovative read-ahead algorithm to prefetch needed data in parallel. This is in contrast to existing approaches that depend on multi-threading the graph algorithms to saturate available bandwidth. Our experiments on graph algorithms using random access show that PrefEdge is not only capable of maximising the throughput from SSDs but is also able to almost hide the effect of I/O latency. The improvement in runtime for graph algorithms is up to 14× when compared to a single-threaded baseline. When compared to multi-threaded implementations, PrefEdge performs up to 80% faster without the program complexity and the programmer effort needed for multi-threaded graph algorithms.
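The read-ahead idea can be sketched as follows (our own simplification, not PrefEdge's implementation): a single-threaded BFS exposes its frontier queue, and the prefetcher looks a fixed depth ahead into that queue, warming adjacency lists so they are already cached by the time the traversal dequeues them.

```python
# Toy prefetcher for graph traversal: the adjacency dict stands in for
# SSD-resident lists; prefetch() models parallel read-ahead of upcoming
# frontier vertices, so neighbors() rarely blocks on an on-demand read.

from collections import deque

class PrefetchingStore:
    def __init__(self, adjacency, lookahead=4):
        self.adjacency = adjacency
        self.lookahead = lookahead
        self.cache = {}
        self.ssd_reads_on_demand = 0   # reads the traversal had to wait for

    def prefetch(self, upcoming):
        """Issue read-ahead for the next `lookahead` queued vertices."""
        for v in list(upcoming)[: self.lookahead]:
            self.cache.setdefault(v, self.adjacency[v])

    def neighbors(self, v):
        if v not in self.cache:        # cache miss: synchronous SSD read
            self.ssd_reads_on_demand += 1
            self.cache[v] = self.adjacency[v]
        return self.cache[v]

def bfs(store, source):
    seen, order, queue = {source}, [], deque([source])
    while queue:
        store.prefetch(queue)          # read ahead into the frontier
        v = queue.popleft()
        order.append(v)
        for u in store.neighbors(v):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

adjacency = {0: [1, 2], 1: [3], 2: [], 3: []}
store = PrefetchingStore(adjacency)
order = bfs(store, 0)
```

The key point is that the BFS frontier itself predicts future I/O, so the requests can be issued in parallel without multi-threading the algorithm.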
FlashGraph: Processing billion-node graphs on an array of commodity SSDs
Abstract

Cited by 3 (0 self)
Graph analysis performs many random reads and writes, so these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines whose aggregate memory exceeds the size of the graph. We demonstrate that a multicore server can process graphs of billions of vertices and hundreds of billions of edges, utilizing commodity SSDs without much performance loss. We do so by implementing a graph-processing engine within a user-space SSD file system designed for high IOPS and extreme parallelism. This allows us to localize computation to cached data in a non-uniform memory architecture and hide latency by overlapping computation with I/O. Our semi-external-memory graph engine, called FlashGraph, stores vertex state in memory and adjacency lists on SSDs. FlashGraph exposes a general and flexible programming interface that can express a variety of graph algorithms and their optimizations. FlashGraph in semi-external memory performs many algorithms up to 20 times faster than PowerGraph, a general-purpose, in-memory graph engine. Even breadth-first search, which generates many small random I/Os, runs significantly faster in FlashGraph.
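The semi-external-memory split described above can be illustrated with a toy iteration (our own simplification, not FlashGraph's engine): per-vertex state lives in an in-memory array of size O(V), while adjacency lists are streamed from a file standing in for the SSD, so memory scales with vertices rather than edges.

```python
# Toy semi-external-memory iteration: vertex state in memory, adjacency
# lists on "SSD" (a file here), streamed one list at a time per pass.

import json, os, tempfile

def write_adjacency(adjacency, path):
    """Serialize adjacency lists line by line to SSD-resident storage."""
    with open(path, "w") as f:
        for v in sorted(adjacency):
            f.write(json.dumps([v, adjacency[v]]) + "\n")

def in_degrees(path, num_vertices):
    """One pass over the edge data: update the in-memory per-vertex state
    (here, in-degree counts) while edges stream from storage."""
    state = [0] * num_vertices          # O(V) memory, never O(E)
    with open(path) as f:
        for line in f:
            _, nbrs = json.loads(line)
            for u in nbrs:
                state[u] += 1
    return state

adjacency = {0: [1, 2], 1: [2], 2: []}
path = os.path.join(tempfile.mkdtemp(), "adj.txt")
write_adjacency(adjacency, path)
indeg = in_degrees(path, 3)
```

Keeping only vertex state resident is what lets a single server handle graphs whose edge lists far exceed main memory, at the cost of one storage pass per iteration.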