GraphChi: Large-Scale Graph Computation on Just a PC
In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, 2012
Cited by 115 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially for non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
Cited by 16 (1 self)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8 1/2 hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
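As an illustration of the first heuristic family the abstract mentions (place each arriving vertex in the cluster holding the most of its neighbors), here is a minimal Python sketch. It is not the FENNEL objective itself. The function names, the hard `capacity` cap used to keep parts balanced, and the tie-breaking rule are all assumptions made for this example.

```python
def stream_partition(vertex_stream, k, capacity):
    """One-pass greedy placement (illustrative, not FENNEL's objective).

    vertex_stream yields (vertex, neighbor_list) pairs in arrival order.
    Each vertex goes to the non-full part that already holds most of its
    neighbors; ties go to the smaller part, then to the lowest index.
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in vertex_stream:
        counts = [0] * k
        for u in neighbors:
            if u in assignment:          # only already-placed neighbors count
                counts[assignment[u]] += 1
        candidates = [p for p in range(k) if sizes[p] < capacity]
        if not candidates:
            raise ValueError("capacity too small for the stream")
        best = max(candidates, key=lambda p: (counts[p], -sizes[p]))
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_fraction(edges, assignment):
    """Fraction of edges whose endpoints land in different parts."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)


# Hypothetical toy input: two triangles joined by the single edge (2, 3).
demo_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
demo_stream = [
    (0, [1, 2]), (1, [0, 2]), (2, [0, 1, 3]),
    (3, [2, 4, 5]), (4, [3, 5]), (5, [3, 4]),
]
demo_assignment = stream_partition(demo_stream, k=2, capacity=3)
```

On this toy stream the hard cap forces the second triangle into the second part, so only the bridging edge is cut. A soft balance penalty in place of the hard cap would be the natural next step toward an interpolating objective.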
LFGraph: Simple and fast distributed graph analytics
In Proceedings of the First ACM Conference on Timely Results in Operating Systems, ser. TRIOS ’13. ACM, 2013
Cited by 5 (1 self)
Distributed graph analytics frameworks must offer low and balanced communication and computation, low pre-processing overhead, low memory footprint, and scalability. We present LFGraph, a fast, scalable, distributed, in-memory graph analytics engine intended primarily for directed graphs. LFGraph is the first system to satisfy all of the above requirements. It does so by relying on cheap hash-based graph partitioning, while making iterations faster by using publish-subscribe information flow along directed edges, fetch-once communication, single-pass computation, and in-neighbor storage. Our analytical and experimental results show that when applied to real-life graphs, LFGraph is faster than the best graph analytics frameworks by factors of 1x-5x when ignoring partitioning time and by 1x-560x when including partitioning time.
Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs
Cited by 3 (1 self)
Horton+ is a graph query processing system that executes declarative reachability queries on a partitioned attributed multigraph. It employs a query language, query optimizer, and a distributed execution engine. The query language expresses declarative reachability queries, and supports closures and predicates on node and edge attributes to match graph paths. We introduce three algebraic operators, select, traverse, and join, and a query is compiled into an execution plan containing these operators. As reachability queries access the graph elements in a random access pattern, the graph is maintained in the main memory of a cluster of servers to reduce query execution time. We develop a distributed execution engine that processes a query plan in parallel on the graph servers. Since the query language is declarative, we build a query optimizer that uses graph statistics to estimate predicate selectivity. We experimentally evaluate the system performance on a cluster of 16 graph servers using synthetic graphs as well as a real graph from an application that uses reachability queries. The evaluation shows (1) the efficiency of the optimizer in reducing query execution time, (2) system scalability with the size of the graph and with the number of servers, and (3) the convenience of using declarative queries.
NUMA-aware graph-structured analytics
In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2015
Cited by 3 (0 self)
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search and recommendation systems. Though much prior research focuses on scaling graph analytics on distributed environments, the strong desire for performance per core, dollar and joule has generated considerable interest in processing large-scale graphs on a single server-class machine, which may have several terabytes of RAM and 80 or more cores. However, prior graph-analytics systems are largely neutral to NUMA characteristics and thus have suboptimal performance. This paper presents a detailed study of NUMA characteristics and their impact on the efficiency of graph analytics. Our study uncovers two insights: 1) either random or interleaved allocation of graph data will significantly hamper data locality and parallelism; 2) sequential inter-node (i.e., remote) memory accesses have much higher bandwidth than both intra- and inter-node random ones. Based on them, this paper describes Polymer, a NUMA-aware graph-analytics system on multicore with two key design decisions. First, Polymer differentially allocates and places topology data, application-defined data and mutable runtime states of a graph system according to their access patterns to minimize remote accesses. Second, for some remaining random accesses, Polymer carefully converts random remote accesses into sequential remote accesses, by using lightweight replication of vertices across NUMA nodes. To improve load balance and vertex convergence, Polymer is further built with a hierarchical barrier to boost parallelism and locality, an edge-oriented balanced partitioning for skewed graphs, and adaptive data structures according to the proportion of active vertices. A detailed evaluation on an 80-core machine shows that Polymer often outperforms the state-of-the-art single-machine graph-analytics systems, including Ligra, X-Stream and Galois, for a set of popular real-world and synthetic graphs.
Balanced graph edge partition
KDD, 2014
Cited by 2 (1 self)
Balanced edge partition has emerged as a new approach to partition input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in stark contrast to the traditional approach of balanced vertex partition, where for a given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of partitions. In this paper, we first characterize the expected costs of vertex and edge partitions with and without aggregation of messages, for the commonly deployed policy of placing a vertex or an edge uniformly at random to one of the partitions. We then obtain the first approximation algorithms for the balanced edge-partition problem which for the case of no aggregation matches the best known approximation ratio for the balanced vertex-partition problem, and show that this remains to hold for the case with aggregation up to a factor that is equal to the maximum in-degree of a vertex. We report results of an extensive empirical evaluation on a set of real-world graphs, which quantifies the benefits of edge- vs. vertex-partition, and demonstrates efficiency of natural greedy online assignments for the balanced edge-partition problem with and without aggregation.
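To make the uniform-random placement policy concrete, the following Python sketch computes two standard back-of-the-envelope quantities under that policy. These are textbook expectations, not the cost models derived in the paper, and the function names are hypothetical.

```python
def cut_probability(k):
    """Random vertex partition: probability that an edge is cut when its two
    endpoints are placed independently and uniformly among k partitions."""
    return 1.0 - 1.0 / k


def expected_replicas(degree, k):
    """Random edge partition: expected number of distinct partitions that
    hold at least one of a vertex's `degree` incident edges, when each edge
    is placed independently and uniformly among k partitions."""
    return k * (1.0 - (1.0 - 1.0 / k) ** degree)
```

Under these assumptions a random vertex partition cuts nearly every edge once k is large, while under a random edge partition a vertex is spread over at most k partitions, with high-degree vertices approaching that bound.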
Adaptive partitioning for large-scale dynamic graphs
In Proc. ICDCS (2014)
Cited by 2 (1 self)
In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash partitioning.
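The idea of iterative vertex migration driven only by local information can be sketched as follows. This is a simplified, sequential illustration in Python, not the algorithm from the paper. The `capacity` cap, the sweep order, and the function name are assumptions made for the example.

```python
def migrate_once(adjacency, assignment, k, capacity):
    """One sweep of greedy migration (illustrative only).

    Each vertex looks only at where its own neighbors currently live and
    moves to the part holding strictly more of them than its current part,
    provided that part still has room. Returns the number of moves made.
    """
    sizes = [0] * k
    for p in assignment.values():
        sizes[p] += 1
    moved = 0
    for v in sorted(adjacency):
        counts = [0] * k
        for u in adjacency[v]:
            counts[assignment[u]] += 1
        current = assignment[v]
        # Prefer the part with most neighbors; on ties, prefer staying put.
        target = max(range(k), key=lambda p: (counts[p], p == current))
        if (target != current and counts[target] > counts[current]
                and sizes[target] < capacity):
            assignment[v] = target
            sizes[current] -= 1
            sizes[target] += 1
            moved += 1
    return moved


# Hypothetical toy input: two triangles joined by one edge, with an initial
# hash-style placement (vertex id modulo 2) that cuts five of seven edges.
demo_adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                  3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
demo_parts = {v: v % 2 for v in demo_adjacency}
demo_moves = []
while True:
    n = migrate_once(demo_adjacency, demo_parts, k=2, capacity=4)
    demo_moves.append(n)
    if n == 0:
        break
```

Repeating sweeps until no vertex moves gives a local fixed point. In a dynamic setting the same sweep would be rerun as vertices and edges arrive or disappear.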
xdgp: A dynamic graph processing system with adaptive partitioning
arXiv, 2013
Cited by 2 (0 self)
Many real-world systems, such as social networks, rely on efficiently mining large graphs, with hundreds of millions of vertices and edges. This volume of information requires partitioning the graph across multiple nodes in a distributed system. This has a deep effect on performance, as traversing edges cut between partitions incurs a significant performance penalty due to the cost of communication. Thus, several systems in the literature have attempted to improve computational performance by enhancing graph partitioning, but they do not support another characteristic of real-world graphs: graphs are inherently dynamic, their topology evolves continuously, and consequently the optimum partitioning also changes over time. In this work, we present the first system that dynamically repartitions massive graphs to adapt to structural changes. The system optimises graph partitioning to prevent performance degradation without using data replication. The system adopts an iterative vertex migration algorithm that relies on local information only, making complex coordination unnecessary. We show how the improvement in graph partitioning reduces execution time by over 50%, while adapting the partitioning to a large number of changes to the graph in three real-world scenarios.
Systems for near real-time analysis of large-scale dynamic graphs
2014