FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
Cited by 16 (1 self)
Balanced graph partitioning in the streaming setting is a key problem for enabling scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors, or in the cluster with the fewest non-neighbors. In this work, we introduce a framework that unifies these two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS, and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8½ hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains, in terms of communication cost and runtime, of using our graph partitioner when computing standard PageRank on a graph processing platform.
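The interpolation between the two heuristics can be illustrated with a minimal Python sketch of one-pass greedy assignment. It assumes the objective as we recall it from the FENNEL paper: each arriving vertex v goes to the partition i that maximizes |N(v) ∩ S_i| − αγ|S_i|^(γ−1), subject to a hard cap of ⌈νn/k⌉ vertices per partition, with γ = 1.5, α = √k·m/n^1.5 and ν = 1.1. These defaults, the function name `fennel_partition`, the (vertex, neighbor list) input format and first-index tie-breaking are assumptions of this sketch, not the authors' implementation.

```python
import math
from typing import Dict, Iterable, List, Tuple


def fennel_partition(stream: Iterable[Tuple[int, List[int]]], n: int, m: int,
                     k: int, gamma: float = 1.5, nu: float = 1.1) -> Dict[int, int]:
    """One-pass greedy assignment with a FENNEL-style size penalty (sketch).

    stream yields (vertex, neighbors) in arrival order; n and m are the total
    numbers of vertices and edges, k the number of partitions. Assumes nu >= 1.
    """
    alpha = math.sqrt(k) * m / n ** 1.5   # assumed default, recalled from the paper
    capacity = math.ceil(nu * n / k)      # hard per-partition load cap
    sizes = [0] * k
    assignment: Dict[int, int] = {}
    for v, neighbors in stream:
        # Count neighbors that have already been placed, per partition.
        counts = [0] * k
        for u in neighbors:
            p = assignment.get(u)
            if p is not None:
                counts[p] += 1
        best, best_score = -1, -math.inf
        for i in range(k):
            if sizes[i] >= capacity:
                continue                  # partition is full
            # Neighbors gained minus the marginal cost of growing S_i,
            # i.e. the derivative of alpha * |S_i| ** gamma.
            score = counts[i] - alpha * gamma * sizes[i] ** (gamma - 1)
            if score > best_score:
                best, best_score = i, score
        if best < 0:
            raise ValueError("all partitions are full; stream longer than n?")
        assignment[v] = best
        sizes[best] += 1
    return assignment
```

Setting γ = 1 makes the penalty constant, so the rule reduces to picking the partition with the most neighbors; γ = 2 with α = 1/2 gives |N(v) ∩ S_i| − |S_i|, i.e. picking the partition with the fewest non-neighbors. On tiny inputs (for example two 4-cliques) the size penalty dominates the first few decisions and can split a clique, so the sketch shows the shape of the rule rather than its quality.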
Recent advances in graph partitioning
2013
Cited by 6 (2 self)
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.
Streaming Balanced Graph Partitioning Algorithms for Random Graphs
Cited by 3 (0 self)
There has been a recent explosion in the size of stored data, partially due to advances in storage technology and partially due to the growing popularity of cloud computing. The vast quantities of data generated motivate the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. We address the problem of computing a balanced k-partitioning of a graph with only one pass over the data. Based on experimental results in [11], we analyze two variants of a randomized greedy algorithm, one that prefers the arg max and one that is proportional, on random graphs with embedded balanced k-cuts, and we theoretically bound the performance of each algorithm: the arg max algorithm is able to asymptotically recover the embedded k-cut, while, surprisingly, the proportional variant cannot.
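The two randomized greedy variants can be compared empirically with a small simulation. The sketch below samples a planted-partition random graph (k equal hidden blocks, edge probability p inside a block and q across) and streams vertices in random order under a strict ⌈n/k⌉ balance cap. We assume the score of a partition is its number of already-placed neighbors: the arg max variant picks a top-scoring partition (ties broken uniformly at random), and the proportional variant samples a partition with probability proportional to its score, falling back to a uniform choice when no neighbor has been placed. The exact scoring and balance rules of the analyzed algorithms may differ, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def planted_partition_graph(n: int, k: int, p: float, q: float,
                            seed: int = 0) -> Tuple[List[Set[int]], List[int]]:
    """Sample a graph with k equal hidden blocks: edge prob p inside, q across."""
    rng = random.Random(seed)
    block = [v % k for v in range(n)]  # the embedded balanced k-cut
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p if block[u] == block[v] else q):
                adj[u].add(v)
                adj[v].add(u)
    return adj, block


def stream_partition(adj: List[Set[int]], k: int, rule: str,
                     seed: int = 0) -> Dict[int, int]:
    """One pass over vertices in random order; rule is 'argmax' or 'proportional'."""
    if rule not in ("argmax", "proportional"):
        raise ValueError(f"unknown rule: {rule}")
    rng = random.Random(seed)
    n = len(adj)
    cap = -(-n // k)  # ceil(n / k): strict balance
    part: Dict[int, int] = {}
    sizes = [0] * k
    for v in rng.sample(range(n), n):  # random arrival order
        open_parts = [i for i in range(k) if sizes[i] < cap]
        counts = [sum(1 for u in adj[v] if part.get(u) == i) for i in open_parts]
        if rule == "argmax":
            top = max(counts)
            choice = rng.choice([i for i, c in zip(open_parts, counts) if c == top])
        elif sum(counts) == 0:
            choice = rng.choice(open_parts)  # no placed neighbors: uniform
        else:
            choice = rng.choices(open_parts, weights=counts)[0]
        part[v] = choice
        sizes[choice] += 1
    return part


def cut_fraction(adj: List[Set[int]], part: Dict[int, int]) -> float:
    """Fraction of edges whose endpoints land in different partitions."""
    edges = [(u, v) for u in range(len(adj)) for v in adj[u] if u < v]
    if not edges:
        return 0.0
    return sum(part[u] != part[v] for u, v in edges) / len(edges)
```

For example, `adj, block = planted_partition_graph(200, 2, 0.3, 0.02, seed=1)` followed by `cut_fraction(adj, stream_partition(adj, 2, rule))` for each rule gives a quick, informal comparison; a few small runs are no substitute for the asymptotic analysis.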
Balanced graph edge partition
KDD, 2014
Cited by 2 (1 self)
Balanced edge partition has emerged as a new approach to partitioning input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in stark contrast to the traditional approach of balanced vertex partition, where, for a given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of partitions. In this paper, we first characterize the expected costs of vertex and edge partitions with and without aggregation of messages, for the commonly deployed policy of placing a vertex or an edge uniformly at random on one of the partitions. We then obtain the first approximation algorithms for the balanced edge-partition problem, which for the case of no aggregation match the best known approximation ratio for the balanced vertex-partition problem, and show that this continues to hold for the case with aggregation, up to a factor equal to the maximum in-degree of a vertex. We report results of an extensive empirical evaluation on a set of real-world graphs, which quantifies the benefits of edge vs. vertex partition and demonstrates the efficiency of natural greedy online assignments for the balanced edge-partition problem, with and without aggregation.
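In an edge partition every edge lives on exactly one partition, so a vertex must be replicated on each partition that holds one of its edges; the average number of partitions a vertex spans (its replication factor) is a common proxy for communication cost. The sketch below contrasts the uniformly random placement policy with a simple online greedy rule that prefers partitions already holding an edge's endpoints, breaking ties toward the lighter partition, under an edge-count cap of ⌈slack·|E|/k⌉. The greedy rule, the `slack` parameter and the function names are our own illustrative assumptions, not the approximation algorithms from the paper.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def random_edge_partition(edges: List[Edge], k: int, seed: int = 0) -> List[int]:
    """Place each edge uniformly at random on one of k partitions."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in edges]


def greedy_edge_partition(edges: List[Edge], k: int, slack: float = 1.1) -> List[int]:
    """Online greedy: prefer partitions that already hold the edge's endpoints.

    Assumes slack >= 1 so that some partition always has room.
    """
    cap = math.ceil(slack * len(edges) / k)        # balance on edge counts
    spans: Dict[int, Set[int]] = defaultdict(set)  # vertex -> partitions it touches
    loads = [0] * k
    out: List[int] = []
    for u, v in edges:
        candidates = [i for i in range(k) if loads[i] < cap]
        # Key: endpoints already present (0, 1 or 2), then lighter load.
        best = max(candidates,
                   key=lambda i: ((i in spans[u]) + (i in spans[v]), -loads[i]))
        out.append(best)
        loads[best] += 1
        spans[u].add(best)
        spans[v].add(best)
    return out


def replication_factor(edges: List[Edge], assignment: List[int]) -> float:
    """Average number of partitions each vertex appears in (1.0 = no replication)."""
    spans: Dict[int, Set[int]] = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        spans[u].add(p)
        spans[v].add(p)
    return sum(len(s) for s in spans.values()) / len(spans)
```

On two disjoint triangles with k = 2, the greedy rule keeps each triangle on its own partition (replication factor 1.0), whereas random placement generally splits them.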
Adaptive partitioning for large-scale dynamic graphs
In Proc. ICDCS, 2014
Cited by 2 (1 self)
In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid introducing an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios that adapting graph partitioning reduces execution time by over 50% compared to commonly used hash partitioning.
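The core of a migration-based strategy can be sketched as a round in which every vertex inspects only its neighbors' current partitions and moves to the partition holding most of them, if that partition has spare capacity. Moving only with some probability (`move_prob`) is a common damping trick to stop neighboring vertices from swapping back and forth in the same round; whether it matches the refinements in the paper is an assumption, and `migration_round` and `edge_cut` are hypothetical helpers.

```python
import random
from collections import Counter
from typing import Dict, List


def migration_round(adj: Dict[int, List[int]], part: Dict[int, int], k: int,
                    capacity: int, move_prob: float = 0.5, seed: int = 0) -> int:
    """Run one round of local vertex migrations in place; return how many moved.

    Each vertex looks only at the partitions of its own neighbors (local
    information), using the labels from the start of the round.
    """
    rng = random.Random(seed)
    sizes = Counter(part.values())
    snapshot = dict(part)
    moved = 0
    for v, nbrs in adj.items():
        counts = Counter(snapshot[u] for u in nbrs)
        cur = snapshot[v]
        # Partition holding most neighbors; ties resolved in favor of staying.
        target = max(range(k), key=lambda i: (counts[i], i == cur))
        if target == cur:
            continue
        # Respect capacity and move only with probability move_prob (damping).
        if sizes[target] >= capacity or rng.random() >= move_prob:
            continue
        part[v] = target
        sizes[cur] -= 1
        sizes[target] += 1
        moved += 1
    return moved


def edge_cut(adj: Dict[int, List[int]], part: Dict[int, int]) -> int:
    """Number of undirected edges crossing partitions (adj must be symmetric)."""
    return sum(part[u] != part[v] for v, nbrs in adj.items() for u in nbrs) // 2
```

Repeating such rounds whenever vertices or edges are added or removed gives a simple form of continuous adaptation; hash partitioning, by contrast, never revisits a placement.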
Systems for near real-time analysis of large-scale dynamic graphs
2014
LogGP: A Log-based Dynamic Graph Partitioning Method
Cited by 1 (0 self)
With the increasing availability and scale of graph data from Web 2.0, graph partitioning has become an efficient preprocessing technique for balancing the computing workload. Since the cost of partitioning the entire graph is prohibitive, some recent tentative work has turned to streaming graph partitioning, which runs faster, is easily parallelized, and can be updated incrementally. Unfortunately, experiments show that the running time of each partition is still unbalanced, due to variation in workload access patterns across supersteps. In addition, one-pass streaming partitioning results are not always satisfactory because of the algorithms' local view of the graph. In this paper, we present LogGP, a log-based graph partitioning system that records, analyzes, and reuses historical statistical information to refine the partitioning result. LogGP can be used as middleware and deployed easily in many state-of-the-art parallel graph processing systems. LogGP uses historical partitioning results to generate a hypergraph and applies a novel hypergraph streaming partitioning approach to generate a better initial streaming graph partitioning result. During execution, the system uses running logs to optimize graph partitioning, which prevents performance degradation. Moreover, LogGP can dynamically repartition massive graphs in accordance with structural changes. Extensive experiments on a moderate-size computing cluster with real-world graph datasets demonstrate the superiority of our approach over state-of-the-art solutions.
Online and On-demand Partitioning of Streaming Graphs
Many applications generate data that naturally lead to a graph representation for modeling and analysis. A common approach to addressing the size and complexity of these graphs is to split them across a number of partitions, so that computations on them can be performed mostly locally and in parallel within the resulting partitions. In this work, we present a framework that enables partitioning of evolving graphs whose elements (nodes and edges) are streamed in an arbitrary order. At the core of our techniques lies a Condensed Spanning Tree (CST) structure that summarizes the graph stream and permits computation of high-quality graph partitions both online and on-demand, without ever needing to look at the whole graph. The partitioning algorithm we present creates partitions from streaming graphs with low memory usage, and can also adapt partitions over time based on different application needs, such as minimizing cross-partition edges, balancing load across partitions, elastically adapting partitions based on a maximum load threshold, and reducing migration cost. Our experiments with many different real and synthetic graphs demonstrate that our techniques efficiently process and partition millions of graph nodes per second, and adapt the partitions to different requirements using only the information kept in the compressed CST structure, which can reduce the input graph size down to 1.6% of the original.
Online Partitioning of Multi-Labeled Graphs
Graph partitioning is an old problem that is finding renewed interest in the era of big, complex datasets and parallel computing frameworks that can benefit from a proper partitioning of big graph data across multiple nodes in a cluster. In this paper we look into a specific instance of the problem, termed online graph partitioning, that addresses the need to partition large graphs that do not fit in main memory. A neglected aspect of modern graph datasets is that real graphs have labels! Node labels may, for instance, correspond to categorical attributes (such as country, profession, participating groups, etc.) of the entities depicted by the vertices of the graph. Edge labels may represent different relationship types (e.g., “friend-of”, “likes”, etc.). In this work we first revisit the formulation of the graph partitioning problem for graphs with labels on both nodes and edges. We introduce “relation-cut”, a new metric that extends the traditional “edge-cut” metric used in graph partitioning to take into account the existence of different edge types. Then, we combine this metric with a novel “label-cut” metric that takes into consideration the displacement of related nodes with similar labels across partitions. In our experiments we adapt two recent online partitioning algorithms to the newly proposed metric and provide a thorough evaluation on a variety of real and synthetic graphs. Our experiments demonstrate that the proposed technique balances the generated cuts on both relations and labels in the resulting partitions.
Streaming Graph Partitioning in the Planted Partition Model
The sheer increase in the size of graph data has created a lot of interest in developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning in order to minimize communication and achieve work balance. In this work we contribute to the recent research line of streaming graph partitioning [30, 31, 34], which computes an approximately balanced k-partitioning of the vertex set of a graph in a single pass over the graph stream, using degree-based criteria. This graph partitioning framework is well suited to processing large-scale and dynamic graphs. In this work we introduce the use of higher-length walks for streaming graph partitioning and show that their use incurs only a minor computational cost while significantly improving the quality of the graph partition. We perform an average-case analysis of our algorithm using the planted partition model [7, 25]. We complement the recent results of Stanton [30] by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the many possible choices for the length of the walks, we show that the proposed length is optimal. Finally, we perform simulations which indicate that our asymptotic results hold even for small graph sizes.
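One way to picture the use of longer walks is a greedy rule that scores each partition by walks of length 1 and 2 from the arriving vertex into already-placed vertices, using only adjacency lists seen so far. In the sketch below, a length-2 walk v → w → u contributes a weight `beta`, so `beta = 0` recovers plain neighbor counting. The weighting, the restriction to length 2, the hard ⌈n/k⌉ cap and the function name `walk_partition` are illustrative assumptions; the sketch does not reproduce the paper's criterion or its optimal walk length.

```python
from typing import Dict, Iterable, List, Tuple


def walk_partition(stream: Iterable[Tuple[int, List[int]]], n: int, k: int,
                   beta: float = 0.5) -> Dict[int, int]:
    """One-pass partitioning scored by walks of length 1 and 2 (sketch).

    beta weights length-2 walks; beta = 0 reduces to plain neighbor counting.
    """
    cap = -(-n // k)                      # ceil(n / k): hard balance
    seen_adj: Dict[int, List[int]] = {}   # adjacency of vertices already streamed
    part: Dict[int, int] = {}
    sizes = [0] * k
    for v, nbrs in stream:
        score = [0.0] * k
        for w in nbrs:
            if w not in part:
                continue                  # w not streamed yet: no walks through it
            score[part[w]] += 1.0         # walk v -> w
            for u in seen_adj[w]:         # walks v -> w -> u
                if u != v and u in part:
                    score[part[u]] += beta
        open_parts = [i for i in range(k) if sizes[i] < cap]
        # Highest score wins; ties go to the lighter partition.
        best = max(open_parts, key=lambda i: (score[i], -sizes[i]))
        part[v] = best
        sizes[best] += 1
        seen_adj[v] = list(nbrs)
    return part
```

When two partitions tie on direct neighbors, the length-2 term breaks the tie toward the partition that is more densely connected around v's placed neighbors, at the cost of one extra scan over each placed neighbor's adjacency list.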