FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
Cited by 16 (1 self)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors, or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de-facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8½ hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains, with respect to communication cost and runtime, of using our graph partitioner while solving a standard PageRank computation in a graph processing platform.
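The two heuristics named in this abstract can be written as scoring rules inside one greedy streaming loop. The sketch below is illustrative only and is not FENNEL's published objective: the interpolating score (a neighbor count minus a size penalty with hypothetical parameters `alpha` and `gamma`), the hard `capacity` cap, and the lowest-index tie-breaking are assumptions made here. It shows how a single family of scores can recover both heuristics as special cases.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A score maps (neighbors of the arriving vertex, members of one partition) to a number.
Score = Callable[[Set[int], Set[int]], float]


def most_neighbors(nbrs: Set[int], part: Set[int]) -> float:
    # Heuristic 1: the partition that already holds the most neighbors of v.
    return float(len(nbrs & part))


def fewest_non_neighbors(nbrs: Set[int], part: Set[int]) -> float:
    # Heuristic 2: the partition with the fewest non-neighbors of v,
    # written as a maximization of -(|part| - |N(v) & part|).
    return float(len(nbrs & part) - len(part))


def interpolated(alpha: float, gamma: float) -> Score:
    # Hypothetical interpolating score: neighbors minus a size penalty.
    # gamma = 1 makes the penalty constant (same choices as most_neighbors);
    # gamma = 2 with alpha = 0.5 gives exactly fewest_non_neighbors.
    def score(nbrs: Set[int], part: Set[int]) -> float:
        return len(nbrs & part) - alpha * gamma * len(part) ** (gamma - 1)
    return score


def stream_partition(stream: Iterable[Tuple[int, Set[int]]], k: int,
                     capacity: int, score: Score) -> Dict[int, int]:
    """One pass: each arriving vertex goes to the best-scoring partition with room.
    Neighbors that have not arrived yet belong to no partition and add nothing."""
    parts: List[Set[int]] = [set() for _ in range(k)]
    assignment: Dict[int, int] = {}
    for v, nbrs in stream:
        best, best_score = -1, float("-inf")
        for i, part in enumerate(parts):
            if len(part) >= capacity:
                continue  # hard balance cap: skip full partitions
            s = score(nbrs, part)
            if s > best_score:  # strict '>' breaks ties toward the lower index
                best, best_score = i, s
        if best < 0:
            raise ValueError("all partitions are full; increase capacity")
        parts[best].add(v)
        assignment[v] = best
    return assignment


def edge_cut(adj: Dict[int, Set[int]], assignment: Dict[int, int]) -> int:
    # Each undirected edge is counted once (u < w).
    return sum(1 for u, nbrs in adj.items() for w in nbrs
               if u < w and assignment[u] != assignment[w])


# Two triangles joined by the edge 2-3, streamed in vertex order.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
stream = [(v, adj[v]) for v in sorted(adj)]
a = stream_partition(stream, k=2, capacity=3, score=most_neighbors)
print(a, edge_cut(adj, a))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1} 1
```

Because `gamma` only controls how quickly the size penalty grows, sweeping it between 1 and 2 is one simple way to see the kind of interpolation between the two heuristics that the abstract describes.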
Recent advances in graph partitioning (2013)
Cited by 6 (2 self)
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.
Streaming Balanced Graph Partitioning Algorithms for Random Graphs
Cited by 3 (0 self)
There has been a recent explosion in the size of stored data, partially due to advances in storage technology and partially due to the growing popularity of cloud computing and the vast quantities of data generated. This motivates the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. We address the problem of computing a balanced k-partitioning of a graph with only one pass over the data. Based on experimental results in [11], we analyze two variants of a randomized greedy algorithm, one that prefers the arg max and one that is proportional, on random graphs with embedded balanced k-cuts, and theoretically bound the performance of each algorithm: the arg max algorithm is able to asymptotically recover the embedded k-cut, while, surprisingly, the proportional variant cannot.
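A small simulation can make the arg max versus proportional distinction concrete. The sketch below rests on our reading of the two rules (arg max with uniform random tie-breaking; proportional sampling with probability proportional to the number of already-placed neighbors, uniform when there are none), a planted-partition generator with hypothetical parameters `p` (within-block) and `q` (across-block edge probability), a random arrival order, and a capacity of ceil(n/k) vertices per partition. The exact definitions used in the paper and in [11] may differ.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def planted_partition(n: int, k: int, p: float, q: float,
                      rng: random.Random) -> Tuple[List[int], List[Set[int]]]:
    """Random graph with a hidden balanced k-cut: vertex v sits in block v % k;
    same-block pairs are joined with probability p, other pairs with probability q."""
    truth = [v % k for v in range(n)]
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p if truth[u] == truth[v] else q):
                adj[u].add(v)
                adj[v].add(u)
    return truth, adj


def argmax_choice(counts: Sequence[int], open_parts: List[int],
                  rng: random.Random) -> int:
    # Open partition with the most placed neighbors; ties broken uniformly at random.
    best = max(counts[i] for i in open_parts)
    return rng.choice([i for i in open_parts if counts[i] == best])


def proportional_choice(counts: Sequence[int], open_parts: List[int],
                        rng: random.Random) -> int:
    # Open partition i with probability proportional to counts[i];
    # uniform over open partitions if no neighbor has been placed yet.
    weights = [counts[i] for i in open_parts]
    if sum(weights) == 0:
        return rng.choice(open_parts)
    return rng.choices(open_parts, weights=weights, k=1)[0]


Chooser = Callable[[Sequence[int], List[int], random.Random], int]


def one_pass(adj: List[Set[int]], k: int, choose: Chooser,
             rng: random.Random) -> List[int]:
    n = len(adj)
    capacity = -(-n // k)  # ceil(n / k), so some partition always has room
    order = list(range(n))
    rng.shuffle(order)  # vertices arrive in random order
    label = [-1] * n
    sizes = [0] * k
    for v in order:
        counts = [0] * k
        for u in adj[v]:
            if label[u] >= 0:  # only neighbors that have already been placed count
                counts[label[u]] += 1
        open_parts = [i for i in range(k) if sizes[i] < capacity]
        i = choose(counts, open_parts, rng)
        label[v] = i
        sizes[i] += 1
    return label


def cut_fraction(adj: List[Set[int]], label: List[int]) -> float:
    edges = [(u, v) for u in range(len(adj)) for v in adj[u] if u < v]
    return sum(label[u] != label[v] for u, v in edges) / max(len(edges), 1)


truth, adj = planted_partition(n=400, k=4, p=0.1, q=0.01, rng=random.Random(0))
print("planted", round(cut_fraction(adj, truth), 3))
for name, choose in [("arg max", argmax_choice), ("proportional", proportional_choice)]:
    label = one_pass(adj, 4, choose, random.Random(1))
    print(name, round(cut_fraction(adj, label), 3))
```

Comparing `cut_fraction` for the two rules over several seeds, and against the planted cut, is only an empirical check; the recovery results in the abstract are asymptotic statements about the paper's own variants.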
Balanced graph edge partition (KDD, 2014)
Cited by 2 (1 self)
Balanced edge partition has emerged as a new approach to partitioning input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in stark contrast to the traditional approach of balanced vertex partition, where, for a given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of partitions. In this paper, we first characterize the expected costs of vertex and edge partitions with and without aggregation of messages, for the commonly deployed policy of placing a vertex or an edge uniformly at random into one of the partitions. We then obtain the first approximation algorithms for the balanced edge-partition problem, which, for the case of no aggregation, match the best known approximation ratio for the balanced vertex-partition problem, and show that this continues to hold for the case with aggregation up to a factor equal to the maximum in-degree of a vertex. We report results of an extensive empirical evaluation on a set of real-world graphs, which quantifies the benefits of edge- vs. vertex-partition and demonstrates the efficiency of natural greedy online assignments for the balanced edge-partition problem with and without aggregation.
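The random-placement baseline and a greedy online edge assignment can be sketched as follows. As a cost proxy we use the replication factor (the average number of partitions over which a vertex's edges are spread). This is an assumption made here, not the paper's exact cost model with or without aggregation. The greedy rule, the `capacity` cap and the function names are also illustrative. `expected_spans_uniform` is the standard balls-into-bins expectation k(1 - (1 - 1/k)^d) for a degree-d vertex whose edges are placed uniformly at random.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def random_edge_partition(edges: Sequence[Edge], k: int, rng: random.Random) -> List[int]:
    # Baseline policy: every edge goes to a partition chosen uniformly at random.
    return [rng.randrange(k) for _ in edges]


def greedy_edge_partition(edges: Sequence[Edge], k: int, capacity: int) -> List[int]:
    """Online greedy: prefer a partition already holding both endpoints, then one,
    then none; break ties by lower load, then lower index."""
    spans: Dict[int, Set[int]] = defaultdict(set)  # vertex -> partitions holding its edges
    load = [0] * k
    out: List[int] = []
    for u, v in edges:
        open_parts = [i for i in range(k) if load[i] < capacity]
        if not open_parts:
            raise ValueError("all partitions are full; increase capacity")
        best = max(open_parts,
                   key=lambda i: ((i in spans[u]) + (i in spans[v]), -load[i], -i))
        spans[u].add(best)
        spans[v].add(best)
        load[best] += 1
        out.append(best)
    return out


def replication_factor(edges: Sequence[Edge], assignment: Sequence[int]) -> float:
    # Average number of distinct partitions over which each vertex's edges are spread.
    spans: Dict[int, Set[int]] = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        spans[u].add(p)
        spans[v].add(p)
    return sum(len(s) for s in spans.values()) / len(spans)


def expected_spans_uniform(d: int, k: int) -> float:
    # E[#distinct partitions] when d edges are placed independently and uniformly over k.
    return k * (1 - (1 - 1 / k) ** d)


path = [(0, 1), (1, 2), (2, 3)]
greedy = greedy_edge_partition(path, k=2, capacity=2)
print(greedy, replication_factor(path, greedy))  # [0, 0, 1] 1.25
rnd = random_edge_partition(path, 2, random.Random(0))
print(rnd, replication_factor(path, rnd))
```

Keeping an edge next to partitions that already hold its endpoints is what keeps replication low; the load tie-break is what keeps the edge partition balanced.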
Adaptive partitioning for large-scale dynamic graphs (Proc. ICDCS, 2014)
Cited by 2 (1 self)
In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid introducing a large number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash-partitioning.
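Iterative vertex migration using only local information can be illustrated with a label-propagation-style round. In each round, every vertex inspects only its neighbors' partitions and moves to the one holding most of them, provided that partition has spare room. This is a simplified sketch under stated assumptions (a deterministic visiting order, a hard `capacity` per partition, and ties resolved in favor of staying). It is not the strategy or the refinements described in the paper. A vertex moves only when it gains strictly more co-located neighbors, so each move increases the number of uncut edges and the rounds must terminate.

```python
from collections import Counter
from typing import Dict, Set


def migration_round(adj: Dict[int, Set[int]], label: Dict[int, int],
                    k: int, capacity: int) -> int:
    """One round of local migrations; updates `label` in place, returns #moves."""
    sizes = Counter(label.values())
    moves = 0
    for v in sorted(adj):  # fixed visiting order keeps the sketch deterministic
        counts = Counter(label[u] for u in adj[v])  # local information only
        cur = label[v]
        # Candidates: stay, or move to any partition that still has room.
        candidates = [i for i in range(k) if i == cur or sizes[i] < capacity]
        # Most co-located neighbors wins; ties favor staying put.
        best = max(candidates, key=lambda i: (counts[i], i == cur))
        if best != cur:
            sizes[cur] -= 1
            sizes[best] += 1
            label[v] = best
            moves += 1
    return moves


def adapt(adj: Dict[int, Set[int]], label: Dict[int, int], k: int,
          capacity: int, max_rounds: int = 10) -> Dict[int, int]:
    label = dict(label)  # leave the caller's partitioning untouched
    for _ in range(max_rounds):
        if migration_round(adj, label, k, capacity) == 0:
            break  # converged: no vertex wants to move
    return label


def edge_cut(adj: Dict[int, Set[int]], label: Dict[int, int]) -> int:
    return sum(1 for u in adj for w in adj[u] if u < w and label[u] != label[w])


# Two triangles joined by the edge 2-3, starting from hash partitioning (v % 2).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
hashed = {v: v % 2 for v in adj}
adapted = adapt(adj, hashed, k=2, capacity=4)
print(edge_cut(adj, hashed), edge_cut(adj, adapted))  # 5 1
```

In a dynamic setting, the same rounds would be re-run from the current labels after vertices or edges are added or removed, rather than repartitioning from scratch.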
Systems for near real-time analysis of large-scale dynamic graphs (2014)
LogGP: A Log-based Dynamic Graph Partitioning Method
Cited by 1 (0 self)
With the increasing availability and scale of graph data from Web 2.0, graph partitioning has become one of the efficient pre-processing techniques for balancing the computing workload. Since the cost of partitioning the entire graph is prohibitive, there has been some recent tentative work on streaming graph partitioning, which can run faster, be easily parallelized, and be incrementally updated. Unfortunately, experiments show that the running time of each partition is still unbalanced due to variations in workload access patterns during the supersteps. In addition, the one-pass streaming partitioning result is not always satisfactory because of the algorithms' local view of the graph. In this paper, we present LogGP, a log-based graph partitioning system that records, analyzes, and reuses historical statistical information to refine the partitioning result. LogGP can be used as middleware and easily deployed to many state-of-the-art parallel graph processing systems. LogGP utilizes the historical partitioning results to generate a hypergraph and uses a novel hypergraph streaming partitioning approach to generate a better initial streaming graph partitioning result. During execution, the system uses running logs to optimize graph partitioning, which prevents performance degradation. Moreover, LogGP can dynamically repartition massive graphs in accordance with structural changes. Extensive experiments conducted on a moderate-size computing cluster with real-world graph datasets demonstrate the superiority of our approach over state-of-the-art solutions.
Online and On-demand Partitioning of Streaming Graphs
Many applications generate data that naturally leads to a graph representation for its modeling and analysis. A common approach to addressing the size and complexity of these graphs is to split them across a number of partitions, in a way that computations on them can be performed mostly locally and in parallel in the resulting partitions. In this work, we present a framework that enables partitioning of evolving graphs whose elements (nodes and edges) are streamed in an arbitrary order. At the core of our techniques lies a Condensed Spanning Tree (CST) structure that summarizes the graph stream and permits computation of high-quality graph partitions both online and on demand, without the need to ever look at the whole graph. The partitioning algorithm we present creates partitions from streaming graphs with low memory usage, and can also adapt partitions over time based on different application needs, such as minimizing cross-partition edges, balancing load across partitions, elastically adapting partitions based on a maximum load threshold, and reducing migration cost. Our experiments with many different real and synthetic graphs demonstrate that our techniques efficiently process and partition millions of graph nodes per second and also adapt them to different requirements using only the information kept in the compressed CST structure, which can reduce the input graph size down to 1.6%.
Online Partitioning of Multi-Labeled Graphs
Graph partitioning is an old problem that is finding renewed interest in the era of big, complex datasets and parallel computing frameworks that can benefit from a proper partitioning of big graph data across multiple nodes in a cluster. In this paper we look into a specific instance of the problem, termed online graph partitioning, that addresses the need to partition large graphs that do not fit in main memory. A neglected aspect of modern graph datasets is that real graphs have labels! Node labels may, for instance, correspond to categorical attributes (such as country, profession, participating groups, etc.) of the entities depicted by the vertices of the graph. Edge labels may represent different relationship types (e.g., “friend-of”, “likes”, etc.). In this work we first revisit the formulation of the graph partitioning problem for graphs with labels on both nodes and edges. We introduce “relation-cut”, a new metric that extends the traditional “edge-cut” metric used in graph partitioning in order to take into account the existence of different edge types. Then, we combine this metric with a novel “label-cut” metric that takes into consideration the displacement of related nodes with similar labels across partitions. In our experiments we adapt two recent online partitioning algorithms for the newly proposed metric and provide a thorough evaluation on a variety of real and synthetic graphs. Our experiments demonstrate that the proposed technique balances the generated cuts on both relations and labels in the resulting partitions.
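The abstract does not define relation-cut precisely, so the sketch below only illustrates why edge types matter. It reports the cut fraction separately for each relation type and, as a hypothetical aggregate, their unweighted mean. The function names and the averaging are assumptions made for illustration. They are not the paper's relation-cut or label-cut metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LabeledEdge = Tuple[int, int, str]  # (u, v, relation type)


def per_relation_cut(edges: Iterable[LabeledEdge],
                     assignment: Dict[int, int]) -> Dict[str, float]:
    """Fraction of the edges of each relation type that cross partitions."""
    total: Dict[str, int] = defaultdict(int)
    cut: Dict[str, int] = defaultdict(int)
    for u, v, rel in edges:
        total[rel] += 1
        if assignment[u] != assignment[v]:
            cut[rel] += 1
    return {rel: cut[rel] / total[rel] for rel in total}


def mean_relation_cut(edges: Iterable[LabeledEdge],
                      assignment: Dict[int, int]) -> float:
    # Hypothetical aggregate: every relation type weighs the same.
    per_type = per_relation_cut(edges, assignment)
    return sum(per_type.values()) / len(per_type)


edges = [(0, 1, "friend-of"), (1, 2, "likes"), (2, 3, "friend-of")]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
print(per_relation_cut(edges, assignment))   # {'friend-of': 0.0, 'likes': 1.0}
print(mean_relation_cut(edges, assignment))  # 0.5
```

In this example the plain edge-cut is 1/3, while the per-type view shows that every `likes` edge is cut. An unweighted mean over types (here 0.5) keeps a rare relation type from being hidden by a frequent one.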
Streaming Graph Partitioning in the Planted Partition Model
The sheer increase in the size of graph data has created a lot of interest in developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning in order to minimize communication and achieve work balance. In this work we contribute to the recent research line of streaming graph partitioning [30, 31, 34], which computes an approximately balanced k-partitioning of the vertex set of a graph using a single pass over the graph stream and degree-based criteria. This graph partitioning framework is well tailored to processing large-scale and dynamic graphs. In this work we introduce the use of higher-length walks for streaming graph partitioning and show that their use incurs only a minor computational cost and can significantly improve the quality of the graph partition. We perform an average-case analysis of our algorithm using the planted partition model [7, 25]. We complement the recent results of Stanton [30] by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the wide range of choices for the length of the walks, we show that the proposed length is optimal. Finally, we perform simulations which indicate that our asymptotic results hold even for small graph sizes.
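To illustrate what using walks longer than one edge can mean in a one-pass setting, the sketch below scores each partition by the number of length-1 walks from the arriving vertex v to placed vertices, plus a weight `beta` times the number of length-2 walks v-u-w. Both the restriction to length 2 and the weighting `beta` are hypothetical choices for illustration. The paper's proposed walk length and scoring are not reproduced here. A length-2 walk is only visible when the middle vertex u has already arrived, because only arrived vertices have their adjacency stored.

```python
from typing import Dict, Iterable, List, Set, Tuple


def walk_scores(nbrs: Set[int], seen_adj: Dict[int, Set[int]],
                label: Dict[int, int], k: int, beta: float) -> List[float]:
    """Length-1 walks plus beta * length-2 walks from v into each partition."""
    one = [0] * k
    two = [0] * k
    for u in nbrs:
        if u in label:
            one[label[u]] += 1
        # v-u-w is visible only if u has arrived (its adjacency is stored);
        # v itself is not yet in `label`, so walks back to v are ignored.
        for w in seen_adj.get(u, ()):
            if w in label:
                two[label[w]] += 1
    return [one[i] + beta * two[i] for i in range(k)]


def stream_partition_walks(stream: Iterable[Tuple[int, Set[int]]], k: int,
                           capacity: int, beta: float) -> Dict[int, int]:
    label: Dict[int, int] = {}
    seen_adj: Dict[int, Set[int]] = {}
    sizes = [0] * k
    for v, nbrs in stream:
        scores = walk_scores(nbrs, seen_adj, label, k, beta)
        open_parts = [i for i in range(k) if sizes[i] < capacity]
        if not open_parts:
            raise ValueError("all partitions are full; increase capacity")
        # Highest score wins; ties go to the smaller partition, then the lower index.
        best = max(open_parts, key=lambda i: (scores[i], -sizes[i], -i))
        label[v] = best
        sizes[best] += 1
        seen_adj[v] = set(nbrs)
    return label


# Edges 0-1, 0-4, 2-4; vertices arrive in the order 0, 1, 2, 4.
adj = {0: {1, 4}, 1: {0}, 2: {4}, 4: {0, 2}}
stream = [(v, adj[v]) for v in (0, 1, 2, 4)]
print(stream_partition_walks(stream, k=2, capacity=3, beta=0.0)[4])  # 1
print(stream_partition_walks(stream, k=2, capacity=3, beta=0.5)[4])  # 0
```

In this tiny example, vertex 4 has one placed neighbor in each partition, so the neighbor count alone is a tie and the size tie-break sends it to partition 1. The length-2 walk 4-0-1 tips it toward partition 0 instead. The example only shows that the decision changes; it makes no claim about partition quality.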