Restreaming graph partitioning: simple versatile algorithms for advanced balancing. (2013)

by J Nishimura, J Ugander
Venue: In KDD ’13

Results 1 - 10 of 13

FENNEL: Streaming Graph Partitioning for Massive Scale Graphs

by Charalampos E. Tsourakakis, Christos Gkantsidis, Bozidar Radunovic, Milan Vojnovic
Cited by 16 (1 self)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas it took more than 8 1/2 hours for METIS to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
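The two heuristics named in this abstract can be sketched in a few lines. The abstract does not give the interpolating formula, so the penalty term below, and the parameter names `alpha` and `gamma`, are assumptions made for illustration. With a zero penalty the rule reduces to "most neighbors"; with `alpha=0.5, gamma=2` the penalty equals the cluster size, and maximizing neighbors minus size is the same as minimizing non-neighbors.

```python
from collections import defaultdict


def stream_partition(edges, order, k, alpha, gamma):
    """One-pass greedy vertex partitioning (illustrative sketch only).

    Each arriving vertex v goes to the cluster i that maximizes
        neighbors(v, i) - alpha * gamma * size(i) ** (gamma - 1).
    The penalty form and the names alpha/gamma are assumptions of this sketch.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    assignment = {}
    sizes = [0] * k
    for v in order:
        # Count already-placed neighbors of v in each cluster.
        neighbors_in = [0] * k
        for u in adj[v]:
            if u in assignment:
                neighbors_in[assignment[u]] += 1
        scores = [
            neighbors_in[i] - alpha * gamma * sizes[i] ** (gamma - 1)
            for i in range(k)
        ]
        # Ties go to the lowest cluster index (an arbitrary choice).
        best = max(range(k), key=lambda i: (scores[i], -i))
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_fraction(edges, assignment):
    """Fraction of edges whose endpoints land in different clusters."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)
```

On two triangles joined by a single edge, streamed in vertex order with `k=2`, the zero-penalty rule puts every vertex in one cluster, while the size-penalized rule separates the triangles and cuts only the bridge edge. Intermediate values of `gamma` trade off between the two behaviors.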

Citation Context

...ommunication load. Hash partitioning takes 25% more time than Fennel and it also has a much higher traffic load. 7. CONCLUSION In this work we provide a novel perspective on a recent line of research [25, 29] for the balanced graph partitioning problem, which results in state-of-the-art performance in terms of speed and quality. Despite the fact that Fennel performs a single pass over the graph, it achiev...

Recent advances in graph partitioning

by Aydın Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, Christian Schulz, 2013
Cited by 6 (2 self)
We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.

Streaming Balanced Graph Partitioning Algorithms for Random Graphs

by Isabelle Stanton
Cited by 3 (0 self)
There has been a recent explosion in the size of stored data, partially due to advances in storage technology and partially due to the growing popularity of cloud computing and the vast quantities of data generated. This motivates the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. We address the problem of computing a balanced k-partitioning of a graph with only one pass over the data. Based on experimental results in [11], we analyze two variants of a randomized greedy algorithm, one that prefers the arg max and one that is proportional, on random graphs with embedded balanced k-cuts, and theoretically bound the performance of each algorithm: the arg max algorithm is able to asymptotically recover the embedded k-cut, while, surprisingly, the proportional variant cannot.
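The difference between the two variants comes down to how a cluster is chosen from the per-cluster neighbor counts. The sketch below is one plausible reading of the abstract: the function name `choose_cluster`, the `open_clusters` argument used to enforce balance, and the uniform fallback when a vertex has no placed neighbors are all assumptions, not details taken from the paper.

```python
import random


def choose_cluster(neighbor_counts, open_clusters, rule, rng):
    """Pick a cluster for an arriving vertex (illustrative sketch only).

    neighbor_counts[i]: number of already-placed neighbors in cluster i.
    open_clusters: list of cluster indices that still have room (assumed
        to be how the balance constraint is enforced).
    rule: "argmax" or "proportional".
    rng: a random.Random instance.
    """
    if rule == "argmax":
        # Take a cluster with the most neighbors, breaking ties at random.
        top = max(neighbor_counts[i] for i in open_clusters)
        return rng.choice([i for i in open_clusters if neighbor_counts[i] == top])
    if rule == "proportional":
        # Sample a cluster with probability proportional to its neighbor count.
        weights = [neighbor_counts[i] for i in open_clusters]
        if sum(weights) == 0:
            return rng.choice(open_clusters)  # no information: pick uniformly
        return rng.choices(open_clusters, weights=weights, k=1)[0]
    raise ValueError(f"unknown rule: {rule}")
```

With counts `[1, 5, 2]`, the arg max rule always returns cluster 1, whereas the proportional rule still sends the vertex to cluster 0 or 2 a fraction of the time. That residual randomness is a natural intuition for why the proportional variant might fail to recover an embedded cut.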

Balanced graph edge partition

by Florian Bourse, Marc Lelarge, Milan Vojnović - KDD, 2014
Cited by 2 (1 self)
Balanced edge partition has emerged as a new approach to partition input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in stark contrast to the traditional approach of balanced vertex partition, where for a given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of partitions. In this paper, we first characterize the expected costs of vertex and edge partitions with and without aggregation of messages, for the commonly deployed policy of placing a vertex or an edge uniformly at random in one of the partitions. We then obtain the first approximation algorithms for the balanced edge-partition problem which, for the case of no aggregation, match the best known approximation ratio for the balanced vertex-partition problem, and show that this continues to hold for the case with aggregation up to a factor that is equal to the maximum in-degree of a vertex. We report results of an extensive empirical evaluation on a set of real-world graphs, which quantifies the benefits of edge- vs. vertex-partition, and demonstrates the efficiency of natural greedy online assignments for the balanced edge-partition problem with and without aggregation.
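To make the vertex-partition versus edge-partition distinction concrete, the sketch below measures one common cost for each. The abstract does not define the paper's cost functions, so the two measures used here (number of cut edges, and average number of partitions a vertex is replicated to) are assumptions chosen for illustration.

```python
def edge_cut(edges, vertex_part):
    """Vertex partition cost: edges whose endpoints are in different partitions."""
    return sum(1 for u, v in edges if vertex_part[u] != vertex_part[v])


def replication_factor(edges, edge_part):
    """Edge partition cost: average number of partitions each vertex appears in.

    edge_part[j] is the partition assigned to edges[j].
    """
    spans = {}
    for (u, v), p in zip(edges, edge_part):
        spans.setdefault(u, set()).add(p)
        spans.setdefault(v, set()).add(p)
    return sum(len(parts) for parts in spans.values()) / len(spans)
```

For a star with hub 0 and four leaves, placing the hub with two leaves cuts two of the four edges, while splitting the edges two and two across partitions replicates only the hub. This is the kind of high-degree case where edge partitioning is usually argued to help.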

Citation Context

...em which was studied rather extensively, with the best known approximation guarantee of O(√(log k log n)) for the edge-cut cost of a partition in k clusters of a size n graph [16], and with software tools being available off-the-shelf, e.g. METIS [11, 12]. An important requirement for graph partitioning at scale is being able to produce a good-quality graph partition under the restriction of making a single pass through the graph data, which is often referred to as streaming graph partition. Here, again, we find a number of approaches that have been studied for the vertex-partition problem, e.g. [30, 32, 23]. On the contrary, the study of streaming heuristics for the edge-partition problem is limited to the study of the PowerGraph heuristic introduced and studied in the original proposal [8] that advocates the use of edge partitions. In this paper we study the following two fundamental questions: Q1 What are the quantitative performance benefits of using the edge-partition approach as opposed to using the traditional vertex-partition approach? and Q2 What are the approximation guarantees for the edge-partition problem and is it possible to achieve a good average-case performance by using some natu...

Adaptive partitioning for large-scale dynamic graphs

by Luis M. Vaquero, Felix Cuadrado, Dionysios Logothetis, Claudio Martella - In Proc. ICDCS (2014)
Cited by 2 (1 self)
In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash-partitioning.

Citation Context

...ed, but also implies a huge overhead in object deletion and creation. Recently more sophisticated techniques have been introduced to optimise graph reloading (assuming not many changes have occurred) [19]. These re-streaming mechanisms make a new pass of the graph every time adaptation is needed, which may not scale even when partition parallelisation is doable in separate workers (streaming the whole...

Systems for near real-time analysis of large-scale dynamic graphs

by Luis M. Vaquero, Felix Cuadrado, Matei Ripeanu, 2014
Cited by 1 (1 self)
Abstract not found

Citation Context

...niques such as [62] and execute them with some periodicity). More sophisticated restreaming techniques have recently been introduced to help reload the graph (assuming not many changes have occurred) [49]. Nishimura and Ugander show how to restream state-of-the-art stream partitioning methods. These re-streaming mechanisms make a new pass of the graph, which may not scale even when partition parallelis...

LogGP: A Log-based Dynamic Graph Partitioning Method

by Ning Xu, Lei Chen, Bin Cui
Cited by 1 (0 self)
With the increasing availability and scale of graph data from Web 2.0, graph partitioning has become one of the efficient pre-processing techniques to balance the computing workload. Since the cost of partitioning the entire graph is strictly prohibitive, there are some recent tentative works towards streaming graph partitioning, which can run faster, be easily parallelized, and be incrementally updated. Unfortunately, the experiments show that the running time of each partition is still unbalanced due to the variation of workload access patterns during the supersteps. In addition, the one-pass streaming partitioning result is not always satisfactory because of the algorithms' local view of the graph. In this paper, we present LogGP, a log-based graph partitioning system that records, analyzes and reuses the historical statistical information to refine the partitioning result. LogGP can be used as middleware and deployed easily to many state-of-the-art parallel graph processing systems. LogGP utilizes the historical partitioning results to generate a hyper-graph and uses a novel hyper-graph streaming partitioning approach to generate a better initial streaming graph partitioning result. During the execution, the system uses running logs to optimize graph partitioning, which prevents performance degradation. Moreover, LogGP can dynamically repartition massive graphs in accordance with structural changes. Extensive experiments conducted on a moderate-size computing cluster with real-world graph datasets demonstrate the superiority of our approach against the state-of-the-art solutions.

Citation Context

... we also study the performance of HGR and SR individually. • Linear Deterministic Greedy (LDG) approach [23] is considered one of the best static streaming methods, and Restreaming LDG (reLDG) approach [19] is extended to generate initial graph partitioning using the last streaming partitioning result. • CatchW [21] is a dynamic graph workload balancing approach for random initial partitioning, which is...
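The restreaming idea mentioned in this context (reusing the last streaming result on the next pass) can be sketched as follows. The scoring rule, neighbors in the cluster weighted by the cluster's remaining capacity, is how LDG is commonly described; the tie-breaking, the helper names, and the exact way the previous pass is consulted are assumptions of this sketch rather than details confirmed by the listing.

```python
from collections import defaultdict


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def ldg_pass(adj, order, k, capacity, previous=None):
    """One LDG-style pass: score(v, i) = neighbors(v, i) * (1 - size(i) / capacity).

    If `previous` is given, neighbors not yet placed in this pass are looked
    up in the previous pass's assignment (the restreaming step, as assumed here).
    """
    previous = previous or {}
    current = {}
    sizes = [0] * k
    for v in order:
        counts = [0] * k
        for u in adj[v]:
            part = current.get(u, previous.get(u))
            if part is not None:
                counts[part] += 1
        scores = [counts[i] * (1 - sizes[i] / capacity) for i in range(k)]
        # Ties go to the emptier cluster, then to the lower index.
        best = max(range(k), key=lambda i: (scores[i], -sizes[i], -i))
        current[v] = best
        sizes[best] += 1
    return current


def restream(adj, order, k, capacity, passes):
    """Run several passes, feeding each pass the previous assignment."""
    assignment = None
    for _ in range(passes):
        assignment = ldg_pass(adj, order, k, capacity, previous=assignment)
    return assignment


def cut_size(edges, assignment):
    """Number of edges whose endpoints are in different clusters."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

On two triangles joined by one edge, streamed in the unfavorable order 2, 3, 0, 1, 4, 5 with `k=2` and `capacity=3`, a single pass cuts four edges. A second pass that reuses the first assignment recovers the two triangles and cuts only the bridge.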

Online and On-demand Partitioning of Streaming Graphs

by Yannis Kotidis
Many applications generate data that naturally leads to a graph representation for its modeling and analysis. A common approach to address the size and complexity of these graphs is to split them across a number of partitions, in a way that computations on them can be performed mostly locally and in parallel in the resulting partitions. In this work, we present a framework that enables partitioning of evolving graphs whose elements (nodes and edges) are streamed in an arbitrary order. At the core of our techniques lies a Condensed Spanning Tree (CST) structure that summarizes the graph stream and permits computation of high-quality graph partitions both on-line and on-demand, without the need to ever look at the whole graph. The partitioning algorithm we present manages to create partitions from streaming graphs with low memory usage, but can also adapt partitions over time based on different application needs such as minimizing cross-partition edges, balancing load across partitions, elastically adapting partitions based on a maximum load threshold and reducing migration cost. Our experiments with many different real and synthetic graphs demonstrate that our techniques efficiently process and partition millions of graph nodes per second and also adapt them based on different requirements using only the information kept in the compressed CST structure, which can reduce the input graph size down to 1.6%.

Citation Context

...n the available memory, sorting them in BFS order and then inserting them in the tree using Algorithm 1. The intuition behind the choice of BFS is that it preserves locality information in the stream [12]. IV. CREATING, MONITORING AND ADAPTING PARTITIONS Partitions in our framework are generated dynamically, while processing the graph stream. New nodes are placed ...

Online Partitioning of Multi-Labeled Graphs

by Yannis Kotidis
Graph partitioning is an old problem that is finding renewed interest in the era of big, complex datasets and parallel computing frameworks that can benefit from a proper partitioning of big graph data across multiple nodes in a cluster. In this paper we look into a specific instance of the problem termed online graph partitioning that addresses the need to partition large graphs that do not fit in main memory. A neglected aspect of modern graph datasets is that real graphs have labels! Node labels may, for instance, correspond to categorical attributes (such as country, profession, participating groups, etc.) of the entities depicted by the vertices of the graph. Edge labels may represent different relationship types (e.g. “friend-of”, “likes”, etc.). In this work we first revisit the formulation of the graph partitioning problem for graphs with labels on both nodes and edges. We introduce “relation-cut” as a new metric that extends the traditional “edge-cut” metric used in graph partitioning in order to take into account the existence of different edge types. Then, we combine this metric with a novel “label-cut” metric that takes into consideration the displacement of related nodes with similar labels across partitions. In our experiments we adapt two recent online partitioning algorithms for the new proposed metric and provide a thorough evaluation on a variety of real and synthetic graphs. Our experiments demonstrate that the proposed technique balances the generated cuts on both relations and labels on the resulting partitions.

Citation Context

... in practice. Most graph partitioning algorithms are static, meaning that they assume that the whole graph is available and fits in main memory. A recent breed of online graph partitioning algorithms [18, 24, 25] have emerged in order to allow partitioning of graphs that do not fit in a single machine and, thus, a static partitioner is not applicable. Unfortunately, online partitioning algorithms concentrate ...

Streaming Graph Partitioning in the Planted Partition Model

by Charalampos E. Tsourakakis
The sheer increase in the size of graph data has created a lot of interest in developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning in order to minimize communication and achieve work balance. In this work we contribute to the recent research line of streaming graph partitioning [30, 31, 34], which computes an approximately balanced k-partitioning of the vertex set of a graph using a single pass over the graph stream using degree-based criteria. This graph partitioning framework is well tailored to processing large-scale and dynamic graphs. In this work we introduce the use of higher length walks for streaming graph partitioning and show that their use incurs a minor computational cost and can significantly improve the quality of the graph partition. We perform an average case analysis of our algorithm using the planted partition model [7, 25]. We complement the recent results of Stanton [30] by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the many choices for the length of the walks, we show that the proposed length is optimal. Finally, we perform simulations which indicate that our asymptotic results hold even for small graph sizes.

Citation Context

...rovided well-performing decision strategies for streaming graph partitioning. In the Appendix we prove that Fennel is NP-hard. Both [31] and [34] can be adapted to edge streams. Nishimura and Ugander [27] consider a variation of Fennel that allows multiple passes over the stream. It is worth mentioning that very recently Margo and Seltzer provided a state-of-art distributed streaming graph partitioner...
