Results 1  10
of
52
PowerGraph: Distributed GraphParallel Computation on Natural Graphs
"... Largescale graphstructured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graphparallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the realworld have highly ..."
Abstract

Cited by 117 (4 self)
 Add to MetaCart
(Show Context)
Largescale graphstructured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graphparallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the realworld have highly skewed powerlaw degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graphparallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of powerlaw graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graphparallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on largescale realworld problems demonstrating order of magnitude gains. 1
GraphChi: Largescale Graph Computation On just a PC
 In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
"... Current systems for graph computation require a distributed computing cluster to handle very large realworld problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains c ..."
Abstract

Cited by 109 (6 self)
 Add to MetaCart
(Show Context)
Current systems for graph computation require a distributed computing cluster to handle very large realworld problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to nonexperts. In this work, we present GraphChi, a diskbased system for computing efficiently on graphs with billions of edges. By using a wellknown method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumerlevel computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes largescale graph computation available to anyone with a modern PC. 1
GPS: A Graph Processing System ∗
"... GPS (for Graph Processing System) is a complete opensource system we developed for scalable, faulttolerant, and easytoprogram execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system [MAB+ 11], with some useful additional functionality described in ..."
Abstract

Cited by 63 (3 self)
 Add to MetaCart
(Show Context)
GPS (for Graph Processing System) is a complete opensource system we developed for scalable, faulttolerant, and easytoprogram execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system [MAB+ 11], with some useful additional functionality described in the paper. In distributed graph processing systems like GPS and Pregel, graph partitioning is the problem of deciding which vertices of the graph are assigned to which compute nodes. In addition to presenting the GPS system itself, we describe how we have used GPS to study the effects of different graph partitioning schemes. We present our experiments on the performance of GPS under different static partitioning schemes—assigning vertices to workers “intelligently ” before the computation starts—and with GPS’s dynamic repartitioning feature, which reassigns vertices to different compute nodes during the computation by observing their message sending patterns. 1
Parallel breadthfirst search on distributed memory systems
, 2011
"... Dataintensive, graphbased computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for BreadthFirst Search (BFS), a key subroutine in several ..."
Abstract

Cited by 34 (9 self)
 Add to MetaCart
(Show Context)
Dataintensive, graphbased computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for BreadthFirst Search (BFS), a key subroutine in several graph algorithms. We present two highlytuned parallel approaches for BFS on large parallel systems: a levelsynchronous strategy that relies on a simple vertexbased partitioning of the graph, and a twodimensional sparse matrix partitioningbased approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intranode multithreading. Our novel hybrid twodimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributedmemory parallel systems. For instance, for a 40,000core parallel execution on Hopper, an AMD MagnyCours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution. 1.
A Flexible OpenSource Toolbox for Scalable Complex Graph Analysis
, 2011
"... The Knowledge Discovery Toolbox (KDT) enables domain experts to perform complex analyses of huge datasets on supercomputers using a highlevel language without grappling with the difficulties of writing parallel code, calling parallel libraries, or becoming a graph expert. KDT provides a flexible Py ..."
Abstract

Cited by 19 (3 self)
 Add to MetaCart
(Show Context)
The Knowledge Discovery Toolbox (KDT) enables domain experts to perform complex analyses of huge datasets on supercomputers using a highlevel language without grappling with the difficulties of writing parallel code, calling parallel libraries, or becoming a graph expert. KDT provides a flexible Python interface to a small set of highlevel graph operations; composing a few of these operations is often sufficient for a specific analysis. Scalability and performance are delivered by linking to a stateoftheart backend compute engine that scales from laptops to large HPC clusters. KDT delivers very competitive performance from a generalpurpose, reusable library for graphs on the order of 10 billion edges and greater. We demonstrate speedup of 1 and 2 orders of magnitude over PBGL and Pegasus, respectively, on some tasks. Examples from simple use cases and key graphanalytic benchmarks illustrate the productivity and performance realized by KDT users. Semantic graph abstractions provide both flexibility and high performance for realworld use cases. Graphalgorithm researchers benefit from the ability to develop algorithms quickly using KDT’s graph and underlying matrix abstractions for distributed memory. KDT is available as opensource code to foster experimentation.
GraphX: Graph Processing in a Distributed Dataflow Framework
 USENIX ASSOCIATION 11TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI ’14)
, 2014
"... In pursuit of graph processing performance, the systems community has largely abandoned generalpurpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In thi ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
In pursuit of graph processing performance, the systems community has largely abandoned generalpurpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern generalpurpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, groupby). To achieve performance parity with specialized graph systems, GraphX recasts graphspecific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings lowcost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
Highly Parallel Sparse MatrixMatrix Multiplication
, 2010
"... Generalized sparse matrixmatrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
Generalized sparse matrixmatrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on twodimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability. We give a stateoftheart MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
Parallel Community Detection for Massive Graphs
"... Abstract. Tackling the current volume of graphstructured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to graphs of over 122 million vertices and ne ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
Abstract. Tackling the current volume of graphstructured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore’s sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output’s modularity compares well with the standard sequential algorithms.
Communication Optimal Parallel Multiplication of Sparse Random Matrices
, 2013
"... Parallel algorithms for sparse matrixmatrix multiplication typically spend most of their time on interprocessor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize c ..."
Abstract

Cited by 10 (6 self)
 Add to MetaCart
Parallel algorithms for sparse matrixmatrix multiplication typically spend most of their time on interprocessor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, sparse matrix multiplication algorithms must minimize communication costs in order to scale to large processor counts. In this paper, we consider multiplying sparse matrices corresponding to ErdősRényi random graphs on distributedmemory parallel machines. We prove a new lower bound on the expected communication cost for a wide class of algorithms. Our analysis of existing algorithms shows that, while some are optimal for a limited range of matrix density and number of processors, none is optimal in general. We obtain two new parallel algorithms and prove that they match the expected communication cost lower bound, and hence they are optimal.
Scalable multithreaded community detection in social networks
 in Workshop on Multithreaded Architectures and Applications (MTAAP
, 2012
"... Abstract—The volume of existing graphstructured data requires improved parallel tools and algorithms. Finding communities, smaller subgraphs densely connected within the subgraph than to the rest of the graph, plays a role both in developing new parallel algorithms as well as opening smaller portion ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
Abstract—The volume of existing graphstructured data requires improved parallel tools and algorithms. Finding communities, smaller subgraphs densely connected within the subgraph than to the rest of the graph, plays a role both in developing new parallel algorithms as well as opening smaller portions of the data to current analysis tools. We improve performance of our parallel community detection algorithm by 20 % on the massively multithreaded Cray XMT, evaluate its performance on the nextgeneration Cray XMT2, and extend its reach to Intelbased platforms with OpenMP. To our knowledge, not only is this the first massively parallel community detection algorithm but also the only such algorithm that achieves excellent performance and good parallel scalability across all these platforms. Our implementation analyzes a moderate sized graph with 105 million vertices and 3.3 billion edges in around 500 seconds on a four processor, 80logicalcore Intelbased system and 1100 seconds on a 64processor Cray XMT2.