GraphChi: Large-Scale Graph Computation on Just a PC
In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, 2012
Cited by 115 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially for non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
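The abstract does not spell out how the graph is broken into parts. As a rough illustration only, the sketch below assumes the layout described in the GraphChi paper: vertices are split into intervals, each shard holds the edges whose destination falls in one interval, and the edges inside a shard are sorted by source. Under that assumption the out-edges of an interval form one contiguous window in every shard, which is what makes a sliding-window scan possible. The function names are hypothetical, the equal-width intervals are a simplification, and everything lives in memory here rather than being streamed from disk.

```python
import bisect


def build_shards(edges, num_vertices, num_shards):
    """Assign each edge to the shard of its destination, sorted by source."""
    # Equal-width vertex intervals: a simplifying assumption for this sketch.
    size = -(-num_vertices // num_shards)  # ceiling division
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // size].append((src, dst))
    for shard in shards:
        shard.sort()  # tuples sort by source first
    return shards, size


def sliding_window(shard, start, end):
    """Contiguous block of a sorted shard whose sources lie in [start, end)."""
    lo = bisect.bisect_left(shard, (start, -1))
    hi = bisect.bisect_left(shard, (end, -1))
    return shard[lo:hi]


def load_subgraph(shards, size, p):
    """Collect the in-edges and out-edges of vertex interval p."""
    start, end = p * size, (p + 1) * size
    in_edges = list(shards[p])  # the whole shard p
    out_edges = []
    for shard in shards:  # one window per shard
        out_edges.extend(sliding_window(shard, start, end))
    return in_edges, out_edges


edges = [(0, 1), (0, 3), (1, 2), (2, 0), (3, 2), (3, 1)]
shards, size = build_shards(edges, num_vertices=4, num_shards=2)
in_edges, out_edges = load_subgraph(shards, size, 0)
```

In this toy run, interval 0 covers vertices 0 and 1: its in-edges are exactly shard 0, and its out-edges come from one window at the front of each shard.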
Global convergence of stochastic gradient descent for some non-convex matrix problems. arXiv preprint arXiv:1411.1134, 2014
Cited by 5 (0 self)
Stochastic gradient descent (SGD) on a low-rank factorization
Who to Follow and Why: Link Prediction with Explanations
Cited by 3 (0 self)
User recommender systems are a key component in any online social networking platform: they help users grow their network faster, thus driving engagement and loyalty. In this paper we study link prediction with explanations for user recommendation in social networks. For this problem we propose WTFW (“Who to Follow and Why”), a stochastic topic model for link prediction over directed and node-attributed graphs. Our model not only predicts links, but for each predicted link it decides whether it is a “topical” or a “social” link, and depending on this decision it produces a different type of explanation. A topical link is recommended between a user interested in a topic and a user authoritative in that topic: the explanation in this case is a set of binary features describing the topic responsible for the link creation. A social link is recommended between users who share a large social neighborhood: in this case the explanation is the set of neighbors most likely to be responsible for the link creation. Our experimental assessment on real-world data confirms the accuracy of WTFW in link prediction and the quality of the associated explanations.
The More the Merrier: Efficient Multi-Source Graph Traversal
Cited by 2 (1 self)
Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) graph traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while scaling up as the number of cores increases. MS-BFS leverages the properties of small-world networks, which apply to many real-world graphs, and enables efficient graph traversal that: (i) shares common computation across concurrent BFSs; (ii) greatly reduces the number of random memory accesses; and (iii) does not incur synchronization costs. We demonstrate how a real graph analytics application, all-vertices closeness centrality, can be efficiently solved with MS-BFS. Furthermore, we present an extensive experimental evaluation with both synthetic and real datasets, including Twitter and Wikipedia, showing that MS-BFS provides almost linear scalability with respect to the number of cores and excellent scalability for increasing graph sizes, outperforming state-of-the-art BFS algorithms by more than one order of magnitude when running a large number of BFSs.
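One way to picture how several BFSs can share work is to keep a bitmask per vertex, where bit i records whether the i-th BFS has reached that vertex. The sketch below is a minimal single-threaded illustration of that idea, not the authors' implementation: the function name, the adjacency-list input, and the use of Python integers as bitsets are all assumptions made for the example.

```python
def multi_source_bfs(adj, sources):
    """Run one BFS per source concurrently; return dist[i][v] (-1 if unreachable)."""
    n = len(adj)
    seen = [0] * n   # bit i set: BFS i has already discovered this vertex
    visit = [0] * n  # bit i set: BFS i visits this vertex in the current level
    dist = [[-1] * n for _ in sources]
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(visit):
        level += 1
        visit_next = [0] * n
        for v in range(n):
            if visit[v] == 0:
                continue
            for nb in adj[v]:
                # BFSs that are at v now and have not yet seen nb.
                new = visit[v] & ~seen[nb]
                if new == 0:
                    continue
                visit_next[nb] |= new
                seen[nb] |= new
                # Record the distance for every BFS whose bit is set.
                i, bits = 0, new
                while bits:
                    if bits & 1:
                        dist[i][nb] = level
                    bits >>= 1
                    i += 1
        visit = visit_next
    return dist


# Path 0 - 1 - 2 - 3 plus an isolated vertex 4, explored from both ends.
adj = [[1], [0, 2], [1, 3], [2], []]
dist = multi_source_bfs(adj, [0, 3])
```

Each neighbor is touched once per level for all BFSs that arrive there together, which is where the sharing of computation comes from in this simplified form.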
NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud
http://arxiv.org/abs/1405.1499, 2014
Cited by 2 (1 self)
There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and so on. These tasks are not well served by the existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly access the state of a single vertex; this results in high communication, scheduling, and memory overheads in executing such tasks using those frameworks. Further, most existing graph processing frameworks typically ignore the challenges in extracting the relevant portion of the graph that an analysis task needs, and loading it
Improving User Topic Interest Profiles by Behavior Factorization
Cited by 1 (0 self)
Many recommenders aim to provide relevant recommendations to users by building personal topic interest profiles and then using these profiles to find interesting content for the user. In social media, recommender systems build user profiles by directly combining users' topic interest signals from a wide variety of consumption and publishing behaviors, such as social media posts they authored, commented on, +1’d or liked. Here we propose to separately model users' topical interests that come from these various behavioral signals in order to construct better user profiles. Intuitively, since publishing a post requires more effort, the topic interests coming from publishing signals should reflect a user’s central interest more accurately than, say, a simple gesture such as a +1. By separating a single user’s interest profile into several behavioral profiles, we obtain better and cleaner topic interest signals, and enable topic prediction for different types of behavior, such as topics that the user might +1 or comment on but never write a post about. To do this at large scale in Google+, we employed matrix factorization techniques to model each user’s behaviors as a separate example entry in the input user-by-topic matrix. Using this technique, which we call "behavioral factorization", we implemented and built a topic recommender predicting users' topical interests using their actions within Google+. We experimentally showed that we obtained better and cleaner signals than baseline methods, and are able to more accurately predict topic interests as well as achieve better coverage.
The Energy Case for Graph Processing on Hybrid CPU and GPU Systems
In Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms, 2013
Cited by 1 (0 self)
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph layout to processing units, such that the algorithmic tasks exercise each of the units where they perform best. However, GPUs have a much higher Thermal Design Power (TDP), so their impact on the overall energy consumption is unclear. Our evaluation using large real-world graphs and synthetic graphs as large as 1 billion vertices and 16 billion edges shows that a hybrid system is efficient in terms of both time-to-solution and energy.
FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs
Cited by 1 (1 self)
We propose a new algorithm, FAST-PPR, for the Significant-PageRank problem: given input nodes s, t in a directed graph and threshold δ, decide if the Personalized PageRank from s to t is at least δ. Existing algorithms for this problem have a running time of Ω(1/δ); this makes them unsuitable for use in large social networks for applications requiring values of δ = O(1/n). FAST-PPR is based on a bidirectional search and requires no preprocessing of the graph. It has a provable average running-time guarantee of Õ(√(d/δ)) (where d is the average in-degree of the graph). We complement this result with an Ω(1/√δ) lower bound for Significant-PageRank, showing that the dependence on δ cannot be improved. We perform a detailed empirical study on numerous massive graphs showing that FAST-PPR dramatically outperforms existing algorithms. For example, with target nodes sampled according to popularity, on the 2010 Twitter graph with 1.5 billion edges, FAST-PPR has a 20-fold speedup over the state of the art. Furthermore, an enhanced version of FAST-PPR has a 160-fold speedup on the Twitter graph, and is at least 20 times faster on all our candidate graphs.
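To make the Ω(1/δ) baseline concrete, the sketch below shows a plain Monte Carlo estimator of Personalized PageRank, which is not FAST-PPR itself. It assumes the common random-walk definition: a walk from s stops at each step with teleport probability alpha, and the PPR from s to t is the probability that it stops at t. Detecting a value as small as δ this way needs on the order of 1/δ walks. The function name, the default alpha, and the handling of dangling nodes (the walk simply stops there) are assumptions made for the example.

```python
import random


def monte_carlo_ppr(adj, source, num_walks, alpha=0.2, seed=0):
    """Estimate PPR(source, v) for all v as the fraction of walks ending at v."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(num_walks):
        v = source
        # Continue with probability 1 - alpha at every step.
        while rng.random() >= alpha:
            if not adj[v]:
                break  # assumption: a walk stops at a dangling node
            v = rng.choice(adj[v])
        counts[v] = counts.get(v, 0) + 1
    return {v: c / num_walks for v, c in counts.items()}


# Two nodes: 0 -> 1 and 1 -> 1, so a walk from 0 ends at 0 only if it
# stops immediately, which happens with probability alpha.
adj = [[1], [1]]
estimate = monte_carlo_ppr(adj, source=0, num_walks=20000, alpha=0.2, seed=1)
```

A bidirectional method such as the one described above avoids paying the full 1/δ cost from the source side alone; this baseline only illustrates why the forward-only approach scales poorly as δ shrinks.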
Search and Retrieval–Information Filtering
Random walks on graphs are a staple of many ranking and recommendation algorithms. Simulating random walks on a graph which fits in memory is trivial, but massive graphs pose a problem: the latency of following walks across the network in a cluster or loading nodes from disk on demand renders basic random walk simulation unbearably inefficient. In this work we propose DrunkardMob, a new algorithm for simulating hundreds of millions, or even billions, of random walks on massive graphs, on just a single PC or laptop. Instead of simulating one walk at a time, it processes millions of them in parallel, in a batch. Based on DrunkardMob and GraphChi [19], we further propose a framework for easily expressing scalable algorithms based on graph walks.
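The batching idea can be sketched as follows: rather than following one walk through the graph, group all walks by the vertex they currently occupy and advance them together in a single sweep over the vertices. This is a minimal in-memory illustration under assumed names (`advance_walks`, `simulate`), not the DrunkardMob implementation, which works with a graph stored on disk. Walks at a vertex with no out-edges are assumed to stay in place.

```python
import random


def advance_walks(adj, walks, rng):
    """Move every walk one hop, processing walks grouped by current vertex."""
    buckets = {}
    for walk_id, v in enumerate(walks):
        buckets.setdefault(v, []).append(walk_id)

    new_positions = list(walks)
    for v in sorted(buckets):  # one sequential sweep over the vertices
        neighbors = adj[v]
        if not neighbors:
            continue  # assumption: walks at dangling vertices stay put
        for walk_id in buckets[v]:
            new_positions[walk_id] = rng.choice(neighbors)
    return new_positions


def simulate(adj, starts, steps, seed=0):
    """Run all walks for a fixed number of steps; return final positions."""
    rng = random.Random(seed)
    walks = list(starts)
    for _ in range(steps):
        walks = advance_walks(adj, walks, rng)
    return walks


# Directed 5-cycle: every vertex has a single out-neighbor, so each walk
# ends at (start + steps) mod 5 regardless of the random seed.
cycle = [[1], [2], [3], [4], [0]]
final = simulate(cycle, starts=[0, 0, 2, 4], steps=3)
```

Because each vertex's adjacency list is read once per sweep for all walks sitting on it, the cost of loading a vertex is shared across the whole batch.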