Results 1–6 of 6
All-distances sketches, revisited: HIP estimators for massive graphs analysis
In Proc. 33rd ACM Symposium on Principles of Database Systems (PODS), ACM, 2014
Abstract

Cited by 7 (4 self)
Graph datasets with billions of edges, such as social and Web graphs, are prevalent. To be feasible, computation on such large graphs should scale linearly with graph size. All-distances sketches (ADSs) are emerging as a powerful tool for scalable computation of some basic properties of individual nodes or the whole graph. ADSs were first proposed two decades ago (Cohen 1994); more recent algorithms include ANF (Palmer, Gibbons, and Faloutsos 2002) and HyperANF (Boldi, Rosa, and Vigna 2011). A sketch of logarithmic size is computed for each node in the graph, and the computation in total requires only a near-linear number of edge relaxations. From the ADS of a node, we can estimate its neighborhood cardinalities (the number of nodes within some query distance) and closeness centrality. More generally, we can estimate the distance distribution, effective diameter, similarities, and other parameters of the full graph. We make several contributions which facilitate a more effective use of ADSs for scalable analysis of massive graphs. We provide, for the first time, a unified exposition of ADS algorithms and applications. We present the Historic Inverse Probability (HIP) estimators, which are applied to the ADS of a node to estimate a large natural class of queries including neighborhood cardinalities and closeness centralities. We show that our HIP estimators have at most half the variance of previous neighborhood cardinality estimators and that this is essentially optimal. Moreover, HIP obtains a polynomial improvement for more general queries, and the estimators are simple, flexible, unbiased, and elegant. We apply HIP to approximate distinct counting on streams by comparing HIP and the original estimators applied to the HyperLogLog MinHash sketches (Flajolet et al. 2007). We demonstrate significant improvement in estimation quality for this state-of-the-art practical algorithm and also illustrate the ease of applying HIP.
Finally, we study the quality of ADS estimation of distance ranges, generalizing the near-linear-time factor-2 approximation of the diameter.
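To illustrate the HIP idea in the streaming distinct-counting setting the abstract mentions, the sketch below applies an inverse-probability update to a bottom-k MinHash sketch: whenever an element modifies the sketch, the running estimate grows by the inverse of that element's inclusion probability. This is an illustrative toy, not the paper's ADS construction; the class name `BottomKHIP` and the helper `h` are our own.

```python
import hashlib

def h(x):
    """Deterministic hash of x to a uniform value in [0, 1)."""
    d = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(d[:8], "big") / 2**64

class BottomKHIP:
    """HIP-style distinct counter over a bottom-k MinHash sketch: each
    element that enters the sketch adds the inverse of its inclusion
    probability (the current k-th smallest hash) to a running sum."""
    def __init__(self, k):
        self.k = k
        self.sketch = []      # sorted k smallest hash values seen so far
        self.estimate = 0.0   # running HIP estimate of distinct count

    def add(self, x):
        v = h(x)
        if v in self.sketch:          # duplicate of a sketched element
            return
        if len(self.sketch) < self.k:
            # while the sketch is not full, inclusion probability is 1
            self.estimate += 1.0
            self.sketch.append(v)
            self.sketch.sort()
        elif v < self.sketch[-1]:
            # inclusion probability = current threshold (k-th smallest)
            self.estimate += 1.0 / self.sketch[-1]
            self.sketch[-1] = v
            self.sketch.sort()
```

Duplicates never modify the sketch (their hash is either already stored or above the threshold), so they leave the estimate unchanged, which is what makes this a distinct counter.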
Sketch-based influence maximization and computation: Scaling up with guarantees
In International Conference on Information and Knowledge Management (CIKM), 2014
Abstract

Cited by 4 (0 self)
Propagation of contagion through networks is a fundamental process. It is used to model the spread of information, influence, or a viral infection. Diffusion patterns can be specified by a probabilistic model, such as Independent Cascade (IC), or captured by a set of representative traces. Basic computational problems in the study of diffusion are influence queries (determining the potency of a specified seed set of nodes) and Influence Maximization (identifying the most influential seed set of a given size). Answering each influence query involves many edge traversals and does not scale when there are many queries on very large graphs. The gold standard for Influence Maximization is the greedy algorithm, which iteratively adds to the seed set a node maximizing the marginal gain in influence. Greedy has a guaranteed approximation ratio of at least (1 − 1/e) and actually produces a sequence of nodes, with each prefix having an approximation guarantee with respect to the same-size optimum. Since Greedy does not scale well beyond a few million edges, for larger inputs one must currently use either heuristics or alternative algorithms designed for a pre-specified small seed set size. We develop a novel sketch-based design for influence computation. Our greedy Sketch-based Influence Maximization (SKIM) algorithm scales to graphs with billions of edges, with one to two orders of magnitude speedup over the best greedy methods. It still has a guaranteed approximation ratio, and in practice its quality nearly matches that of exact greedy. We also present influence oracles, which use linear-time preprocessing to generate a small sketch for each node, allowing the influence of any seed set to be quickly answered from the sketches of its nodes.
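For reference, the exact greedy baseline the abstract describes (the one SKIM accelerates with sketches) can be sketched with Monte Carlo IC simulations. This is a simplified illustration under our own function names and parameters, not the SKIM algorithm itself:

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Independent Cascade (IC) simulation: every newly activated node
    gets a single chance to activate each out-neighbor with probability p.
    Returns the number of nodes activated."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def expected_influence(graph, seeds, p, sims, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(sims)) / sims

def greedy_im(graph, k, p=0.1, sims=200, seed=0):
    """Greedy influence maximization: repeatedly add the node with the
    largest estimated marginal gain; each prefix of the returned sequence
    carries the (1 - 1/e) guarantee, up to simulation error."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        base = expected_influence(graph, seeds, p, sims, rng) if seeds else 0.0
        best, best_gain = None, float("-inf")
        for u in graph:
            if u in seeds:
                continue
            gain = expected_influence(graph, seeds + [u], p, sims, rng) - base
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.append(best)
    return seeds
```

Each greedy round re-simulates every candidate, which is exactly the many-edge-traversals cost the abstract points out; sketching replaces the repeated simulations.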
What you can do with coordinated samples
In the 17th International Workshop on Randomization and Computation (RANDOM), 2013
Estimation for monotone sampling: Competitiveness and customization
In PODC, ACM, 2014
Scalable Facility Location for Massive Graphs on Pregel-like Systems
, 2015
Abstract
We propose a new scalable algorithm for the facility-location problem. We study the graph setting, where the cost of serving a client from a facility is represented by the shortest-path distance on a graph. This setting is applicable to various problems arising in the Web and social media, and allows us to leverage the inherent sparsity of such graphs. To obtain truly scalable performance, we design a parallel algorithm that operates on clusters of shared-nothing machines. In particular, we target modern Pregel-like architectures, and we implement our algorithm on Apache Giraph. Our work builds upon previous results: a facility-location algorithm for the PRAM model, a recent distance-sketching method for massive graphs, and a parallel algorithm for finding maximal independent sets. The main challenge is to adapt those building blocks to the distributed graph setting, while maintaining the approximation guarantee and limiting the amount of distributed communication. Extensive experimental results show that our algorithm scales gracefully to graphs with billions of edges, while, in terms of quality, being competitive with state-of-the-art sequential algorithms.
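As a small, purely illustrative sketch of the underlying optimization (a sequential greedy heuristic under our own naming, not the paper's distributed PRAM-based algorithm), one can pair Dijkstra shortest paths with a best-gain facility-opening loop:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> [(nbr, weight)]."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def greedy_facility_location(adj, clients, open_cost, penalty=1e9):
    """Greedy heuristic: repeatedly open the facility whose opening most
    reduces total cost (opening costs plus each client's shortest-path
    distance to its nearest open facility); stop when no opening pays off.
    `penalty` is the cost of leaving a client unserved."""
    dists = {f: dijkstra(adj, f) for f in adj}   # facility -> distance map
    serve = {c: penalty for c in clients}        # current service cost
    opened = set()
    while True:
        best, best_delta = None, 0.0
        for f in adj:
            if f in opened:
                continue
            gain = sum(max(0.0, serve[c] - dists[f].get(c, penalty))
                       for c in clients)
            if gain - open_cost > best_delta:
                best, best_delta = f, gain - open_cost
        if best is None:
            return opened
        opened.add(best)
        for c in clients:
            serve[c] = min(serve[c], dists[best].get(c, penalty))
```

The all-pairs Dijkstra step is the part the paper replaces with distance sketches to stay near-linear on billion-edge graphs.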
GRECS: Graph Encryption for Approximate Shortest Distance Queries
Abstract
We propose graph encryption schemes that efficiently support approximate shortest distance queries on large-scale encrypted graphs. Shortest distance queries are one of the most fundamental graph operations and have a wide range of applications. Using such graph encryption schemes, a client can outsource large-scale privacy-sensitive graphs to an untrusted server without losing the ability to query them. Other applications include encrypted graph databases and controlled disclosure systems. We propose GRECS (which stands for GRaph EnCryption for approximate Shortest distance queries), which includes three oracle encryption schemes that are provably secure against any semi-honest server. Our first construction makes use of only symmetric-key operations, resulting in a computationally efficient construction. Our second scheme makes use of somewhat-homomorphic encryption and is less computationally efficient but achieves optimal communication complexity (i.e., uses a minimal amount of bandwidth). Finally, our third scheme is both computationally efficient and achieves optimal communication complexity at the cost of a small amount of additional leakage. We implemented and evaluated the efficiency of our constructions experimentally. The experiments demonstrate that our schemes are efficient and can be applied to graphs that scale up to 1.6 million nodes and 11 million edges.
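Schemes like this encrypt sketch-based approximate distance oracles. As background, a minimal unencrypted landmark oracle of the general kind (an illustration under our own naming, not the paper's construction) looks like:

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src on an unweighted adjacency-list graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def build_oracle(adj, landmarks):
    """Precompute distances from each landmark; a node's 'sketch' is its
    vector of distances to the landmark set."""
    return {L: bfs_dist(adj, L) for L in landmarks}

def approx_distance(oracle, u, v):
    """Triangle-inequality upper bound: route through the best landmark."""
    inf = float("inf")
    return min(d.get(u, inf) + d.get(v, inf) for d in oracle.values())
```

A graph encryption scheme in this setting stores the per-node sketches encrypted at the server while still letting the client evaluate the min-over-landmarks query.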