Results 1–10 of 20
The WebGraph Framework I: Compression Techniques
In Proc. of the Thirteenth International World Wide Web Conference, 2004
Cited by 158 (30 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about techniques for storing a web graph in memory in limited space by exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
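The intervalisation idea named in the abstract can be sketched in miniature: runs of consecutive successors become (start, length) pairs, and the remaining "residual" successors are stored as gaps, which tend to be small and thus compress well. The sketch below is a hedged toy illustration, not the actual WebGraph encoder (which additionally uses referentiation against similar earlier lists and instantaneous codes); the function name and the minimum interval length are assumptions.

```python
# Toy sketch of intervalisation + gap coding for one sorted adjacency list.
# Not the real WebGraph format; illustrative only.

def encode_successors(successors, min_interval_len=3):
    """Split a sorted successor list into intervals and gap-coded residuals."""
    intervals, residuals = [], []
    i = 0
    while i < len(successors):
        # extend j while successors form a consecutive run
        j = i
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        run = successors[i:j + 1]
        if len(run) >= min_interval_len:
            intervals.append((run[0], len(run)))  # store run as (start, length)
        else:
            residuals.extend(run)
        i = j + 1
    # residuals become gaps from their predecessor; small gaps compress well
    if residuals:
        gaps = [residuals[0]] + [b - a for a, b in zip(residuals, residuals[1:])]
    else:
        gaps = []
    return intervals, gaps
```

For the successor list [13, 14, 15, 16, 17, 20, 22] this yields the interval (13, 5) plus residual gaps [20, 2].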
Deeper inside pagerank
Internet Mathematics, 2004
Cited by 142 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model; available and recommended solution methods; storage issues; existence, uniqueness, and convergence properties; possible alterations to the basic model; suggested alternatives to the traditional solution methods; sensitivity and conditioning; and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
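As a concrete anchor for the basic PageRank model the survey covers, here is a minimal power-iteration sketch; the dictionary graph representation, damping factor of 0.85, and uniform dangling-node handling are common illustrative choices, not necessarily the survey's formulation.

```python
# Minimal power-iteration PageRank sketch (illustrative, not from the survey).
# Assumes every node appearing as a successor is also a key of out_links.

def pagerank(out_links, d=0.85, tol=1e-10, max_iter=100):
    """out_links: dict mapping node -> list of out-neighbours."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - d) / n for v in nodes}  # teleportation term
        for v, succ in out_links.items():
            if succ:
                share = d * rank[v] / len(succ)
                for w in succ:
                    new[w] += share
            else:
                # dangling node: redistribute its rank uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```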
On Compressing Social Networks
Cited by 35 (1 self)
Motivated by structural properties of the Web graph that support efficient data structures for in-memory adjacency queries, we study the extent to which a large network can be compressed. Boldi and Vigna (WWW 2004) showed that Web graphs can be compressed down to three bits of storage per edge; we study the compressibility of social networks, where again adjacency queries are a fundamental primitive. To this end, we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs. We show that some of the problems are NP-hard yet admit effective heuristics, some of which can exploit properties of social networks such as link reciprocity. Our extensive experiments show that social networks and the Web graph exhibit vastly different compressibility characteristics.
A Fast and Compact Web Graph Representation
Cited by 17 (12 self)
Compressed graph representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential to adapt well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
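The core Re-Pair step can be illustrated on a plain integer sequence: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal and record a grammar rule. This is a hedged toy sketch of the general Re-Pair idea, not the paper's graph-specific construction (which applies it to concatenated adjacency lists) and not its efficient implementation.

```python
# Toy Re-Pair compressor over a sequence of non-negative integer symbols.
# Illustrative only; real Re-Pair uses priority queues for linear-time updates.
from collections import Counter

def repair(seq):
    rules = {}                       # nonterminal -> (left, right) it expands to
    next_symbol = max(seq) + 1       # fresh symbols start above the input alphabet
    seq = list(seq)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                 # stop when no pair repeats
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):          # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq, next_symbol = out, next_symbol + 1
    return seq, rules
```

For example, [1, 2, 1, 2, 3] compresses to [4, 4, 3] with the single rule 4 → (1, 2).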
k-Nearest Neighbors in Uncertain Graphs
Cited by 16 (2 self)
Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor (k-NN) queries, i.e., computing the k closest nodes to some specific node. In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques. Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale to graphs with tens of millions of edges.
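The sampling idea can be sketched as Monte Carlo estimation over "possible worlds": each sample keeps every edge independently with its probability and runs a BFS in the resulting deterministic graph. The median-based aggregate below is one plausible choice; the paper's actual distance functions and pruning techniques are more sophisticated, so treat every name and parameter here as an assumption.

```python
# Hedged sketch: possible-world sampling of a distance in a probabilistic graph.
import random
from collections import deque

def sampled_distance(prob_edges, src, dst, samples=1000, seed=0):
    """Median BFS distance from src to dst over sampled worlds.

    prob_edges: dict mapping (u, v) -> existence probability (treated as
    undirected here). Worlds where dst is unreachable contribute infinity.
    """
    rng = random.Random(seed)
    dists = []
    for _ in range(samples):
        adj = {}
        for (u, v), p in prob_edges.items():
            if rng.random() < p:          # keep edge with its probability
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        dist = {src: 0}                   # BFS in this possible world
        q = deque([src])
        while q:
            x = q.popleft()
            for y in adj.get(x, []):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        dists.append(dist.get(dst, float('inf')))
    dists.sort()
    return dists[len(dists) // 2]         # median is robust to unreachable samples
```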
Do your worst to make the best: Paradoxical effects in pagerank incremental computations
In Proceedings of the Third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, 2004
Cited by 14 (1 self)
Deciding which kind of visit accumulates high-quality pages more quickly is one of the most often debated issues in the design of web crawlers. It is known that breadth-first visits work well, as they tend to discover pages with high PageRank early on in the crawl. Indeed, this visit order is much better than depth-first, which is in turn even worse than a random visit; nevertheless, breadth-first can be superseded by an omniscient visit that chooses, at every step, the node of highest PageRank in the frontier. This paper discusses a related, and previously overlooked, measure of effectiveness for crawl strategies: whether the graph obtained after a partial visit is in some sense representative of the underlying web graph as far as the computation of PageRank is concerned. More precisely, we are interested in determining how rapidly the computation of PageRank over the visited subgraph yields relative ranks that agree with the ones the nodes have in the complete graph; ranks are compared using Kendall’s τ. We describe a number of large-scale experiments that show the following paradoxical effect: visits that gather PageRank more quickly (e.g., highest-quality-first) are also those that tend to miscalculate PageRank. Finally, we perform the same kind of experimental analysis on some synthetic random graphs, generated using well-known web-graph models: the results are almost opposite to those obtained on real web graphs.
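For reference, Kendall’s τ over two rankings (here induced by score dictionaries) can be computed with a straightforward O(n²) sketch; the paper compares PageRank-induced ranks this way, though this simple version ignores ties.

```python
# Simple O(n^2) Kendall's tau between rankings induced by two score dicts.
# Illustrative sketch; no tie handling.

def kendall_tau(scores_a, scores_b):
    """Return tau in [-1, 1]: 1 for identical order, -1 for reversed order."""
    items = list(scores_a)
    concordant = discordant = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            da = scores_a[items[i]] - scores_a[items[j]]
            db = scores_b[items[i]] - scores_b[items[j]]
            if da * db > 0:      # pair ordered the same way in both rankings
                concordant += 1
            elif da * db < 0:    # pair ordered oppositely
                discordant += 1
    total = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / total
```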
The Scalable Hyperlink Store
HT '09, 2009
Cited by 10 (5 self)
This paper describes the Scalable Hyperlink Store (SHS), a distributed in-memory “database” for storing large portions of the web graph. SHS is an enabler for research on structural properties of the web graph as well as new link-based ranking algorithms. Previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs. By contrast, this work focuses on the systems issues of building such a database. Specifically, it describes how to build a hyperlink database that is fast, scalable, fault-tolerant, and incrementally updatable.
Neighbor Query Friendly Compression of Social Networks
Cited by 8 (1 self)
Compressing social networks can substantially facilitate mining and advanced analysis of large social networks. Preferably, social networks should be compressed in a way that they can still be queried efficiently without decompression. Arguably, neighbor queries, which search for all neighbors of a query vertex, are the most essential operations on social networks. Can we compress social networks effectively in a neighbor-query-friendly manner, that is, such that neighbor queries can still be answered in sublinear time using the compression? In this paper, we develop an effective social network compression approach achieved by a novel Eulerian data structure using multi-position linearizations of directed graphs. Our method comes with a non-trivial theoretical bound on the compression rate. To the best of our ...
Theory and practice of triangle problems in very large (sparse (power-law)) graphs
Cited by 6 (5 self)
Finding, counting and/or listing triangles (three vertices with three edges) in large graphs are natural fundamental problems, which have recently received much attention because of their importance in complex network analysis. We provide here a detailed state of the art on these problems, in a unified way. We note that, until now, authors paid surprisingly little attention to space complexity, despite its both fundamental and practical interest. We give the space complexities of known algorithms and discuss their implications. Then we propose improvements of a known algorithm, as well as a new algorithm, which are time-optimal for triangle listing and beat previous algorithms concerning space complexity. They have the additional advantage of performing better on power-law graphs, which we also study. We finally show with an experimental study that these two algorithms perform very well in practice, allowing us to handle cases that were previously out of reach. A triangle in an undirected graph is a set of three vertices such that each possible edge between them is present in the graph. Following classical conventions, we call finding, counting and listing the problems of deciding if a given graph contains any triangle, counting the number of triangles in the graph, and listing ...
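A "forward"-style triangle listing, in the spirit of the algorithms surveyed here, orders vertices by degree and intersects, for each edge directed from lower to higher rank, the sets of already-processed neighbours of its endpoints. This is an illustrative sketch under those assumptions, not the paper's exact space-optimal variant.

```python
# Forward-style triangle listing sketch; each triangle is emitted exactly once.

def list_triangles(adj):
    """adj: dict vertex -> set of neighbours (undirected). Returns triangles."""
    order = sorted(adj, key=lambda v: len(adj[v]))   # increasing degree
    rank = {v: i for i, v in enumerate(order)}
    seen = {v: set() for v in adj}   # lower-ranked neighbours processed so far
    triangles = []
    for v in order:
        for u in adj[v]:
            if rank[u] > rank[v]:    # orient each edge from lower to higher rank
                # any common already-seen neighbour w closes a triangle
                for w in seen[v] & seen[u]:
                    triangles.append((w, v, u))
                seen[u].add(v)
    return triangles
```

Because each edge is considered in only one orientation and common neighbours are restricted to earlier vertices, every triangle is reported once, and the `seen` sets never exceed the graph's degeneracy in the degree order.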
Compression of Web and Social Graphs supporting Neighbor and Community Queries
Cited by 4 (3 self)
Motivated by the needs of mining and advanced analysis of large Web graphs and social networks, we study graph patterns that simultaneously provide compression and query opportunities, so that the compressed representation provides efficient support for search and mining queries. We first analyze patterns used for Web graph compression while supporting neighbor queries. Our results show that composing edge-reducing patterns with other methods achieves new space/time tradeoffs, in particular breaking the smallest known space barrier for Web graphs when supporting neighbor queries. Second, we propose a novel graph compression method based on representing communities with compact data structures. These offer competitive support for neighbor queries, but excel especially at answering community queries. As far as we know, ours is the first graph compression method supporting such a wide range of community queries.