Results 1-10 of 97
The WebGraph Framework I: Compression Techniques
In Proc. of the Thirteenth International World Wide Web Conference, 2003
Cited by 247 (32 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow a web graph to be stored in memory in limited space by exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
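The abstract above names intervalisation without defining it. As a rough illustration only, the sketch below assumes the term means storing runs of consecutive node ids in a sorted successor list as (start, length) pairs and leaving the remaining ids as residuals. The function names and the `min_len` threshold are hypothetical, and this is not the WebGraph storage format.

```python
def intervalise(successors, min_len=3):
    """Split a sorted successor list into (start, length) intervals and residuals.

    Runs of consecutive ids at least `min_len` long become intervals;
    everything else is kept as an explicit residual id.
    """
    intervals, residuals = [], []
    i, n = 0, len(successors)
    while i < n:
        j = i
        # Extend the run while the next id is exactly one greater.
        while j + 1 < n and successors[j + 1] == successors[j] + 1:
            j += 1
        run_len = j - i + 1
        if run_len >= min_len:
            intervals.append((successors[i], run_len))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def expand(intervals, residuals):
    """Inverse of intervalise: rebuild the sorted successor list."""
    nodes = list(residuals)
    for start, length in intervals:
        nodes.extend(range(start, start + length))
    return sorted(nodes)


if __name__ == "__main__":
    succ = [3, 4, 5, 6, 9, 11, 12, 20, 21, 22]
    ivs, res = intervalise(succ)
    print(ivs, res)  # [(3, 4), (20, 3)] [9, 11, 12]
    assert expand(ivs, res) == succ
```

A long run costs two integers regardless of its length, which is why lists dominated by consecutive ids shrink well under this kind of scheme.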
The Web as a graph
2000
Cited by 222 (3 self)
The pages and hyperlinks of the World Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
A General Model of Web Graphs
2003
Cited by 105 (6 self)
We describe a very general model of a random graph process whose proportional degree sequence obeys a power law. Such laws have recently been observed in graphs associated with the world wide web.
A survey on pagerank computing
Internet Mathematics, 2005
Cited by 90 (0 self)
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing.
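For readers unfamiliar with the computation the survey covers, a minimal sketch of the textbook power-iteration form of PageRank follows. It is not taken from the survey: the function name, the damping value of 0.85, the fixed iteration count, and the uniform handling of dangling nodes are all assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to its list of successors.

    Assumes every link target also appears as a key of `out_links`.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node first receives the uniform teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # Split the damped rank of v evenly among its successors.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its damped rank over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(pagerank(graph))
```

On the three-node cycle in the usage example every node ends up with the same score by symmetry. The scores always sum to 1 because each iteration redistributes the full rank mass.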
On Compressing Social Networks
Cited by 80 (2 self)
Motivated by structural properties of the Web graph that support efficient data structures for in-memory adjacency queries, we study the extent to which a large network can be compressed. Boldi and Vigna (WWW 2004) showed that Web graphs can be compressed down to three bits of storage per edge; we study the compressibility of social networks where again adjacency queries are a fundamental primitive. To this end, we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs. We show that some of the problems are NP-hard yet admit effective heuristics, some of which can exploit properties of social networks such as link reciprocity. Our extensive experiments show that social networks and the Web graph exhibit vastly different compressibility characteristics.
Graph summarization with bounded error
In SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008
Cited by 70 (7 self)
We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. On the other hand, the corrections portion specifies the list of edge corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
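The two-part representation described in this abstract can be illustrated by the lossless reconstruction step: expand every summary edge into all node pairs, then apply the corrections. The sketch below assumes an undirected simple graph and a hypothetical correction format (one list of edges to add, one list of edges to delete). The paper's actual encoding and its summary-construction algorithms are not shown.

```python
from itertools import product


def reconstruct(supernodes, superedges, additions, deletions):
    """Rebuild the edge set of G from a summary plus corrections.

    supernodes: dict mapping a supernode id to the set of original nodes it holds
    superedges: iterable of (supernode id, supernode id) pairs
    additions, deletions: iterables of (u, v) edge corrections (assumed format)
    """
    edges = set()
    for a, b in superedges:
        # A summary edge stands for every pair between the two node sets.
        for u, v in product(supernodes[a], supernodes[b]):
            if u != v:  # skip self-pairs when a supernode links to itself
                edges.add(frozenset((u, v)))
    edges |= {frozenset(e) for e in additions}
    edges -= {frozenset(e) for e in deletions}
    return edges


if __name__ == "__main__":
    supernodes = {"A": {1, 2}, "B": {3, 4}}
    g = reconstruct(
        supernodes,
        superedges=[("A", "B")],
        additions=[(1, 2)],
        deletions=[(2, 4)],
    )
    print(sorted(tuple(sorted(e)) for e in g))  # [(1, 2), (1, 3), (1, 4), (2, 3)]
```

In the usage example, one summary edge and two corrections stand in for four original edges. The saving grows when node sets are large and nearly fully connected to each other.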
Compressing the graph structure of the web
In IEEE Data Compression Conference (DCC), 2001
Cited by 59 (2 self)
A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure has proven to be extremely useful for improving the performance of search engines and other tools for navigating the web. However, since the graphs in these scenarios involve hundreds of millions of nodes and even more edges, highly space-efficient data structures are needed to fit the data in memory. A first step in this direction was taken by the DEC Connectivity Server, which stores the graph in compressed form. In this paper, we describe techniques for compressing the graph structure of the web, and give experimental results of a prototype implementation. We attempt to exploit a variety of different sources of compressibility of these graphs and of the associated set of URLs in order to obtain good compression performance on a large web graph.
The Link Database: Fast Access to Graphs of the Web
Cited by 46 (2 self)
... graph where URLs are nodes and hyperlinks are directed edges. The Link Database provides fast access to the hyperlinks. To support a wide range of graph algorithms, we find it important to fit the Link Database into memory. In the first version of the Link Database, we achieved this fit by using machines with lots of memory (8 GB), and storing each hyperlink in 32 bits. However, this approach was limited to roughly 100 million Web pages. This paper presents techniques to compress the links to accommodate larger graphs. Our techniques combine well-known compression methods with methods that depend on the properties of the web graph. The first compression technique takes advantage of the fact that most hyperlinks on most Web pages point to other pages on the same host as the page itself. The second technique takes advantage of the fact that many pages on the same host share hyperlinks, that is, they tend to point to a common set of pages. Together, these techniques reduce space requirements to under 6 bits per link. While (de)compression adds latency to the hyperlink access time, we can still compute the strongly connected components of a 6-billion-edge graph in under 20 minutes and run applications such as Kleinberg's HITS in real time. This paper describes our techniques for compressing the Link Database, and provides performance numbers for compression ratios and decompression speed.
Compact Representations of Separable Graphs
In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 2003
Cited by 42 (11 self)
We consider the problem of representing graphs compactly while supporting queries efficiently. In particular we describe a data structure for representing n-vertex unlabeled graphs that satisfy an O(n^c)-separator theorem, c < 1. The structure uses O(n) bits, and supports adjacency and degree queries in constant time, and neighbor listing in constant time per neighbor. This generalizes previous results for graphs with constant genus, such as planar graphs.