## Permuting Web and Social Graphs

### BibTeX

@MISC{Boldi_permutingweb,

author = {Paolo Boldi and Massimo Santini and Sebastiano Vigna},

title = {Permuting Web and Social Graphs},

year = {}

}

### Abstract

Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [2], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some known orderings and proposing some new ones. Our experiments are carried out in the WebGraph framework [3], and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transposed web graph URL ordering is significantly less effective, and that some new mixed orderings combining host information and Gray/lexicographic orderings outperform all previous methods: in some large transposed graphs they yield the quite incredible compression rate of 1 bit per link. We experiment with these simple ideas on some non-web social networks and obtain results that are extremely promising and very close to those recently achieved using shingle orderings and backlinks compression schemes [4].
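The two intrinsic orderings named in the abstract can be illustrated with a minimal Python sketch. It assumes that "Gray code ordering" means sorting the adjacency-matrix rows by their position in the standard reflected Gray code, which is equivalent to sorting by the running XOR of each row; the function names are hypothetical and are not taken from the paper or from WebGraph.

```python
from itertools import accumulate
from operator import xor


def adjacency_rows(successors, n):
    """Build the 0/1 rows of the n x n adjacency matrix from successor lists."""
    rows = []
    for s in successors:
        members = set(s)
        rows.append(tuple(int(j in members) for j in range(n)))
    return rows


def lexicographic_order(rows):
    """Node indices sorted by comparing rows as 0/1 sequences."""
    return sorted(range(len(rows)), key=lambda i: rows[i])


def gray_key(row):
    """Running XOR of the row: its rank in the reflected Gray code (assumed definition)."""
    return tuple(accumulate(row, xor))


def gray_order(rows):
    """Node indices sorted by Gray-code rank of their rows."""
    return sorted(range(len(rows)), key=lambda i: gray_key(rows[i]))


# Hypothetical toy graph: nodes 0 and 2 share a successor list, as do 1 and 3.
toy = adjacency_rows([[1, 2], [0], [1, 2], [0]], 4)
print(lexicographic_order(toy), gray_order(toy))
```

Both orderings place nodes with identical rows next to each other, which is what makes similarity-based compression effective. Python's sort is stable, so ties between identical rows keep their original relative order. A full experiment would renumber rows and columns with the resulting permutation before compressing; the sketch stops at computing the permutation.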

### Citations

195 | Representing Web Graphs - Raghavan, Garcia-Molina |

182 | Space Efficient Static Trees and Graphs
- Jacobson
- 1989
Citation Context ...ble if it is able to compress efficiently typical instances. On the other hand, compressed data structures are in a sense the empirical counterpart of succinct data structures (introduced by Jacobson [14]), which store data using a number of bits equal to the information-theoretical lower bound, providing access time asymptotically equivalent to a standard data structure; in particular, a compressed d... |

180 | The Webgraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...suggests a number of possible practical improvements over the bit list idea, in particular that of run-length encoding such a list, which is similar to the inclusion-exclusion blocks used by WebGraph [3]. The results of [2] were obtained on large datasets on which they obtain a compression rate of about 5.5 bits/link: this value includes the offset data structure that allows for random access, but it... |

122 | The Connectivity Server: Fast access to linkage information on the Web
- Bharat, Broder, et al.
- 1998
Citation Context ...l number of bits. This solution is usually considered good enough for all practical purposes, and has the extra advantage that even the URL list can be compressed very efficiently via prefix omission [6]. Analogous techniques, which use additional information beside the web graph itself, are called extrinsic. It is natural to wonder if there is an alternative way of finding a “good ordering” of the n... |

85 | Towards compressing web graphs
- Adler, Mitzenmacher
- 2001
Citation Context ...ric difference between the two successor sets is written: a bit list records which successors of the referenced list are actually used. The idea of using similarity was also explored independently in [16], which also presents some negative results about the complexity of finding the “best possible” node to copy from, and suggests a number of possible practical improvements over the bit list idea, in p... |
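The bit-list idea in the excerpt above, together with the run-length refinement mentioned in another excerpt on this page, can be sketched as follows. This is only an illustration under stated assumptions: successor lists are sorted and duplicate-free, and the function names and the exact output layout are hypothetical rather than the format used by any of the cited systems.

```python
from itertools import groupby


def encode_with_reference(current, reference):
    """Describe `current` relative to `reference`.

    Returns a bit list (one bit per successor of the reference list, 1 if it
    is also a successor in `current`) and the extra successors that the
    reference list does not contain.
    """
    current_set = set(current)
    reference_set = set(reference)
    copy_bits = [int(x in current_set) for x in reference]
    extras = [x for x in current if x not in reference_set]
    return copy_bits, extras


def decode_with_reference(copy_bits, extras, reference):
    """Rebuild the sorted successor list from the bit list and the extras."""
    copied = [x for bit, x in zip(copy_bits, reference) if bit]
    return sorted(copied + extras)


def run_lengths(bits):
    """Run-length encode a bit list as (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


bits, extras = encode_with_reference([1, 3, 5, 12], [1, 3, 5, 7, 9])
print(bits, extras, run_lengths(bits))
```

When two lists are similar, the bit list consists of a few long runs, so the run-length form is short and only a handful of extra successors remain to be written explicitly.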

79 | Succinct representations of graphs
- Turán
- 1984
Citation Context ...ieve this lower bound, and are hence space-optimal, at least in the average case. For directed labelled graphs, the adjacency matrix is optimal (as there are $2^{n^2}$ directed labelled graphs). Turán [12], in one of the early papers on the subject, described a representation for planar graphs and posed the problem of encoding general unlabelled graphs (i.e., graphs that should be considered up to auto... |
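As a worked instance of the counting bound in the excerpt above, assume $n$ labelled nodes with self-loops allowed (an assumption the excerpt does not state explicitly). Each of the $n^2$ ordered pairs of nodes is either an arc or not, so:

```latex
\[
  \#\{\text{directed labelled graphs on } n \text{ nodes}\} = 2^{n^2}
  \quad\Longrightarrow\quad
  \log_2 2^{n^2} = n^2 \ \text{bits on average.}
\]
```

An $n \times n$ adjacency matrix uses exactly $n^2$ bits, which is why it meets the bound in this setting.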

53 | Compressing the graph structure of the Web
- Suel, Yuan
- 2001
Citation Context .... In its simplest form, this solution may consist in classifying the arcs of the graphs into two or more classes and in compressing them differently; an early example of this approach is attempted in [18], which distinguishes between global frequent links (interhost links towards pages with large indegree, which are Huffman-coded), global absolute links (the other interhost links, stored using a Golomb ... |

38 | The Link Database: Fast Access to Graphs of the Web
- Randall, Stata, et al.
- 2002
Citation Context ... it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [2], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they ... |

38 | On compressing social networks
- Chierichetti, Kumar, et al.
- 2009
Citation Context ...se simple ideas on some non-web social networks and obtain results that are extremely promising and are very close to those recently achieved using shingle orderings and backlinks compression schemes [4]. 1 Introduction. The web graph [5] is a directed graph whose nodes correspond to URLs, with an arc from x to y whenever the page denoted by x contains a hyperlink toward the page denoted by y; more loosel... |

36 | Compact Representations of Separable Graphs
- Blandford, Blelloch, et al.
- 2003
Citation Context ...tructures), given by a simple counting argument: any compression scheme for objects of a universe $\mathcal{X}$ cannot use less than $\log_2 |\mathcal{X}|$ bits on the average; in some cases, it is actually possible to design compression schemes that achieve this lower bound, and are henceforth sp... [Footnote 5: Of course, it is possible to devise intrinsic methods that do not necessarily depend on some ordering; see, for instance, [11].] |

30 | Succinct representation of general unlabeled graphs
- Naor
- 1990
Citation Context ...presentation for planar graphs and posed the problem of encoding general unlabelled graphs (i.e., graphs that should be considered up to automorphisms); a solution was found a few years later by Naor [13]. Albeit interesting, this kind of result is of limited practical impact for two reasons: no efficient method is usually provided to access the data without decompressing it entirely, and moreover in... |

25 | Inverted file compression through document identifier reassignment
- Shieh, Chen, et al.
- 2003
Citation Context ...we store, using a variable-length bit encoding, $x_0, x_1 - x_0, x_2 - x_1, \ldots$. [Footnote 3: We note that the same approach has been shown to be fruitful in the compression of inverted indices; see, for instance, [7, 8, 9, 10].] [Footnote 4: In this description we are ignoring the problem that π is not unique if A contains the same row many times.] [page break] external information (such as the URLs of each node) [footnote 5]. Another possible solution to th... |

25 | Efficient storage and retrieval by content and address of static files
- Elias
- 1974
Citation Context ...ess, however, some more data must be loaded into memory: usually, a list of pointers into the bitstream representing the graph. This list is monotone, so it can be represented using Elias–Fano coding [21, 22], requiring around 2 + log ℓ bits per pointer, where ℓ is the average number of bits per node. Since the overall space usage is of lower order with respect to the rest of the data (in practice, it is ... |

23 | Sorting out the document identifier assignment problem
- Silvestri
Citation Context ...we store, using a variable-length bit encoding, $x_0, x_1 - x_0, x_2 - x_1, \ldots$. [Footnote 3: We note that the same approach has been shown to be fruitful in the compression of inverted indices; see, for instance, [7, 8, 9, 10].] [Footnote 4: In this description we are ignoring the problem that π is not unique if A contains the same row many times.] [page break] external information (such as the URLs of each node) [footnote 5]. Another possible solution to th... |

17 | Document identifier reassignment through dimensionality reduction
- Blanco, Barreiro
- 2005
Citation Context ...we store, using a variable-length bit encoding, $x_0, x_1 - x_0, x_2 - x_1, \ldots$. [Footnote 3: We note that the same approach has been shown to be fruitful in the compression of inverted indices; see, for instance, [7, 8, 9, 10].] [Footnote 4: In this description we are ignoring the problem that π is not unique if A contains the same row many times.] [page break] external information (such as the URLs of each node) [footnote 5]. Another possible solution to th... |
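The gap representation described in the excerpt above (store the first successor, then the differences between consecutive successors) can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a sorted list of non-negative integers; it leaves out the variable-length bit encoding of the gaps themselves.

```python
from itertools import accumulate


def to_gaps(successors):
    """Turn a sorted successor list x0, x1, x2, ... into x0, x1 - x0, x2 - x1, ..."""
    gaps = []
    previous = None
    for x in successors:
        gaps.append(x if previous is None else x - previous)
        previous = x
    return gaps


def from_gaps(gaps):
    """Invert to_gaps: prefix sums recover the original successors."""
    return list(accumulate(gaps))


clustered = [100, 101, 102, 105]
print(to_gaps(clustered))
```

The point of a good node ordering is visible here: when successors are numerically close, most gaps are small and can be written with few bits by a variable-length code.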

13 | On the number of bits required to implement an associative memory
- Fano
- 1971
Citation Context ...ess, however, some more data must be loaded into memory: usually, a list of pointers into the bitstream representing the graph. This list is monotone, so it can be represented using Elias–Fano coding [21, 22], requiring around 2 + log ℓ bits per pointer, where ℓ is the average number of bits per node. Since the overall space usage is of lower order with respect to the rest of the data (in practice, it is ... |
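To make the "2 + log ℓ bits per pointer" figure in the excerpt above concrete, here is a minimal sketch of the textbook Elias–Fano layout for a monotone sequence: each value is split into low bits stored verbatim and high bits stored in unary. The names `ef_encode`, `ef_decode` and `ef_bits` are hypothetical, bits are held in Python lists for readability, and the select structure needed for fast random access is omitted.

```python
import math


def ef_encode(values, universe):
    """Split a non-decreasing sequence with values <= universe into low and high parts."""
    n = len(values)
    low_width = max(0, (universe // n).bit_length() - 1)  # floor(log2(universe / n))
    mask = (1 << low_width) - 1
    lows = [v & mask for v in values]
    upper = []          # unary-coded high parts: a 0 per bucket step, a 1 per value
    previous_high = 0
    for v in values:
        high = v >> low_width
        upper.extend([0] * (high - previous_high))
        upper.append(1)
        previous_high = high
    return low_width, lows, upper


def ef_decode(low_width, lows, upper):
    """Rebuild the sequence by scanning the unary-coded high parts."""
    out = []
    high = 0
    index = 0
    for bit in upper:
        if bit:
            out.append((high << low_width) | lows[index])
            index += 1
        else:
            high += 1
    return out


def ef_bits(low_width, lows, upper):
    """Total number of bits used by the two parts."""
    return low_width * len(lows) + len(upper)


pointers = [5, 8, 8, 15, 32]
encoded = ef_encode(pointers, 36)
print(ef_bits(*encoded) / len(pointers), 2 + math.log2(36 / len(pointers)))
```

If the values are bit offsets of n nodes in a bitstream of about nℓ bits, then universe / n is about ℓ, so the per-pointer cost stays within roughly 2 + log ℓ bits, in line with the excerpt.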

10 | Permuting Web Graphs
- Boldi, Santini, et al.
- 2009
Citation Context ...IN Project “Automi e linguaggi formali: aspetti matematici e applicativi”, and by MIUR PRIN Project “Web Ram: web retrieval and mining”. A preliminary version of the results in this paper appeared in [1]. [page break] although sparse, its adjacency matrix is way too big to fit in main memory, even on large computers. To overcome this technical difficulty, one can access the graph from external memory, which howe... |

10 | Efficient and Simple Encodings for the Web Graph
- Guillaume, Latapy, et al.
- 2002
Citation Context ... a Compaq with 2 GHz CPUs). Apart from [2], there was a flurry of activity about the same problem in the early 2000s, some of which delivered ideas that were later used in other frameworks. In 2002, [17] considered two solutions, both based on the idea of grouping the successor lists into blocks and then compressing each block separately, either by gzipping it or by writing each successor as a differ... |

9 | Codes for the world wide web
- Boldi, Vigna
- 2005
Citation Context ...of arcs copied by a previous successor list is represented by inclusion-exclusion blocks; • consecutive successors are represented by intervals; • the residual successors are gap-encoded using ζ codes [20], a kind of instantaneous code devised explicitly for power laws with small exponent. WebGraph provides an implementation of the BV compression scheme in Java (full source is available). Moreover, the... |

6 | Efficient compression of web graphs
- Asano, Miyawaki, et al.
Citation Context ...ering; the compression rate and access time are good, but experiments are provided only for very small data sets, and the software is not available. Pattern compression. Asano, Miyawaki and Nishizeki [23] use different techniques to encode intra- and inter-host links, and for intra-host compression they adopt six different kinds of patterns that are used to cover the local adjacency matrix. No code is ... |

2 | Compact encoding of the web graph exploiting various power laws: statistical reason behind link database
- Asano, Ito, et al.
Citation Context ... are written as sequences of 3-bit groups, plus one continuation bit): the reason behind this choice is that, although nybble codes are outperformed by Huffman, they are much faster to decompress. As [15] explain a posteriori, the choice of nybble codes is optimal (at least for gaps between successors) among all the universal codes of the same family (the so-called k-bit variable length code, where k ... |
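The nybble code described in the excerpt above (3-bit groups plus one continuation bit) can be sketched as follows. The excerpt does not fix the bit layout, so this sketch assumes the continuation flag is the high bit of each 4-bit nybble, that a flag of 1 means "more groups follow", and that groups are written most significant first; the function names are hypothetical.

```python
def nybble_encode(x):
    """Encode a non-negative integer as a list of 4-bit nybbles.

    Each nybble holds 3 data bits; its high bit is 1 when another nybble follows.
    """
    groups = []
    while True:
        groups.append(x & 0b111)
        x >>= 3
        if x == 0:
            break
    groups.reverse()  # most significant group first (assumed layout)
    last = len(groups) - 1
    return [((1 if i < last else 0) << 3) | g for i, g in enumerate(groups)]


def nybble_decode(nybbles):
    """Decode one integer, stopping at the first nybble whose flag is 0."""
    value = 0
    for nyb in nybbles:
        value = (value << 3) | (nyb & 0b0111)
        if not nyb & 0b1000:
            return value
    raise ValueError("truncated nybble sequence")


print(nybble_encode(8), nybble_decode(nybble_encode(8)))
```

Decoding needs only shifts and masks on fixed-width units, which fits the excerpt's remark that nybble codes trade some compression for much faster decompression than Huffman codes.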