## Permuting Web Graphs

### BibTeX

@MISC{Boldi_permutingweb,

author = {Paolo Boldi and Massimo Santini and Sebastiano Vigna},

title = {Permuting Web Graphs},

year = {}

}

### Abstract

Since the first investigations of web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are carried out in the WebGraph framework [2] and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transposed web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods. On some large transposed graphs they yield the quite incredible compression rate of 1 bit per link.
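To see why the node ordering changes the number of bits per link, here is a minimal sketch under a deliberately simplified model that is not WebGraph's actual format: it assumes each successor list is renumbered by a permutation, sorted, and stored as $x_0, x_1 - x_0, x_2 - x_1, \ldots$ with Elias gamma codes, ignoring the reference and interval compression a real compressor would use. The names `gamma_length` and `bits_per_link`, and the toy graph, are hypothetical.

```python
from typing import Sequence


def gamma_length(n: int) -> int:
    """Number of bits in the Elias gamma code of a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined only for n >= 1")
    return 2 * n.bit_length() - 1


def bits_per_link(successors: Sequence[Sequence[int]], perm: Sequence[int]) -> float:
    """Average bits per arc after renaming every node y as perm[y].

    Each renamed successor list x0 < x1 < ... is stored as
    x0 + 1, x1 - x0, x2 - x1, ... (all >= 1) using Elias gamma codes.
    """
    total_bits = 0
    arcs = 0
    for succ in successors:
        renamed = sorted(perm[y] for y in succ)
        prev = -1  # makes the first stored value x0 + 1, which gamma can encode
        for y in renamed:
            total_bits += gamma_length(y - prev)
            prev = y
        arcs += len(renamed)
    return total_bits / arcs if arcs else 0.0


# Hypothetical 8-node graph: nodes 0 and 1 both link to pages 2 and 7.
succ = [[2, 7], [2, 7], [0], [0], [], [], [], []]
identity = list(range(8))
swap_3_7 = [0, 1, 2, 7, 4, 5, 6, 3]  # renames node 7 as 3 and node 3 as 7
print(bits_per_link(succ, identity))  # 3.0
print(bits_per_link(succ, swap_3_7))  # 10 bits / 6 arcs, about 1.67
```

Renumbering node 7 as 3 places the two pages that nodes 0 and 1 both point to next to each other, so the gap between them shrinks from 5 to 1 and the average drops from 3 to about 1.67 bits per link in this toy model.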

### Citations

185 | Representing Web graphs
- Raghavan, Garcia-Molina

Citation Context: ...exicographic orderings outperform all previous methods. In particular, in some large transposed graphs they yield the quite incredible compression rate of 1 bit per link. [Section 1: Introduction] The web graph [3] is a directed graph whose nodes correspond to URLs, with an arc from x to y whenever the page denoted by x contains a hyperlink toward the page denoted by y; more loosely, the same term is sometimes used for...

170 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004

121 | The connectivity server: Fast access to linkage information on the Web
- Bharat, Broder, et al.
- 1998

Citation Context: ...ecompressing it (or decompressing it only partially, on demand, and efficiently). The latter approach, which can be referred to as web graph compression, can be traced back to the Connectivity Server [4] and to the LINK database [1]; more recently, it led to the development of the WebGraph framework [2], which still provides the best practical compression techniques. Most web graph compression algorit...

43 | Index compression through document reordering
- Blandford, Blelloch
- 2002

Citation Context: ...we store, using a variable-length bit encoding, $x_0, x_1 - x_0, x_2 - x_1, \ldots$. [Footnote 2] We note that the same approach has been shown to be fruitful in the compression of inverted indices; see, for instance, [5,6,7,8]. [Footnote 3] In this description we are ignoring the problem that π is not unique if A contains the same row many times. ... These approaches are called intrinsic, or coordinat...

25 | Inverted file compression through document identifier reassignment
- Shieh, Chen, et al.
- 2003

23 | Sorting out the document identifier assignment problem
- Silvestri

17 | Document identifier reassignment through dimensionality reduction
- Blanco, Barreiro
- 2005

2 | The link database: Fast access to graphs of the web (research report)
- Randall, Wickremesinghe, et al.

Citation Context: ..., it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic a...

1 | The Art of Computer Programming. In: Fascicle 2: Generating All Tuples and
- Knuth
- 2005

Citation Context: ...sive vectors differ by exactly one bit; Gray codes, named after the physicist Frank Gray, find countless applications in computer science, physics and mathematics (we refer the interested reader to [9] for more information on this topic). The ordering imposed by a Gray code on $2^n$ is called a Gray ordering. Even though there are many Gray codes, and thus many Gray orderings, one that is very simple...
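The Gray-code excerpt above is cut off before it names the "very simple" code, so the following sketch assumes the reflected binary Gray code, whose $i$-th word is $i \oplus \lfloor i/2 \rfloor$. It orders nodes by their adjacency-matrix rows, read as $n$-bit integers with column 0 as the most significant bit, either lexicographically or in Gray order, and breaks ties between identical rows by node id (the excerpt notes that the permutation is otherwise not unique). The helper names are hypothetical, and treating whole rows as big integers is only practical for small graphs.

```python
from typing import List, Sequence


def row_as_int(succ: Sequence[int], n: int) -> int:
    """Encode a node's adjacency-matrix row as an n-bit integer.

    Column 0 is the most significant bit, so comparing these integers
    compares the rows lexicographically (with 0 < 1).
    """
    value = 0
    for y in succ:
        value |= 1 << (n - 1 - y)
    return value


def gray_rank(g: int) -> int:
    """Position of the word g in the reflected binary Gray sequence.

    Inverts gray(i) = i ^ (i >> 1) by XOR-ing all right shifts of g.
    """
    rank = 0
    while g:
        rank ^= g
        g >>= 1
    return rank


def row_order(successors: Sequence[Sequence[int]], gray: bool) -> List[int]:
    """Nodes sorted by their rows (lexicographic or Gray order); ties broken by node id."""
    n = len(successors)

    def key(x: int):
        row = row_as_int(successors[x], n)
        return (gray_rank(row) if gray else row, x)

    return sorted(range(n), key=key)


# Hypothetical 4-node graph; row bits shown with column 0 on the left.
succ = [[0, 1], [3], [1], [0]]  # rows: 1100, 0001, 0100, 1000
print(row_order(succ, gray=False))  # [1, 2, 3, 0]
print(row_order(succ, gray=True))   # [1, 2, 0, 3]
```

Setting `perm[order[i]] = i` turns either order into a renumbering. In the Gray order the consecutive rows 0100, 1100 and 1000 each differ in exactly one bit, whereas the lexicographic order places 1000 before 1100.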