## Models for the Compressible Web

### BibTeX

@MISC{Chierichetti_modelsfor,
  author = {Flavio Chierichetti and Ravi Kumar and Silvio Lattanzi and Alessandro Panconesi and Prabhakar Raghavan},
  title = {Models for the Compressible Web},
  year = {}
}

### Abstract

Graphs resulting from human behavior (the web graph, friendship graphs, etc.) have hitherto been viewed as a monolithic class of graphs with similar characteristics; for instance, their degree distributions are markedly heavy-tailed. In this paper we take our understanding of behavioral graphs a step further by showing that an intriguing empirical property of web graphs — their compressibility — cannot be exhibited by well-known graph models for the web and for social networks. We then develop a more nuanced model for web graphs and show that it does exhibit compressibility, in addition to previously modeled web graph properties.

### Citations

2089 | Emergence of scaling in random networks
- Barabási, Albert
- 1999

Citation Context: ...atistics. In such analysis, there has been a tendency to lump together behavioral graphs arising from a variety of contexts, to be studied using a common set of models and tools. It has been observed **[3]**, [9], [22] for instance that the directed graphs arising from such diverse phenomena as the web graph (pages are nodes and hyperlinks are edges), citation graphs, friendship graphs, and email traffic...

1903 | Collective dynamics of ‘small-world’ networks
- Watts, Strogatz
- 1998

Citation Context: ...the dependencies between the interacting humans who collectively generate these statistics. These explanations have found new expression in the form of rich-get-richer and herd-mentality theories [3], **[34]**. Early rigorous analyses of such models include [2], [7], [13], [21]. Whereas Kumar et al. [21] and Borgs et al. [8] focused on modeling the web graph, the models of Aiello, Chung, and Lu (ACL) [2], ...

980 | Human behavior and the principle of least effort
- Zipf
- 1949

Citation Context: ...r law degree distributions in behavioral (and other) graphs has a long history [3], [22]; indeed, such distributions predate the modern interest in social networks through observations in linguistics **[36]** and sociology [30]; see the survey by Mitzenmacher [28]. Simon [30], Mandelbrot [27], Zipf [36] and others have provided a number of explanations for these distributions, attributing them to the depe...

795 | Managing Gigabytes: Compressing and Indexing Documents and Images
- Witten, Moffat, et al.
- 1999

Citation Context: ...perty is their compressibility: the number of bits needed to store each edge in the graph. Compressibility determines the ability to efficiently store and manipulate these massive graphs [18], [31], **[35]**. An intriguing set of papers by Boldi, Santini, and Vigna [4]–[6] shows that the web graph is highly compressible: it can be stored such that each edge requires only a small constant number, between...

623 | The small-world phenomenon: An algorithmic perspective
- Kleinberg

299 | Graphs over time: densification laws, shrinking diameters and possible explanations
- Leskovec, Kleinberg, et al.
- 2005

Citation Context: ...ic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], [7], [8], [11], [17], [21], **[25]** for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of such graphs, an important globa...

297 | Trawling the Web for Emerging Cyber-Communities
- Kumar, Raghavan, et al.
- 1999

Citation Context: ...In such analysis, there has been a tendency to lump together behavioral graphs arising from a variety of contexts, to be studied using a common set of models and tools. It has been observed [3], [9], **[22]** for instance that the directed graphs arising from such diverse phenomena as the web graph (pages are nodes and hyperlinks are edges), citation graphs, friendship graphs, and email traffic graphs all...

251 | A brief history of generative models for power law and lognormal distributions
- Mitzenmacher

Citation Context: ...phs has a long history [3], [22]; indeed, such distributions predate the modern interest in social networks through observations in linguistics [36] and sociology [30]; see the survey by Mitzenmacher **[28]**. Simon [30], Mandelbrot [27], Zipf [36] and others have provided a number of explanations for these distributions, attributing them to the dependencies between the interacting humans who collectively...

249 | On a class of skew distribution functions
- Simon
- 1955

Citation Context: ...butions in behavioral (and other) graphs has a long history [3], [22]; indeed, such distributions predate the modern interest in social networks through observations in linguistics [36] and sociology **[30]**; see the survey by Mitzenmacher [28]. Simon [30], Mandelbrot [27], Zipf [36] and others have provided a number of explanations for these distributions, attributing them to the dependencies between th...

218 | Graph structure in the web
- Broder, Kumar, et al.

Citation Context: ...ics. In such analysis, there has been a tendency to lump together behavioral graphs arising from a variety of contexts, to be studied using a common set of models and tools. It has been observed [3], **[9]**, [22] for instance that the directed graphs arising from such diverse phenomena as the web graph (pages are nodes and hyperlinks are edges), citation graphs, friendship graphs, and email traffic grap...

214 | Stochastic models for the web graph
- Kumar, Raghavan, et al.

Citation Context: ... classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], [7], [8], [11], [17], **[21]**, [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of such graphs, an important...

204 | Navigation in a small world
- Kleinberg

Citation Context: ... we show that the preferential attachment (PA) model [3], [7], the ACL model [2], the copying model [21], the Kronecker product model [24], and Kleinberg’s model for navigability¹ on social networks **[19]**, all have large entropy in the above sense. We then show our main result: a new model for the web graph that has constant entropy per edge, while preserving crucial properties of previous models such...

189 | An experimental study of the small world problem
- Travers, Milgram
- 1969

Citation Context: ...aphs that require only O(1) bits per edge and those requiring, say, ε lg n bits. The point however is that the compressibility of our model relies upon other important structural properties of real web graphs that... [Footnote 1: Since navigability is a crucial property of real-life social networks (cf. [16], [26], **[33]**), it is tempting to conjecture that social networks are incompressible; see, for instance, [12].]

160 | The Web Graph Framework I: Compression Techniques
- Boldi, Vigna
- 2004

Citation Context: ...rical results suggest the intriguing possibility that the Web can be described with only O(1) bits per edge on average. Two properties are at the heart of the compression algorithm of Boldi and Vigna **[5]**. First, once web pages are sorted lexicographically by URL, the set of out-links of a page exhibits locality; this can plausibly be attributed to the fact that nearby pages are likely to come from th...

156 | The degree sequence of a scale-free random graph process, Random Structures and Algorithms 18
- Bollobás, Riordan, et al.
- 2001

Citation Context: ...om graphs generated by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], **[7]**, [8], [11], [17], [21], [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of su...

151 | Geographic routing in social networks
- Liben-Nowell, Novak, et al.
- 2005

Citation Context: ...een graphs that require only O(1) bits per edge and those requiring, say, ε lg n bits. The point however is that the compressibility of our model relies upon other important structural properties of real web graph... [Footnote 1: Since navigability is a crucial property of real-life social networks (cf. [16], **[26]**, [33]), it is tempting to conjecture that social networks are incompressible; see, for instance, [12].]

150 | Heuristically optimized trade-offs: a new paradigm for power laws
- Fabrikant, Koutsoupias, et al.
- 2002

Citation Context: ...ted by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], [7], [8], [11], **[17]**, [21], [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of such graphs, an imp...

134 | Highly optimized tolerance: A mechanism for power laws in designed systems
- Doyle
- 1999

Citation Context: ...generated by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], [7], [8], **[11]**, [17], [21], [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of such graphs, ...

113 | An experimental study of search in global social networks
- Dodds, Muhamad, et al.

Citation Context: ...y between graphs that require only O(1) bits per edge and those requiring, say, ε lg n bits. The point however is that the compressibility of our model relies upon other important structural properties of real web... [Footnote 1: Since navigability is a crucial property of real-life social networks (cf. **[16]**, [26], [33]), it is tempting to conjecture that social networks are incompressible; see, for instance, [12].]

89 | Random evolution of massive graphs, in
- Aiello, Chung, et al.

Citation Context: ... > 1; random graphs generated by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed **[2]**, [3], [7], [8], [11], [17], [21], [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local prop...

80 | Towards compressing web graphs
- Adler, Mitzenmacher
- 2001

Citation Context: ...find a short route to a target using only local, myopic choices at each step of the route. The papers by Boldi, Santini, and Vigna [4]–[6] suggest that the web graph is highly compressible (see also **[1]**, [10], [12], [31]). II. PRELIMINARIES The graph models we study will either have a fixed number of nodes or will be evolving models in which nodes arrive in a discrete-time stochastic process; for ma...

75 | Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication
- Leskovec, Chakrabarti, et al.
- 2005

Citation Context: ...heir topology (i.e., with all labels stripped away). Specifically, we show that the preferential attachment (PA) model [3], [7], the ACL model [2], the copying model [21], the Kronecker product model **[24]**, and Kleinberg’s model for navigability¹ on social networks [19], all have large entropy in the above sense. We then show our main result: a new model for the web graph that has constant entropy per...

65 | An informational theory of the statistical structure of language. Communication Theory
- Mandelbrot
- 1953

Citation Context: ...22]; indeed, such distributions predate the modern interest in social networks through observations in linguistics [36] and sociology [30]; see the survey by Mitzenmacher [28]. Simon [30], Mandelbrot **[27]**, Zipf [36] and others have provided a number of explanations for these distributions, attributing them to the dependencies between the interacting humans who collectively generate these statistics. T...

58 | Concentration of Measure for the Analysis of Randomized Algorithms
- Dubhashi, Panconesi
- 2009

Citation Context: ...$, a_{i-1}, a'_i, a_{i+1}, \ldots, a_n)| \le c$. In order to establish that certain events occur w.h.p., we will make use of the following concentration result known as the method of bounded differences (cf. **[15]**). Theorem 1 (Method of bounded differences). Let $X_1, \ldots, X_n$ be independent r.v.’s. Let $f$ be a function on $X_1, \ldots, X_n$ satisfying the $c$-Lipschitz property. Then, $\Pr[|f(X_1, \ldots, X_n) - \mathrm{E}[f(X$...

47 | Compressing the graph structure of the web
- Suel, Yuan
- 2001

Citation Context: ...al property is their compressibility: the number of bits needed to store each edge in the graph. Compressibility determines the ability to efficiently store and manipulate these massive graphs [18], **[31]**, [35]. An intriguing set of papers by Boldi, Santini, and Vigna [4]–[6] shows that the web graph is highly compressible: it can be stored such that each edge requires only a small constant number, b...

44 | A general model of web graphs, Random Structures and Algorithms 22
- Cooper, Frieze
- 2003

Citation Context: ...y generate these statistics. These explanations have found new expression in the form of rich-get-richer and herd-mentality theories [3], [34]. Early rigorous analyses of such models include [2], [7], **[13]**, [21]. Whereas Kumar et al. [21] and Borgs et al. [8] focused on modeling the web graph, the models of Aiello, Chung, and Lu (ACL) [2], Kleinberg [19], Lattanzi and Sivakumar [23], and Leskovec et al...

35 | On compressing social networks
- Chierichetti, Kumar, et al.
- 2009

Citation Context: ...er is that the compressibility of our model relies upon other important structural properties of real web graphs that previous models, in view of our lower bounds, provably cannot have. Related prior work. The obs... [Footnote 1: Since navigability is a crucial property of real-life social networks (cf. [16], [26], [33]), it is tempting to conjecture that social networks are incompressible; see, for instance, **[12]**.]

21 | A scalable pattern mining approach to web graph compression with communities
- Buehrer, Chellapilla
- 2008

Citation Context: ... highly compressible: it can be stored such that each edge requires only a small constant number (between one and three) of bits on average; a more recent experimental study confirms these findings **[10]**. These empirical results suggest the intriguing possibility that the Web can be described with only O(1) bits per edge on average. Two properties are at the heart of the compression algorithm of Bold...

21 | Affiliation Networks
- Lattanzi, Sivakumar
- 2009

Citation Context: ...s include [2], [7], [13], [21]. Whereas Kumar et al. [21] and Borgs et al. [8] focused on modeling the web graph, the models of Aiello, Chung, and Lu (ACL) [2], Kleinberg [19], Lattanzi and Sivakumar **[23]**, and Leskovec et al. [24] addressed social graphs in which people are nodes and the edges between them denote friendship. The ACL model is in fact known not to be a good representation of the web gra...

19 | The cover time of the preferential attachment graph
- Cooper, Frieze

10 | Permuting Web Graphs
- Boldi, Santini, et al.
- 2009

Citation Context: ...networks focuses on their navigability: it is possible for a node to find a short route to a target using only local, myopic choices at each step of the route. The papers by Boldi, Santini, and Vigna **[4]**–[6] suggest that the web graph is highly compressible (see also [1], [10], [12], [31]). II. PRELIMINARIES The graph models we study will either have a fixed number of nodes or will be evolving model...

9 | First to market is not everything: an analysis of preferential attachment with fitness
- Borgs, Chayes, et al.
- 2007

Citation Context: ...aphs generated by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed [2], [3], [7], **[8]**, [11], [17], [21], [25] for generating random graphs in which dependent events combine to deliver the observed power laws. While the degree distribution is a fundamental but local property of such gr...

5 | Speeding up algorithms on compressed web graphs
- Karande, Chellapilla, et al.
- 2009

Citation Context: ...t global property is their compressibility: the number of bits needed to store each edge in the graph. Compressibility determines the ability to efficiently store and manipulate these massive graphs **[18]**, [31], [35]. An intriguing set of papers by Boldi, Santini, and Vigna [4]–[6] shows that the web graph is highly compressible: it can be stored such that each edge requires only a small constant numb...

4 | Codes for the world-wide web
- Boldi, Vigna
- 2005

2 | The largest tree in certain models of random forests
- Mutafchiev
- 1998

Citation Context: ...ed together, preserving self-loops and multi-edges. For k = 1, note that the graphs generated by the above model are forests. Since there are $2^{O(n)}$ unlabeled forests on $n$ nodes (e.g., **[29]**), we have $H(Q^{\mathrm{pref}[k]}_n) = O(n)$, i.e., the graph without labels and edge orientations is compressible to O(1) bits per edge. The more interesting case is when k ≥ 2 for which we show an incompressibility bound. We ...

1 | On the complexity of algorithms on recursive trees
- Szymanski
- 1990