## On Compressing Social Networks

Citations: 35 (1 self)

### BibTeX

@MISC{Chierichetti_oncompressing,
  author = {Flavio Chierichetti and Ravi Kumar and Michael Mitzenmacher and Alessandro Panconesi and Silvio Lattanzi and Prabhakar Raghavan},
  title = {On Compressing Social Networks},
  year = {}
}

### Abstract

Motivated by structural properties of the Web graph that support efficient data structures for in-memory adjacency queries, we study the extent to which a large network can be compressed. Boldi and Vigna (WWW 2004) showed that Web graphs can be compressed down to three bits of storage per edge; we study the compressibility of social networks, where again adjacency queries are a fundamental primitive. To this end, we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs. We show that some of the problems are NP-hard yet admit effective heuristics, some of which can exploit properties of social networks such as link reciprocity. Our extensive experiments show that social networks and the Web graph exhibit vastly different compressibility characteristics.

### Citations

10926 | Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman and - Garey, Johnson - 1979
Citation Context ...es to minimize the maximum stretch of edges, and the minimum linear arrangement problem, where the goal is to order the nodes to minimize the sum of stretch of edges, have a rich history. We refer to [13] and the online compendium at www.nada.kth.se/~viggo/wwwcompendium/node52.html. 3. COMPRESSION SCHEMES In this section we outline the compression framework used in the rest of the paper. The framewor... |

2089 | Emergence of scaling in random networks - Barabási, Albert - 1999
Citation Context ...heuristic: using shingle ordering, it is possible to copy a constant fraction of the edges in a large class of random graphs with certain properties. The well-known preferential attachment (PA) model [2, 8], for instance, generates graphs in this class. Our analysis thus shows that it is indeed possible to obtain provable performance guarantees on shingle ordering with respect to copying (hence compress... |

818 | Introduction to Information Retrieval - Manning, Raghavan, et al. - 2009 |

797 | Managing Gigabytes: Compressing and Indexing Documents and Images - Witten, Moffat, et al. - 1999
Citation Context ...ng schemes encode an integer x ∈ Z⁺ using close to the information-theoretic minimum of 1 + ⌊lg x⌋ bits. For example, the number of bits used by the γ-code to represent x is 1 + 2⌊lg x⌋. We refer to [28] for more background on these codes. 3.1 BV compression scheme BV incorporates three main ideas. First, if the graph has many nodes whose neighborhoods are similar, then the neighborhood of a node can... |

647 | Some optimal inapproximability results - Hastad - 1997 |

376 | Some simplified NP-complete graph problems - Garey, Johnson, et al. - 1976
Citation Context ...-theoretically optimal (or nearly so). Also note that if the term inside the summation were just |π(u) − π(v)|, then this is the well-known minimum linear arrangement (MLINA) problem. MLINA is NP-hard [14]; little, however, is known about its approximability. The best algorithm [23] approximates MLINA to O(√(log n) log log n) and this algorithm is not practical for large graphs. From the standpoint of ... |

226 | Structure and Evolution of Online Social Networks - Kumar, Novak, et al. - 2006
Citation Context ...work. Maintaining these indexes in memory demands that the underlying graph be stored in a compressed form that facilitates efficient adjacency queries. Secondly, there is a wealth of evidence (e.g., [17]) that social networks are not random graphs in the usual sense: they exhibit certain distinctive local characteristics (such as degree sequences). Studying the compressibility of a social network is ... |

193 | Min-wise independent permutations - Broder, Charikar, et al.
Citation Context ...ty of two sets. Let σ be a random permutation of the elements in A ∪ B. For a set A, let M_σ(A) = σ⁻¹(min_{a ∈ A} σ(a)), the smallest element in A according to σ; we call it the shingle. It can be shown [10] that the probability that the shingles of A and B are identical is precisely the Jaccard coefficient J(A, B), i.e., Pr[M_σ(A) = M_σ(B)] = |A ∩ B| / |A ∪ B| = J(A, B). Instead of using random permutations... |

174 | Representing Web Graphs - Raghavan, Molina - 2003
Citation Context ...compression in this context [1]. Randall et al. [22] suggested lexicographic ordering as a way to obtain good Web graph compression, utilizing both similarity and locality. Raghavan and Garcia-Molina [21] considered a hierarchical view of the Web graph to achieve compression; see also Suel and Yuan [27] for a structural approach to compressing Web graphs. A major step was taken by Boldi and Vigna [6],... |

161 | The webgraph framework I: Compression techniques - Boldi, Vigna
Citation Context ...ges are nodes, hyperlinks are directed edges) is a special variant of a social network, in that we have a network of pages rather than of people. It is known that the Web graph is highly compressible [6, 11]. Particularly impressive results have been obtained by Boldi and Vigna [6], who exploit lexicographic locality in the Web graph: when pages are ordered lexicographically by URL, proximal pages have s... |

156 | The degree sequence of a scale-free random graph process, Random Structures and Algorithms 18 - Bollobás, Riordan, et al. - 2001
Citation Context ...onditioned on the fact that the highest degree at that point is O(n^(1/2+ε)) whp [12]. Then, by Markov’s inequality the claim follows. Also, we remove all nodes of degree > k, for some constant k; by [9], only εkn edges and nodes will be removed this way. The resulting graph will thus have at most n nodes and at least (1 − 2εk)mn ≥ (1 − 2εk)n edges. Also its maximum degree will be k. By averaging, a g... |

151 | Geographic routing in social networks - Liben-Nowell, Novak, et al. - 2005
Citation Context ...anonical Gray code [5, 22]. (4) Geographic order. In a social network, if geographic information is available in the form of a zip code, then this defines a geography-based order. Liben-Nowell et al. [18] showed that about 70% of social network links arise from geographical proximity, suggesting that friends can be grouped together using geographical information. Notice that this only defines a pa... |

102 | Mathematical results on scale-free random graphs - Bollobás, Riordan - 2003
Citation Context ...heuristic: using shingle ordering, it is possible to copy a constant fraction of the edges in a large class of random graphs with certain properties. The well-known preferential attachment (PA) model [2, 8], for instance, generates graphs in this class. Our analysis thus shows that it is indeed possible to obtain provable performance guarantees on shingle ordering with respect to copying (hence compress... |

81 | Towards compressing web graphs - Adler, Mitzenmacher - 2001
Citation Context ...Adler and Mitzenmacher introduced the idea of finding pages with similar sets of neighbors in the context of compressing Web graphs, and obtained some hardness results for compression in this context [1]. Randall et al. [22] suggested lexicographic ordering as a way to obtain good Web graph compression, utilizing both similarity and locality. Raghavan and Garcia-Molina [21] considered a hierarchical ... |

68 | Discovering large dense subgraphs in massive graphs - Gibson, Kumar, et al. - 2005
Citation Context ... same shingle and hence be close to each other in a shingle-based ordering. Thus, the properties of locality and similarity are captured by the shingle ordering heuristic. (Gibson, Kumar, and Tomkins [15] used a similar heuristic, but for identifying dense subgraphs of large graphs.) 4.4 Properties of shingle ordering In this section we show some theoretical justification for the shingle ordering heur... |

48 | Compressing the graph structure of the web - Suel, Yuan
Citation Context ...tain good Web graph compression, utilizing both similarity and locality. Raghavan and Garcia-Molina [21] considered a hierarchical view of the Web graph to achieve compression; see also Suel and Yuan [27] for a structural approach to compressing Web graphs. A major step was taken by Boldi and Vigna [6], who both developed a generic Web graph compression framework that takes into account the locality a... |

43 | Index compression through document reordering - Blandford, Blelloch - 2002
Citation Context ...Gray ordering, in compressing the transpose of the Web graph. The problem of assigning or reassigning document identifiers in order to compress text indexes has a long history. Blandford and Blelloch [4] considered the problem of compressing text indexes by permuting the document identifiers to create locality in an inverted index. Silvestri, Perego, and Orlando [26] proposed a clustering approach fo... |

35 | The link database: Fast access to graphs of the web - Randall, Stata, et al. - 2001
Citation Context ...er introduced the idea of finding pages with similar sets of neighbors in the context of compressing Web graphs, and obtained some hardness results for compression in this context [1]. Randall et al. [22] suggested lexicographic ordering as a way to obtain good Web graph compression, utilizing both similarity and locality. Raghavan and Garcia-Molina [21] considered a hierarchical view of the Web graph... |

26 | The webgraph framework II: Codes for the world-wide web - Boldi, Vigna |

25 | Inverted file compression through document identifier reassignment - Shieh, Chen, et al. - 2003
Citation Context ...es by permuting the document identifiers to create locality in an inverted index. Silvestri, Perego, and Orlando [26] proposed a clustering approach for reassigning document identifiers. Shieh et al. [24] proposed a document identifier reassignment method based on a heuristic for the traveling salesman problem. Recently, Silvestri [25] showed that assigning document identifiers to Web documents based ... |

22 | Sorting out the document identifier assignment problem - Silvestri - 2007
Citation Context ...tering approach for reassigning document identifiers. Shieh et al. [24] proposed a document identifier reassignment method based on a heuristic for the traveling salesman problem. Recently, Silvestri [25] showed that assigning document identifiers to Web documents based on URL lexicographic ordering improves compression. There are many classical node ordering problems on graphs. The minimum bandwidth ... |

21 | A scalable pattern mining approach to web graph compression with communities - Buehrer, Chellapilla - 2008
Citation Context ...ges are nodes, hyperlinks are directed edges) is a special variant of a social network, in that we have a network of pages rather than of people. It is known that the Web graph is highly compressible [6, 11]. Particularly impressive results have been obtained by Boldi and Vigna [6], who exploit lexicographic locality in the Web graph: when pages are ordered lexicographically by URL, proximal pages have s... |

16 | Inapproximability results for sparsest cut, optimal linear arrangement, and precedence constrained scheduling - Ambühl, Mastrolilli, et al. - 2007
Citation Context ...approximates MLINA to O(√(log n) log log n) and this algorithm is not practical for large graphs. From the standpoint of the hardness of approximation, only the existence of a PTAS has been ruled out [3]. One cannot hope to use an approximate solution to MLINA to solve MLOGA since we can show (see Appendix A) that these problems are very different in their structure. In actually compressing the graph... |

14 | Concentration for independent permutations - McDiarmid |

14 | Assigning Document Identifiers to Enhance Compressibility of Web Search Engines Indexes - Silvestri, Perego, et al. - 2004
Citation Context ...ng history. Blandford and Blelloch [4] considered the problem of compressing text indexes by permuting the document identifiers to create locality in an inverted index. Silvestri, Perego, and Orlando [26] proposed a clustering approach for reassigning document identifiers. Shieh et al. [24] proposed a document identifier reassignment method based on a heuristic for the traveling salesman problem. Rece... |

13 | High degree vertices and eigenvalues in the preferential attachment graph - Flaxman, Frieze, et al.
Citation Context ...y noting that the expected number of multiple edges and self-loops added by the nth inserted node is O(m³/n^(1/2−ε)), conditioned on the fact that the highest degree at that point is O(n^(1/2+ε)) whp [12]. Then, by Markov’s inequality the claim follows. Also, we remove all nodes of degree > k, for some constant k; by [9], only εkn edges and nodes will be removed this way. The resulting graph will thus... |

13 | New approximation techniques for some linear ordering problems - Rao, Richa
Citation Context ...ummation were just |π(u) − π(v)|, then this is the well-known minimum linear arrangement (MLINA) problem. MLINA is NP-hard [14]; little, however, is known about its approximability. The best algorithm [23] approximates MLINA to O(√(log n) log log n) and this algorithm is not practical for large graphs. From the standpoint of the hardness of approximation, only the existence of a PTAS has been ruled out... |

10 | Permuting Web Graphs - Boldi, Santini, et al. - 2009
Citation Context ... Chellapilla [11] used the frequent pattern mining approach to compress Web graphs; using this, they were able to achieve a compression of under two bits per link. Recently, Boldi, Santini, and Vigna [5] studied the effectiveness of various orderings, including Gray ordering, in compressing the transpose of the Web graph. The problem of assigning or reassigning document identifiers in order to compre... |
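
The citation context for "Min-wise independent permutations" above defines the shingle of a set as its smallest element under a random permutation σ, and states that two sets share a shingle with probability equal to their Jaccard coefficient. The following is a minimal Python sketch of that identity, not the paper's implementation: the function names (`jaccard`, `shingle`, `estimate_jaccard`), the trial count, and the seed are all assumptions made for illustration, and explicit random permutations are used for clarity rather than efficiency.

```python
import random


def jaccard(a, b):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def shingle(s, rank):
    """Element of s that comes first under the permutation described by rank.

    rank maps each element to its position in the permutation
    (hypothetical representation chosen for this sketch).
    """
    return min(s, key=rank.__getitem__)


def estimate_jaccard(a, b, trials=2000, seed=0):
    """Estimate J(A, B) as the fraction of random permutations of A ∪ B
    under which A and B have the same shingle."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    rng = random.Random(seed)
    universe = sorted(a | b)  # sorted only to make runs reproducible
    hits = 0
    for _ in range(trials):
        perm = universe[:]
        rng.shuffle(perm)
        rank = {x: i for i, x in enumerate(perm)}
        if shingle(a, rank) == shingle(b, rank):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}
    print("exact:   ", jaccard(A, B))
    print("estimate:", estimate_jaccard(A, B))
```

With enough trials the estimate should concentrate around the exact coefficient. This is why nodes with similar neighborhoods tend to receive the same shingle and end up close together in a shingle-based ordering, as the Gibson, Kumar, and Tomkins citation context above notes.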