Results 11–20 of 114
Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication
In PKDD, 2005
Cited by 103 (25 self)
Abstract. How can we generate realistic graphs? In addition, how can we do so with a mathematically tractable model that makes it feasible to analyze their properties rigorously? Real graphs obey a long list of surprising properties: heavy tails for the in- and out-degree distributions; heavy tails for the eigenvalues and eigenvectors; small diameters; and the recently discovered “Densification Power Law” (DPL). All published graph generators either fail to match several of the above properties, are very complicated to analyze mathematically, or both. Here we propose a graph generator that is mathematically tractable and matches this collection of properties. The main idea is to use a non-standard matrix operation, the Kronecker product, to generate graphs that we refer to as “Kronecker graphs”. We show that Kronecker graphs naturally obey all the above properties; in fact, we can rigorously prove that they do so. We also provide empirical evidence showing that they can mimic very well several real graphs.
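The core construction described in the abstract can be sketched in a few lines. This is a minimal illustration of the deterministic variant (repeated Kronecker powers of a small initiator adjacency matrix), not the authors' code; the 3-node initiator below is a hypothetical example.

```python
# Minimal sketch (assumed illustration, not the authors' code): the k-th
# Kronecker power of a small initiator adjacency matrix gives a
# self-similar graph on n^k nodes.

def kron(a, b):
    """Kronecker product of two square 0/1 matrices (lists of lists)."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m]
             for j in range(size)] for i in range(size)]

def kronecker_graph(initiator, k):
    """Adjacency matrix of the k-th Kronecker power of the initiator."""
    g = initiator
    for _ in range(k - 1):
        g = kron(g, initiator)
    return g

# Hypothetical 3-node initiator: a star with self-loops on every node.
init = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 1]]
g2 = kronecker_graph(init, 2)   # 9-node graph; the edge count squares
```

The stochastic version studied in the paper replaces the 0/1 initiator entries with probabilities and flips a coin per potential edge; the self-similar structure is the same.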
Growth of the Flickr Social Network
2008
Cited by 102 (4 self)
Online social networking sites like MySpace, Orkut, and Flickr are among the most popular sites on the Web and continue to experience dramatic growth in their user population. The popularity of these sites offers a unique opportunity to study the dynamics of social networks at scale. Having a proper understanding of how online social networks grow can provide insights into the network structure, allow predictions of future growth, and enable simulation of systems on networks of arbitrary size. However, to date, most empirical studies have focused on static network snapshots rather than growth dynamics. In this paper, we collect and examine detailed growth data from the Flickr online social network, focusing on the ways in which new links are formed. Our study makes two contributions. First, we collect detailed data covering three months of growth, encompassing 950,143 new users and over 9.7 million new links, and we make this data available to the research community. Second, we use a first-principles approach to investigate the link formation process. In short, we find that links tend to be created by users who already have many links, that users tend to respond to incoming links by creating links back to the source, and that users link to other users who are already close in the network.
Jellyfish: A conceptual model for the AS internet topology
2004
Cited by 91 (8 self)
Several novel concepts and tools have revolutionized our understanding of the Internet topology. Most existing efforts attempt to develop accurate analytical models. In this paper, our goal is to develop an effective conceptual model: a model that can be easily drawn by hand while, at the same time, capturing significant macroscopic properties. We build the foundation for our model with two thrusts: a) we identify new topological properties, and b) we provide metrics to quantify the topological importance of a node. We propose the jellyfish as a model for the inter-domain Internet topology. We show that our model captures and represents the most significant topological properties. Furthermore, we observe that the jellyfish has lasting value: it describes the topology for more than six years.
Approximating Aggregate Queries about Web Pages via Random Walks
In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB), 2000
Cited by 81 (9 self)
We present a random walk as an efficient and accurate approach to approximating certain aggregate queries about web pages. Our method uses a novel random walk to produce an almost uniformly distributed sample of web pages. The walk traverses a dynamically built regular undirected graph. Queries we have estimated using this method include the coverage of search engines, the proportion of pages belonging to .com and other domains, and the average size of web pages. Strong experimental evidence suggests that our walk produces accurate results quickly using very limited resources.
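The "regular undirected graph" idea in the abstract can be illustrated as follows: padding every node's degree up to a common maximum with implicit self-loops makes the walk's stationary distribution uniform, so long walks yield near-uniform samples. This is a sketch under assumed parameters (the toy adjacency list, step count, and sample count are hypothetical), not the paper's actual crawler.

```python
import random

# Sketch (assumed illustration): a random walk on an undirected graph made
# regular by implicit self-loops; its stationary distribution is uniform.

def regular_walk_sample(adj, max_degree, steps, start, rng=random.Random(0)):
    """Walk `steps` steps; each node's degree is padded to `max_degree`
    with self-loops, so every node effectively has the same degree."""
    node = start
    for _ in range(steps):
        neighbors = adj[node]
        # With probability deg/max_degree move to a uniform neighbor;
        # otherwise take one of the padding self-loops and stay put.
        if rng.random() < len(neighbors) / max_degree:
            node = rng.choice(neighbors)
    return node

# Tiny hypothetical undirected graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
samples = [regular_walk_sample(adj, max_degree=3, steps=50, start=0)
           for _ in range(200)]
```

With enough steps per walk, the endpoint frequencies approach the uniform distribution over nodes, regardless of the start node.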
Compressing the graph structure of the web
In IEEE Data Compression Conference (DCC), 2001
Cited by 59 (2 self)
A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure has proven to be extremely useful for improving the performance of search engines and other tools for navigating the web. However, since the graphs in these scenarios involve hundreds of millions of nodes and even more edges, highly space-efficient data structures are needed to fit the data in memory. A first step in this direction was done by the DEC Connectivity Server, which stores the graph in compressed form. In this paper, we describe techniques for compressing the graph structure of the web, and give experimental results of a prototype implementation. We attempt to exploit a variety of different sources of compressibility of these graphs and of the associated set of URLs in order to obtain good compression performance on a large web graph.
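One standard source of compressibility in web graphs is that the sorted neighbor ids of a page tend to be close together (links cluster in URL order), so storing gaps rather than raw ids yields many small integers that a variable-length code can pack tightly. The gap encoding below is an assumption-level illustration of this general idea, not the paper's exact scheme.

```python
# Sketch (assumed illustration, not the paper's exact scheme): encode a
# page's sorted outgoing links as the first id plus successive gaps.

def gap_encode(neighbors):
    """Sorted neighbor ids -> first id followed by successive gaps."""
    out, prev = [], 0
    for i, n in enumerate(sorted(neighbors)):
        out.append(n if i == 0 else n - prev)
        prev = n
    return out

def gap_decode(gaps):
    """Inverse of gap_encode: prefix sums recover the sorted ids."""
    out, acc = [], 0
    for i, g in enumerate(gaps):
        acc = g if i == 0 else acc + g
        out.append(acc)
    return out

links = [100007, 100002, 100003, 100011]
enc = gap_encode(links)        # [100002, 1, 4, 4]
assert gap_decode(enc) == sorted(links)
```

In a real encoder the small gaps would then be written with a variable-length integer code (e.g. a few bits each) instead of full-width ids.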
Dynamic models for file sizes and double pareto distributions
In Internet Mathematics, 2002
Cited by 59 (0 self)
Abstract. In this paper, we introduce and analyze a new, dynamic generative user model to explain the behavior of file size distributions. Our Recursive Forest File model combines multiplicative models that generate lognormal distributions with recent work on random graph models for the web. Unlike similar previous work, our Recursive Forest File model allows new files to be created and old files to be deleted over time, and our analysis covers problematic issues such as correlation among file sizes. Moreover, our model allows natural variations where files that are copied or modified are more likely to be copied or modified subsequently. Previous empirical work suggests that file sizes tend to have a lognormal body but a Pareto tail. The Recursive Forest File model explains this behavior, yielding a double Pareto distribution, which has a Pareto tail but a body close to lognormal. We believe the Recursive Forest model may be useful for describing other power law phenomena in computer systems as well as other fields.
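The mechanism the abstract alludes to can be simulated in a few lines: a multiplicative (lognormal-generating) process stopped after a random, geometrically distributed number of steps produces a double Pareto shape. The parameters below are hypothetical, and this is an illustration of the double Pareto effect in general, not of the Recursive Forest File model itself.

```python
import math
import random

# Sketch (assumed illustration): multiplicative growth in log-space with a
# geometric stopping time yields a double Pareto distribution -- a body
# close to lognormal with a Pareto tail.

def double_pareto_sample(mu=0.0, sigma=0.5, stop_p=0.05,
                         rng=random.Random(0)):
    """One sampled 'file size' (hypothetical parameters)."""
    log_size = 0.0
    while rng.random() > stop_p:          # geometric number of steps
        log_size += rng.gauss(mu, sigma)  # multiplicative step, in logs
    return math.exp(log_size)

sizes = [double_pareto_sample() for _ in range(1000)]
```

Conditioned on a fixed number of steps the sizes would be lognormal; mixing over the geometric stopping time is what produces the power-law tails.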
The Structure of Broad Topics on the Web
In International World Wide Web Conference (WWW), 2002
Cited by 59 (1 self)
The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone, and do not consider textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure and not in terms of page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web, and analyze the capability of recent random walk algorithms to draw samples which follow such distributions. In addition, we can measure the probability that a page about one broad topic will link to another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.
Distributed Pagerank for P2P Systems
2003
Cited by 50 (7 self)
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm, for "peer to peer" (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update enables continuously accurate pageranks, whereas the currently centralized web crawl and computation over Internet documents requires several days. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized, web-crawler-based implementation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly (1% accuracy in 10 iterations) for large systems, although the time for an iteration may be long. The incremental computation resulting from addition of a single document converges extremely rapidly, typically requiring update path lengths of under 15 nodes even for large networks and very accurate solutions.
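Chaotic (asynchronous) iteration can be sketched as follows: each update recomputes a single node's rank from its in-neighbors' current values, in arbitrary order and with no synchronization barrier. This is an assumed illustration of the iteration scheme, not the paper's P2P protocol; the three-node cycle is a hypothetical example and dangling nodes are ignored for brevity.

```python
import random

# Sketch (assumed illustration, not the paper's P2P protocol): PageRank by
# chaotic asynchronous iteration -- one node's rank is recomputed at a
# time, in arbitrary order, from its in-neighbors' current values.

def async_pagerank(out_links, d=0.85, updates=60, rng=random.Random(1)):
    nodes = list(out_links)
    n = len(nodes)
    in_links = {v: [] for v in nodes}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(updates):
        v = rng.choice(nodes)  # pick any node; no synchronization barrier
        rank[v] = (1 - d) / n + d * sum(rank[u] / len(out_links[u])
                                        for u in in_links[v])
    return rank

# Hypothetical three-node cycle; the uniform vector is its fixed point.
ranks = async_pagerank({0: [1], 1: [2], 2: [0]})
```

Because each update is a contraction toward the same fixed point, the order of updates does not affect the limit, which is what makes the asynchronous P2P setting workable.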
Page quality: In search of an unbiased web ranking
2005
Cited by 49 (3 self)
In a number of recent studies [4, 8] researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to get even more popular, while unpopular pages get ignored by an average user. This “rich-get-richer” phenomenon is particularly problematic for new and high-quality pages because they may never get a chance to get users' attention, decreasing the overall quality of search results in the long run. In this paper, we propose a new ranking function, called page quality, that can alleviate the problem of popularity-based ranking. We first present a formal framework to study search engine bias by discussing what is an “ideal” way to measure the intrinsic quality of a page. We then compare how PageRank, the current ranking metric used by major search engines, differs from this ideal quality metric. This framework will help us investigate search engine bias in more concrete terms and provide a clear understanding of why PageRank is effective in many cases and exactly when it is problematic. We then propose a practical way to estimate the intrinsic page quality to avoid the inherent bias of PageRank. We derive our proposed quality estimator through a careful analysis of a reasonable web user model, and we present experimental results that show the potential of our proposed estimator. We believe that our quality estimator has the potential to alleviate the rich-get-richer phenomenon and help new and high-quality pages get the attention that they deserve.
Multiplicative Attribute Graph Model of Real-World Networks
2010
Cited by 45 (4 self)
Large-scale real-world network data, such as social networks, Internet and Web graphs, are ubiquitous. The study of such social and information networks seeks to find patterns and explain their emergence through tractable models. In most networks, especially in social networks, nodes have a rich set of attributes (e.g., age, gender) associated with them. However, many existing network models focus on modeling the network structure while ignoring the features of the nodes. Here we present a model that we refer to as the Multiplicative Attribute Graphs (MAG) model, which naturally captures the interactions between the network structure and node attributes. We consider a model where each node has a vector of categorical latent attributes associated with it. The probability of an edge between a pair of nodes then depends on the product of individual attribute-attribute similarities. This model lends itself to mathematical analysis, and we derive thresholds for connectivity and the emergence of the giant connected component, and show that the model gives rise to graphs with a constant diameter. We analyze the degree distribution to show that the model can produce networks with either lognormal or power-law degree distribution depending on certain conditions.
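The MAG edge rule described above, with edge probability given by a product of per-attribute affinities, can be sketched directly. The affinity matrix values and graph sizes below are hypothetical illustrations, not the authors' fitted parameters.

```python
import random

# Sketch of the MAG edge rule (hypothetical parameters, not the authors'
# fitted model): each node has k binary latent attributes; the probability
# of an edge is the product over attributes of a 2x2 affinity matrix
# indexed by the two endpoints' attribute values.

def edge_prob(attrs_u, attrs_v, thetas):
    p = 1.0
    for a, b, theta in zip(attrs_u, attrs_v, thetas):
        p *= theta[a][b]
    return p

def mag_graph(n, k, thetas, rng=random.Random(0)):
    """Sample an undirected graph: draw attributes, then flip edge coins."""
    attrs = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < edge_prob(attrs[u], attrs[v], thetas)]

# Homophilous affinity: matching attribute values make an edge more likely.
theta = [[0.8, 0.3],
         [0.3, 0.6]]
edges = mag_graph(n=20, k=3, thetas=[theta] * 3)
```

Increasing k drives individual edge probabilities down multiplicatively, which is why the paper's analysis couples the number of attributes to the network size.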