Results 1 - 10 of 13
On Unbiased Sampling for Unstructured Peer-to-Peer Networks
- in Proc. ACM IMC, 2006
Cited by 29 (6 self)
This paper addresses the difficult problem of selecting representative samples of peer properties (e.g., degree, link bandwidth, number of files shared) in unstructured peer-to-peer systems. Due to the large size and dynamic nature of these systems, measuring the quantities of interest on every peer is often prohibitively expensive, while sampling provides a natural means for estimating system-wide behavior efficiently. However, commonly used sampling techniques for measuring peer-to-peer systems tend to introduce considerable bias for two reasons. First, the dynamic nature of peers can bias results towards short-lived peers, much as naively sampling flows in a router can lead to bias towards short-lived flows. Second, the heterogeneous nature of the overlay topology can lead to bias towards high-degree peers. We present a detailed examination of the ways that the behavior of peer-to-peer systems can introduce bias and suggest the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples. We conduct an extensive simulation study to demonstrate that the proposed technique works well for a wide variety of common peer-to-peer network conditions. Using the Gnutella network, we empirically show that our implementation of the MRWB technique yields more accurate samples than commonly used sampling techniques. Furthermore, we provide insights into the causes of the observed differences. The tool we have developed, ion-sampler, selects peer addresses uniformly at random using the MRWB technique. These addresses may then be used as input to another measurement tool to collect data on a particular property.
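The degree bias described in this abstract, and the way a Metropolized walk can remove it, may be easier to see in a small sketch. The code below is not the authors' ion-sampler: it assumes a static graph, omits the backtracking step and any handling of departing peers, and uses the standard Metropolis-Hastings acceptance rule min(1, deg(x)/deg(y)) for a uniform target. The function names, the toy overlay, and the walk length are all hypothetical.

```python
import random
from collections import Counter

def metropolized_step(graph, current, rng):
    """One Metropolis-Hastings step whose stationary distribution is uniform."""
    # Propose a neighbour uniformly at random, as a plain random walk would.
    candidate = rng.choice(graph[current])
    # Accept with probability min(1, deg(current) / deg(candidate)); this
    # cancels the plain walk's tendency to visit high-degree peers more often.
    accept_prob = min(1.0, len(graph[current]) / len(graph[candidate]))
    if rng.random() < accept_prob:
        return candidate
    return current  # rejected proposal: stay put for this step

def sample_peers(graph, start, walk_length, num_samples, seed=0):
    """Run independent walks of fixed length and return each walk's end node."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        node = start
        for _ in range(walk_length):
            node = metropolized_step(graph, node, rng)
        samples.append(node)
    return samples

# Hypothetical toy overlay: one high-degree hub and four degree-1 leaves.
toy_overlay = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
counts = Counter(sample_peers(toy_overlay, "hub", walk_length=50, num_samples=5000))
print(counts)
```

On this toy overlay a plain random walk would spend about half its time at the hub, whereas the Metropolized walk should select each of the five peers roughly equally often.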
The many facets of Internet topology and traffic
- Networks and Heterogeneous Media
Cited by 10 (8 self)
The Internet’s layered architecture and organizational structure give rise to a number of different topologies, with the lower layers defining more physical and the higher layers more virtual/logical types of connectivity structures. These structures are very different, and successful Internet topology modeling requires annotating the nodes and edges of the corresponding graphs with information that reflects their network-intrinsic meaning. These structures also give rise to different representations of the traffic that traverses the heterogeneous Internet, and a traffic matrix is a compact description of the traffic exchanges between the nodes in a given connectivity structure. In this paper, we summarize recent advances in Internet research related to (i) inferring and modeling the router-level topologies of individual service providers (i.e., the physical connectivity structure of an ISP, where nodes are routers/switches and links represent physical connections), (ii) estimating the intra-AS traffic matrix when the AS’s router-level topology and routing configuration are known, (iii) inferring and modeling the Internet’s AS-level topology, and (iv) estimating the inter-AS traffic matrix. We also discuss recent work on Internet connectivity structures that arise at the higher layers in the TCP/IP protocol stack and are more virtual and dynamic; e.g., overlay networks like the WWW graph, where nodes are web pages and edges represent existing hyperlinks, or P2P networks like Gnutella, where nodes represent peers and two peers are connected if they have an active network connection.
Giant component and connectivity in geographical threshold graphs
- In Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph (WAW 2007), 2007
Cited by 8 (5 self)
The geographical threshold graph model is a random graph model with nodes distributed in a Euclidean space and edges assigned through a function of distance and node weights. We study this model and give conditions for the absence and existence of the giant component, as well as for connectivity.
The Structure of Geographical Threshold Graphs
- M. Bradonjić and Joseph Kong, Wireless Ad Hoc Networks with Tunable Topology, Proceedings of the 45th Annual Allerton Conference on Communication, Control and Computing, 2007
Cited by 5 (3 self)
We analyze the structure of random graphs generated by the geographical threshold model. The model is a generalization of random geometric graphs. Nodes are distributed in space, and edges are assigned according to a threshold function involving the distance between nodes as well as randomly chosen node weights. We show how the degree distribution, percolation and connectivity transitions, clustering coefficient, and diameter relate to the threshold value and weight distribution. We give bounds on the threshold value guaranteeing the presence or absence of a giant component, connectivity and disconnectivity of the graph, and small diameter. Finally, we consider the clustering coefficient for nodes with a given degree l, finding that its scaling is very close to 1/l when the node weights are exponentially distributed.
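A small generator may make the edge rule concrete. The abstract says only that edges depend on a threshold function of distance and node weights; the sketch below assumes one specific form, connecting nodes i and j when (w_i + w_j) / r^alpha >= theta. That rule is an assumption for illustration, not something the abstract states, as are the unit-square placement, the parameter names, and the function name. The exponential weights follow the abstract's mention of exponentially distributed node weights.

```python
import math
import random

def geographical_threshold_graph(n, theta, alpha=2.0, seed=0):
    """Sketch of a geographical threshold graph on the unit square.

    Assumed rule (illustrative): connect i and j when
    (w_i + w_j) / r**alpha >= theta, where r is their Euclidean distance.
    """
    rng = random.Random(seed)
    positions = [(rng.random(), rng.random()) for _ in range(n)]
    # Exponentially distributed node weights with rate 1.
    weights = [rng.expovariate(1.0) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            # Coincident points (r == 0) are treated as connected.
            if r == 0 or (weights[i] + weights[j]) / r ** alpha >= theta:
                edges.add((i, j))
    return positions, weights, edges

_, _, edges = geographical_threshold_graph(n=50, theta=20.0)
print(len(edges), "edges among 50 nodes")
```

Because the seed fixes positions and weights, raising `theta` in this sketch can only remove edges, which is a convenient way to explore the threshold-dependent transitions the abstract refers to.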
NETWORK SECURITY IN MODELS OF COMPLEX NETWORKS
Cited by 1 (1 self)
Vertex pursuit games, such as the game of Cops and Robber, are a simplified model for network security. In these games, cops try to capture a robber loose on the vertices of the network. The minimum number of cops required to win on a graph G is the cop number of G. We present asymptotic results for the game of Cops and Robber played in various stochastic network models, such as in G(n, p) with non-constant p, and in random power law graphs. We find bounds for the cop number of G(n, p) for a large range of p as a function of n. We prove that the cop number of random power law graphs with n vertices is asymptotically almost surely Θ(n). The cop number of the core of random power law graphs is investigated, and is proved to be of smaller order than the order of the core.
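For readers unfamiliar with the cop number, the simplest case can be checked directly. The sketch below is not taken from the paper: it relies on the classical characterization that a finite connected graph has cop number 1 exactly when it is dismantlable, meaning it can be reduced to a single vertex by repeatedly deleting a vertex whose closed neighbourhood is contained in that of another vertex. The function name and the example graphs are hypothetical.

```python
def is_cop_win(adjacency):
    """Return True if the finite graph is dismantlable (one cop suffices)."""
    # Closed neighbourhood: a vertex together with its neighbours.
    closed = {v: set(nbrs) | {v} for v, nbrs in adjacency.items()}
    alive = set(closed)
    removed_one = True
    while len(alive) > 1 and removed_one:
        removed_one = False
        for v in list(alive):
            nv = closed[v] & alive
            # v is a "corner" if some other live vertex u dominates it.
            if any(u != v and nv <= (closed[u] & alive) for u in alive):
                alive.remove(v)
                removed_one = True
                break
    return len(alive) == 1

path = {0: [1], 1: [0, 2], 2: [1]}
four_cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(is_cop_win(path), is_cop_win(four_cycle))
```

On a path a single cop can corner the robber, while on a 4-cycle the robber can always stay opposite the cop, so the first check succeeds and the second fails.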
Infinite limits of the duplication model and graph folding
- 2005
Cited by 1 (0 self)
We study infinite limits of graphs generated by the duplication model for biological networks. We prove that with probability 1, the sole nontrivial connected component of the limits is unique up to isomorphism. We describe certain infinite deterministic graphs which arise naturally from the model. We characterize the isomorphism type and induced subgraph structure of these infinite graphs using the notion of dismantlability from the theory of vertex pursuit games, and graph homomorphisms.
Characterization of graphs using degree cores
- in WAW, 2006
Cited by 1 (0 self)
Generative models are often used in modeling real-world graphs such as the Web graph in order to better understand the processes through which these graphs are formed. In order to determine if a graph might have been generated by a given model, one must compare the features of that graph with those generated by the model. We introduce the concept of a hierarchical degree core tree as a novel way of summarizing the structure of massive graphs. Hierarchical degree core trees are representations of the subgraph relationship between the components of the degree core of the graph, ranging over all possible values of k. From these trees we extract features related to the graph’s local structure. Using these features, we compare four real-world graphs (a web graph, a patent citation graph, a co-authorship graph and an email graph) against a number of generative models. All the graphs, with the exception of the email graph, show markedly different features from our generative models. Conversely, the email graph appears to have similar features to a number of our generative models, particularly to the partial duplication model of Chung and Lu.
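The degree core underlying these trees can be computed by a simple peeling procedure. The sketch below shows only that building block: the k-core taken as the maximal subgraph in which every vertex has degree at least k. It does not build the hierarchical tree or extract the paper's features, and the function name and example graph are hypothetical.

```python
def k_core(adjacency, k):
    """Return the vertex set of the k-core of an undirected graph."""
    # Work on a mutable copy so the caller's adjacency lists are untouched.
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    queue = [v for v, nbrs in remaining.items() if len(nbrs) < k]
    while queue:
        v = queue.pop()
        if v not in remaining:
            continue  # already peeled off via an earlier queue entry
        for u in remaining.pop(v):
            remaining[u].discard(v)
            # Removing v may push a neighbour below the degree threshold.
            if len(remaining[u]) < k:
                queue.append(u)
    return set(remaining)

# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
for k in (1, 2, 3):
    print(k, sorted(k_core(graph, k)))
```

Running this for increasing k gives nested vertex sets; splitting each core into connected components and linking every component to the one that contains it at the previous k would give a tree of the kind the abstract describes.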
Combinatorial and Numerical Analysis of Geographical Threshold Graphs
We analyze the structure of random graphs generated by the geographic threshold model. The model is a generalization of random geometric graphs. Nodes are distributed in space, and edges are assigned according to a threshold function involving the distance between nodes as well as randomly chosen node weights. We show how the degree distribution, percolation and connectivity transitions, diameter and clustering coefficient are related to the weight distribution and threshold values. Key words: random graph, geographical threshold graph, giant component, connectivity, clustering coefficient.
Efficient Social Website Crawling Using Cluster Graph
- November 2009
Online social communities have gained significant popularity in recent years and have become an area of active research. Compared with general websites or well-structured Web forums, user-centered social websites pose several unique challenges for crawling, a fundamental task for data collection and data mining of large-scale online social communities: (1) Social websites have more complex link structures and much higher indegree and outdegree, resulting in a large number of duplicate links; (2) Social websites contain large amounts of duplicate content usually listed under different URLs; (3) Social websites are interactive in nature, containing a large number of action or uninformative webpages such as login, tell-a-friend, or commenting; and (4) Social webpages differ dramatically in URL format, link structure, and page layout, due to their diverse semantics, functionalities, and user customization. Previous crawler designs targeting the general Web or well-structured Web forums are inadequate for social websites, wasting network bandwidth and storage space and causing extra overload in social network analysis and data mining tasks. This work tackles the problem of efficient social website crawling by proposing two key techniques: (1) URL-based webpage clustering that identifies frequent itemsets in URLs and groups webpages into semantic clusters; and (2) cluster graph pruning that removes edges and nodes representing duplicate links and duplicate or uninformative content. The offline trained webpage cluster graph is then used at runtime to direct the crawling process. By using only URLs and page link structures, our cluster-graph-based approach can successfully address the challenges in crawling social websites.
Extensive evaluations on three different social websites demonstrate that our approach can effectively and efficiently crawl large amounts of informative social content while dramatically reducing the number of duplicate links as well as the amount of duplicate or uninformative content.
Universal random semi-directed graphs
Motivated by models for real-world networks such as the web graph, we consider digraphs formed by adding new vertices joined to a fixed constant number m of existing vertices of prescribed type. We consider a certain on-line random construction of a countably infinite graph with out-degree m, and show that with probability 1 the construction gives rise to a unique isomorphism type. We study algebraic properties of these so-called random semi-directed graphs; in particular, we prove that their automorphism groups embed all countable groups.

