Results 11 - 20 of 110
The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh
 in Proceedings of ACM CoNEXT
, 2010
Cited by 19 (2 self)
Recent measurements and anecdotal evidence indicate that the Internet ecosystem is rapidly evolving from a multi-tier hierarchy built mostly with transit (customer-provider) links to a dense mesh formed with mostly peering links. This transition can have major impact on the global Internet economy as well as on the traffic flow and topological structure of the Internet. In this paper, we study this evolutionary transition with an agent-based network formation model that captures key aspects of the interdomain ecosystem, viz., interdomain traffic flow and routing, provider and peer selection strategies, geographical constraints, and the economics of transit and peering interconnections. The model predicts several substantial differences between the Hierarchical Internet and the Flat Internet in terms of topological structure, path lengths, interdomain traffic flow, and the profitability of transit providers. We also quantify the effect of the three factors driving this evolutionary transition. Finally, we examine a hypothetical scenario in which a large content provider produces more than half of the total Internet traffic.
Time-varying graphs and dynamic networks
 International Journal of Parallel, Emergent and Distributed Systems
Cited by 18 (7 self)
The past few years have seen intensive research efforts carried out in some apparently unrelated areas of dynamic systems – delay-tolerant networks, opportunistic-mobility networks, social networks – obtaining closely related insights. Indeed, the concepts discovered in these investigations can be viewed as parts of the same conceptual universe; and the formal models proposed so far to express some specific concepts are components of a larger formal description of this universe. The main contribution of this paper is to integrate the vast collection of concepts, formalisms, and results found in the literature into a unified framework, which we call TVG (for time-varying graphs). Using this framework, it is possible to express directly in the same formalism not only the concepts common to all those different areas, but also those specific to each. Based on this definitional work, employing both existing results and original observations, we present a hierarchical classification of TVGs; each class corresponds to a significant property examined in the distributed computing literature. We then examine how TVGs can be used to study the evolution of network properties, and propose different techniques, depending on whether the indicators for these properties are atemporal (as in the majority of existing studies) or temporal. Finally, we briefly discuss the introduction of randomness in TVGs.
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
Cited by 17 (2 self)
Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order of magnitude gains.
HADI: Mining radii of large graphs
 ACM Transactions on Knowledge Discovery from Data
, 2010
Cited by 16 (8 self)
Given large, multi-million node graphs (e.g., Facebook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this paper we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on the top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multimodal/bimodal shape of the Radius plot, and its palindrome motion over time.
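As a rough illustration of the quantities this abstract names, the sketch below computes a radius plot and an effective diameter exactly, using breadth-first search on a tiny in-memory graph. It is not HADI: the function names, the adjacency-dict input, and the use of exact eccentricity as a node's radius are assumptions made for this example, and the 90% threshold follows the common definition of effective diameter. Exact BFS from every node is precisely the cost that becomes prohibitive at the scale the abstract describes.

```python
import math
from collections import deque


def bfs_hops(adj, source):
    """Return the hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radius_plot(adj):
    """Map each radius value to the number of nodes having that radius.

    Assumption: a node's radius is its exact eccentricity, i.e. the largest
    hop distance to any node it can reach.
    """
    counts = {}
    for node in adj:
        r = max(bfs_hops(adj, node).values())
        counts[r] = counts.get(r, 0) + 1
    return counts


def effective_diameter(adj, fraction=0.9):
    """Smallest hop count within which `fraction` of connected ordered pairs lie."""
    hops = sorted(
        d for node in adj for d in bfs_hops(adj, node).values() if d > 0
    )
    if not hops:
        return 0
    index = math.ceil(fraction * len(hops)) - 1
    return hops[index]


# Hypothetical toy input: an undirected path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(radius_plot(path))
print(effective_diameter(path))
```

On the path graph the two end nodes have radius 4, their neighbours radius 3, and the middle node radius 2, so the plot already shows how central and outlying nodes separate.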
The time-series link prediction problem with applications in communication surveillance
 INFORMS Journal on Computing
, 2009
Cited by 13 (0 self)
The ability to predict linkages among data objects is central to many data mining tasks, such as product recommendation and social network analysis. A substantial literature has been devoted to the link prediction problem either as an implicitly embedded problem in specific applications or as a generic data mining task. This literature has mostly adopted a static graph representation where a snapshot of the network is analyzed to predict hidden or future links. However, this representation is only appropriate to investigate whether certain link will ever occur or not and does not apply to many applications for which the prediction of the repeated link occurrences are of main interest (e.g., communication network surveillance). In this paper, we introduce the time series link prediction problem, taking into consideration temporal evolutions of link occurrences to predict link occurrence probabilities at a particular time. Using the Enron email data and high-energy particle physics literature coauthorship data we have demonstrated that time series models of single link occurrences achieved comparable link prediction performance with commonly used static graph link prediction algorithms. Furthermore, combination of static graph link prediction algorithms and time series model produced significantly improved predictions than static graph link prediction methods, demonstrating the great potential of integrated methods that exploit both inter-link structural dependencies and intra-link temporal dependencies. Key words: analysis of algorithms; communication networks; link prediction; statistical analysis; time series analysis.
Correcting for Missing Data in Information Cascades
, 2010
Cited by 12 (0 self)
Transmission of infectious diseases, propagation of information, and spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging due to missing data. Even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C′ of the complete cascade C, our goal is to estimate the properties of the complete cascade C, such as its size or depth. To estimate the properties of C, we first formulate k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that given a cascade model and observed cascade C′ can estimate properties of the complete cascade C. We evaluate our methodology using information propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool to study the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90% of the data is missing.
Measurement and Analysis of Online Social Networks
 in 7th ACM SIGCOMM Internet Measurement Conference (IMC)
, 2007
Cited by 11 (3 self)
Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively, and have led to useful algorithms such as PageRank. In contrast, few links exist between content in online social networks and instead, the links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems. In this thesis, we use novel measurement techniques to study online social networks at scale, and use the resulting insights to design innovative new information systems. First, we examine the structure and growth patterns of online social networks, focusing on how users are connecting to one another. We conduct the first
We Know Who You Followed Last Summer: Inferring Social Link Creation Times in Twitter
, 2011
Cited by 10 (0 self)
Understanding a network’s temporal evolution appears to require multiple observations of the graph over time. These often expensive repeated crawls are only able to answer questions about what happened from observation to observation, and not what happened before or between network snapshots. Contrary to this picture, we propose a method for Twitter’s social network that takes a single static snapshot of network edges and user account creation times to accurately infer when these edges were formed. This method can be exact in theory, and we demonstrate empirically for a large subset of Twitter relationships that it is accurate to within a few hours in practice. We study users who have a very large number of edges or who are recommended by Twitter. We examine the graph formed by these nearly 1,800 Twitter celebrities and their 862 million edges in detail, showing that a single static snapshot can give novel insights about Twitter’s evolution. We conclude from this analysis that real-world events and changes to Twitter’s interface for recommending users strongly influence network growth.
Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs
Cited by 10 (4 self)
This paper introduces LDA-G, a scalable Bayesian approach to finding latent group structures in large real-world graph data. Existing Bayesian approaches for group discovery (such as Infinite Relational Models) have only been applied to small graphs with a couple of hundred nodes. LDA-G (short for Latent Dirichlet Allocation for Graphs) utilizes a well-known topic modeling algorithm to find latent group structure. Specifically, we modify Latent Dirichlet Allocation (LDA) to operate on graph data instead of text corpora. Our modifications reflect the differences between real-world graph data and text corpora (e.g., a node’s neighbor count vs. a document’s word count). In our empirical study, we apply LDA-G to several large graphs (with thousands of nodes) from PubMed (a scientific publication repository). We compare LDA-G’s quantitative performance on link prediction with two existing approaches: one Bayesian (namely, Infinite Relational Model) and one non-Bayesian (namely, Cross-associations). On average, LDA-G outperforms IRM by 15% and Cross-associations by 25% (in terms of area under the ROC curve). Furthermore, we demonstrate that LDA-G can discover useful qualitative information.
Is Wikipedia link structure different?
 In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM)
, 2009
Cited by 9 (5 self)
In this paper, we investigate the difference between Wikipedia and Web link structure with respect to their value as indicators of the relevance of a page for a given topic of request. Our experimental evidence is from two IR test collections: the .GOV collection used at the TREC Web tracks and the Wikipedia XML Corpus used at INEX. We first perform a comparative analysis of Wikipedia and .GOV link structure and then investigate the value of link evidence for improving search on Wikipedia and on the .GOV domain. Our main findings are: First, Wikipedia link structure is similar to the Web, but more densely linked. Second, Wikipedia’s outlinks behave similar to inlinks and both are good indicators of relevance, whereas on the Web the inlinks are more important. Third, when incorporating link evidence in the retrieval model, for Wikipedia the global link evidence fails and we have to take the local context into account.