Graph-Theoretic Analysis of Structured Peer-to-Peer Systems: Routing Distances and Fault Resilience
, 2003
This paper examines graph-theoretic properties of existing peer-to-peer architectures and proposes a new infrastructure based on optimal-diameter de Bruijn graphs. Since generalized de Bruijn graphs possess very short average routing distances and high resilience to node failure, they are well suited for structured peer-to-peer networks. Using Chord, CAN, and de Bruijn as examples, we first study routing performance, graph expansion, and clustering properties of each graph. We then examine bisection width, path overlap, and several other properties that affect routing and resilience of peer-to-peer networks. Having confirmed that de Bruijn graphs offer the best diameter and highest connectivity among the existing peer-to-peer structures, we offer a very simple incremental building process that preserves optimal properties of de Bruijn graphs under uniform user joins/departures. We call the combined peer-to-peer architecture
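As a rough illustration of why de Bruijn graphs route in few hops, the sketch below builds the classical directed de Bruijn graph on length-D strings over k symbols and routes by shifting in the destination's digits. The function names (`all_nodes`, `de_bruijn_neighbors`, `shift_route`) are hypothetical, and this is textbook shift routing under those assumptions, not necessarily the construction or routing scheme the paper proposes.

```python
from itertools import product


def all_nodes(k, D):
    """All k**D node labels: tuples of D digits drawn from range(k)."""
    return list(product(range(k), repeat=D))


def de_bruijn_neighbors(node, k):
    """Out-neighbors of a node: drop the first digit, append any digit."""
    return [node[1:] + (d,) for d in range(k)]


def shift_route(src, dst):
    """Route src -> dst by shifting in dst's digits one hop at a time.

    The longest suffix of src that is also a prefix of dst is reused,
    so the path never needs more than D hops.
    """
    D = len(src)
    overlap = 0
    for m in range(D, 0, -1):
        if src[D - m:] == dst[:m]:
            overlap = m
            break
    path = [src]
    cur = src
    for d in dst[overlap:]:
        cur = cur[1:] + (d,)
        path.append(cur)
    return path
```

For instance, `shift_route((0, 1, 1), (1, 1, 0))` takes a single hop because the suffix `(1, 1)` of the source already matches the prefix of the destination.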
The Economics of Social Networks.
 In Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society.
, 2006
We analyze the problem of optimal monopoly pricing in social networks in order to characterize the influence of the network topology on the pricing rule. It is shown that this influence depends on the type of providers (local versus global monopoly) and of externalities (consumption versus price). We identify two situations where the monopolist does not discriminate across nodes in the network (global monopoly with consumption externalities and local monopoly with price externalities) and characterize the relevant centrality index used to discriminate among nodes in the other situations. We also analyze the robustness of the analysis with respect to changes in demand, and the introduction of bargaining between the monopolist and the consumer. JEL Classification Numbers: D85, D43, C69. Keywords: Social Networks, Monopoly Pricing, Network Externalities, Reference Price, Centrality Measures. * We dedicate this paper to the memory of Toni Calvó-Armengol, a gifted network theorist and a wonderful friend. We thank Coralio Ballester,
Using PageRank to Characterize Web Structure
Recent work on modeling the web graph has dwelt on capturing the degree distributions observed on the web. Pointing out that this represents a heavy reliance on “local” properties of the web graph, we study the distribution of PageRank values on the web. Our measurements suggest that PageRank values on the web follow a power law. We then develop generative models for the web graph that explain this observation and moreover remain faithful to previously studied degree distributions. We analyze these models and compare the analysis to both snapshots from the web and to graphs generated by simulations on the new models. To our knowledge this represents the first modeling of the web that goes beyond fitting degree distributions on the web.
A Random Graph Model for Power Law Graphs
 Experimental Math
, 2000
We propose a random graph model which is a special case of sparse random graphs with given degree sequences which satisfy a power law. This model involves only a small number of parameters, called logsize and log-log growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from these parameters, various properties of the graph can be derived. For example, for certain ranges of the parameters, we will compute the expected distribution of the sizes of the connected components which almost surely occur with high probability. We will illustrate the consistency of our model with the behavior of some massive graphs derived from data in telecommunications. We will also discuss the threshold function, the giant component, and the evolution of random graphs in this model.
Random Evolution in Massive Graphs
, 2001
Many massive graphs (such as WWW graphs and Call graphs) share certain universal characteristics which can be described by the so-called "power law". In this paper, we will first briefly survey the history and previous work on power law graphs. Then we will give four evolution models for generating power law graphs by adding one node/edge at a time. We will show that for any given edge density and desired distributions for in-degrees and out-degrees (not necessarily the same, but adhering to certain general conditions), the resulting graph will almost surely satisfy the power law and the in/out-degree conditions. We will show that our most general directed and undirected models include nearly all known models as special cases. In addition, we consider another crucial aspect of massive graphs that is called "scale-free" in the sense that the frequency of sampling (w.r.t. the growth rate) is independent of the parameter of the resulting power law graphs. We will show that our evolution models generate scale-free power law graphs.
SpamRank - Fully Automatic Link Spam Detection
 IN PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON ADVERSARIAL INFORMATION RETRIEVAL ON THE WEB (AIRWEB)
, 2005
Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeservedly high PageRank value without the need for any kind of white- or blacklists or other means of human intervention. We assume that spammed pages have a biased distribution of pages that contribute to the undeservedly high PageRank value. We define SpamRank by penalizing pages that originate a suspicious PageRank share and personalizing PageRank on the penalties. Our method is tested on a 31M-page crawl of the .de domain with a manually classified 1000-page stratified random sample with bias towards large PageRank values.
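The building block named in this abstract, personalized PageRank, can be sketched with plain power iteration. The code below is only that building block, with a hypothetical function name and an assumed damping factor of 0.85; it does not implement SpamRank's penalty computation, which the abstract does not spell out. It assumes every link target appears as a key of `out_links` and that `teleport` is a probability distribution over those keys.

```python
def personalized_pagerank(out_links, teleport, damping=0.85, iters=100):
    """Power iteration for PageRank with a custom teleport distribution.

    out_links: dict mapping each node to a list of nodes it links to.
    teleport:  dict mapping nodes to teleport probabilities (sums to 1).
    """
    nodes = list(out_links)
    rank = {v: teleport.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        # Teleport step: restart according to the personalization vector.
        nxt = {v: (1.0 - damping) * teleport.get(v, 0.0) for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for w in targets:
                    nxt[w] += share
            else:
                # Dangling node: hand its mass back to the teleport set.
                for v, t in teleport.items():
                    nxt[v] += damping * rank[u] * t
        rank = nxt
    return rank
```

With the teleport mass concentrated on a chosen set of pages, each page's score reflects how much rank flows to it from that set, which is the kind of "who contributes to this page's rank" question the abstract raises.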
Duplication models for biological networks
 Journal of Computational Biology
, 2003
Are biological networks different from other large complex networks? Both large biological and nonbiological networks exhibit power-law graphs (number of nodes with degree k, $N(k) \propto k^{-\beta}$), yet the exponents, $\beta$, fall into different ranges. This may be because duplication of the information in the genome is a dominant evolutionary force in shaping biological networks (like gene regulatory networks and protein–protein interaction networks) and is fundamentally different from the mechanisms thought to dominate the growth of most nonbiological networks (such as the Internet). The preferential choice models used for nonbiological networks like web graphs can only produce power-law graphs with exponents greater than 2. We use combinatorial probabilistic methods to examine the evolution of graphs by node duplication processes and derive exact analytical relationships between the exponent of the power law and the parameters of the model. Both full duplication of nodes (with all their connections) as well as partial duplication (with only some connections) are analyzed. We demonstrate that partial duplication can produce power-law graphs with exponents less than 2, consistent with current data on biological networks. The power-law exponent for large graphs depends only on the growth process, not on the starting graph.
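A node duplication process of the kind described here can be simulated in a few lines. The sketch below assumes one common formulation: at each step a uniformly random existing node is copied, and each of its edges is kept by the copy independently with probability p (p = 1 gives full duplication). The function name, the single-edge starting graph, and the uniform choice of the node to copy are assumptions for illustration; the paper's exact model may differ.

```python
import random


def duplication_graph(n, p, seed=None):
    """Grow an undirected graph to n nodes by (partial) node duplication.

    Returns an adjacency dict {node: set_of_neighbors}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed starting graph: a single edge
    for v in range(2, n):
        u = rng.randrange(v)  # node to duplicate, chosen uniformly
        # The copy keeps each of u's edges independently with probability p.
        kept = {w for w in sorted(adj[u]) if rng.random() < p}
        adj[v] = kept
        for w in kept:
            adj[w].add(v)
    return adj
```

Tabulating `len(adj[v])` over all nodes for a large n gives an empirical degree distribution that can be compared against a power law for different values of p.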
A Survey of Web Metrics
 ACM COMPUTING SURVEYS
, 2002
... this article, we examine this issue by classifying and discussing a wide-ranging set of Web metrics. We present the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information theoretic properties. We also discuss how these metrics can be applied for improving Web information access and use.
The Open Source Software Development Phenomenon: An Analysis Based on Social Network Theory.
 Eighth Americas Conference on Information Systems
, 2002
The OSS movement is a phenomenon that challenges many traditional theories in economics, software engineering, business strategy, and IT management. Thousands of software programmers are spending tremendous amounts of time and effort writing and debugging software, most often with no direct monetary compensation. The programs, some of which are extremely large and complex, are written without the benefit of traditional project management, change tracking, or error checking techniques. Since the programmers are working outside of a traditional organizational reward structure, accountability is an issue as well. A significant portion of internet e-commerce runs on OSS, and thus many firms have little choice but to trust mission-critical e-commerce systems to run on such software, requiring IT management to deal with new types of socio-technical problems. A better understanding of how the OSS community functions may help IT planners make more informed decisions and develop more effective strategies for using OSS software. We hypothesize that open source software development can be modeled as self-organizing collaboration social networks. We analyze structural data on over 39,000 open source projects hosted at SourceForge.net involving over 33,000 developers. We define two software developers to be connected as part of a collaboration social network if they are members of the same project, or are connected by a chain of connected developers. Project sizes, developer project participation, and clusters of connected developers are analyzed. We find evidence to support our hypothesis, primarily in the presence of power-law relationships on project sizes (number of developers per project), project membership (number of projects joined by a developer), and cluster sizes. Potential implications for IT researchers, IT managers, and governmental policy makers are discussed.
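The connectivity definition in this abstract (developers are linked if they share a project, or through a chain of such links) amounts to finding connected components. The sketch below computes cluster sizes with a union-find structure; the function name and the input layout (a dict from project name to member list) are assumptions for illustration, not the authors' data format or code.

```python
from collections import Counter


def developer_clusters(projects):
    """Return cluster sizes (largest first) of the collaboration network.

    projects: dict mapping a project name to an iterable of developer ids.
    Two developers fall in the same cluster if a chain of shared
    projects connects them.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in projects.values():
        members = list(members)
        for dev in members:
            find(dev)  # register every developer, even solo ones
        for dev in members[1:]:
            root_a, root_b = find(members[0]), find(dev)
            if root_a != root_b:
                parent[root_b] = root_a

    sizes = Counter(find(dev) for dev in list(parent))
    return sorted(sizes.values(), reverse=True)
```

For example, `developer_clusters({"a": ["x", "y"], "b": ["y", "z"], "c": ["w"]})` groups x, y, and z into one cluster through the shared developer y and leaves w alone.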
On Cubical Graphs
 JOURNAL OF COMBINATORIAL THEORY (B) 18, 86 % (1975)
, 1975
It is frequently of interest to represent a given graph G as a subgraph of a graph H which has some special structure. A particularly useful class of graphs in which to embed G is the class of n-dimensional cubes. This has found applications, for example, in coding theory, data transmission, and linguistics. In this note, we study the structure of those graphs G, called cubical graphs (not to be confused with cubic graphs, those graphs for which all vertices have degree 3), which can be embedded into an n-dimensional cube. A basic technique used is the investigation of graphs which are critically nonembeddable, i.e., which cannot be embedded but all of whose subgraphs can be embedded.