Results 1  10
of
80
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 120 (10 self)
 Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 79 (6 self)
 Add to MetaCart
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and
Resisting Structural Reidentification in Anonymized Social Networks
, 2008
"... We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked dat ..."
Abstract

Cited by 60 (7 self)
 Add to MetaCart
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
LinkBased Characterization and Detection of Web Spam
 In AIRWeb
, 2006
"... We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a stu ..."
Abstract

Cited by 48 (8 self)
 Add to MetaCart
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with only 1.1% of false positives.
Randomizing Social Networks: a Spectrum Preserving Approach
, 2008
"... Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. The nodes in the network are the individuals and the links among them denote their relationships. Many applications of networks such as anonymous Web browsing require re ..."
Abstract

Cited by 40 (6 self)
 Add to MetaCart
Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. The nodes in the network are the individuals and the links among them denote their relationships. Many applications of networks such as anonymous Web browsing require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach for this problem is to randomize the edges in true networks, and only disclose the randomized networks. In this paper, we investigate how various properties of networks may be affected due to randomization. Specifically, we focus on the spectrum since the eigenvalues of a network are intimately connected to many important topological features. We also conduct theoretical analysis on the extent to which edge anonymity can be achieved. A spectrum preserving graph randomization method, which can better preserve network properties while protecting edge anonymity, is then presented and empirically evaluated.
Dynamics of Large Networks
, 2008
"... A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain processes which govern the network evolution, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure and run simulations of network evolution. Another important aspect of our research is the study of “local ” patterns and structures of propagation in networks. We aim to identify building blocks of the networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large persontoperson
A Bibliometric and Network Analysis of the field of Computational Linguistics
, 2009
"... The ACL Anthology is a large collection of research papers in computational linguistics. Citation data was obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed t ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
The ACL Anthology is a large collection of research papers in computational linguistics. Citation data was obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. Also, we propose an improved method for comparing different measures of impact based on correlation. The analysis includes general network statistics, PageRank, metrics across publication years and venues, impact factor and hindex, as well as other measures. 1
Coreperiphery organization of complex networks
 Physical Review E
, 2005
"... Networks may, or may not, be wired to have a core that is both itself densely connected and central in terms of graph distance. In this study we propose a coefficient to measure if the network has such a clearcut coreperiphery dichotomy. We measure this coefficient for a number of realworld and mo ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
Networks may, or may not, be wired to have a core that is both itself densely connected and central in terms of graph distance. In this study we propose a coefficient to measure if the network has such a clearcut coreperiphery dichotomy. We measure this coefficient for a number of realworld and model networks and find that different classes of networks have their characteristic values. For example do geographical networks have a strong coreperiphery structure, while the coreperiphery structure of social networks (despite their positive degreedegree correlations) is rather weak. We proceed to study radial statistics of the core, i.e. properties of the nneighborhoods of the core vertices for increasing n. We find that almost all networks have unexpectedly many edges within nneighborhoods at a certain distance from the core suggesting an effective radius for nontrivial network processes.
Comparisons of randomization and kdegree anonymization schemes for privacy preserving social network publishing
 In SNAKDD ’09: Proceedings of the 3rd SIGKDD Workshop on Social Network Mining and Analysis (SNAKDD
, 2009
"... Many applications of social networks require identity and/or relationship anonymity due to the sensitive, stigmatizing, or confidential nature of user identities and their behaviors. Recent work showed that the simple technique of anonymizing graphs by replacing the identifying information of the no ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
Many applications of social networks require identity and/or relationship anonymity due to the sensitive, stigmatizing, or confidential nature of user identities and their behaviors. Recent work showed that the simple technique of anonymizing graphs by replacing the identifying information of the nodes with random ids does not guarantee privacy since the identification of the nodes can be seriously jeopardized by applying background based attacks. In this paper, we investigate how well an edge based graph randomization approach can protect node identities and sensitive links. We quantify both identity disclosure and link disclosure when adversaries have one specific type of background knowledge (i.e., knowing the degrees of target individuals). We also conduct empirical comparisons with the recently proposed Kdegree anonymization schemes in terms of both utility and risks of privacy disclosures.
Network Properties Revealed Through Matrix Functions
, 2008
"... The newly emerging field of Network Science deals with the tasks of modelling, comparing and summarizing large data sets that describe complex interactions. Because pairwise affinity data can be stored in a twodimensional array, graph theory and applied linear algebra provide extremely useful tools. ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
The newly emerging field of Network Science deals with the tasks of modelling, comparing and summarizing large data sets that describe complex interactions. Because pairwise affinity data can be stored in a twodimensional array, graph theory and applied linear algebra provide extremely useful tools. Here, we focus on the general concepts of centrality, communicability and betweenness, each of which quantifies important features in a network. Some recent work in the mathematical physics literature has shown that the exponential of a network’s adjacency matrix can be used as the basis for defining and computing specific versions of these measures. We introduce here a general class of measures based on matrix functions, and show that a particular case involving a matrix resolvent arises naturally from graphtheoretic arguments. We also point out connections between these measures and the quantities typically computed when spectral methods are used for data mining tasks such as clustering and ordering. We finish with computational examples showing the new matrix resolvent version applied to real networks.