Statistical properties of community structure in large social and information networks
Cited by 65 (6 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
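The abstract above scores candidate communities by conductance but does not spell the measure out. As a rough illustration only, the sketch below uses the standard graph-theoretic definition (edges leaving the set divided by the smaller of the two sides' total degree); the function name `conductance`, the adjacency-dict representation, and the toy graph are assumptions made for this example, not taken from the paper.

```python
def conductance(adj, nodes):
    """Conductance of the node set `nodes` in an undirected graph.

    `adj` maps each node to the set of its neighbours (assumed
    representation; any simple undirected graph can be stored this way).
    """
    s = set(nodes)
    # Number of edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    # Volume = sum of degrees on each side of the cut.
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = conductance(toy, {0, 1, 2})  # one cut edge, volume 7 on each side
```

Lower values indicate a more community-like set; a profile over size scales would record the minimum conductance found among sets of each size.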
Resisting Structural Re-identification in Anonymized Social Networks
2008
Cited by 38 (7 self)
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
Link-Based Characterization and Detection of Web Spam
In AIRWeb, 2006
Cited by 38 (8 self)
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics, such as degree correlations, number of neighbors, rank propagation through links, TrustRank, and others, to build several automatic Web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with only 1.1% false positives.
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
CoRR
Cited by 34 (3 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and on-line social networks, to technological and information networks and
Randomizing Social Networks: a Spectrum Preserving Approach
2008
Cited by 27 (6 self)
Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. The nodes in the network are the individuals and the links among them denote their relationships. Many applications of networks, such as anonymous Web browsing, require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach to this problem is to randomize the edges in true networks and disclose only the randomized networks. In this paper, we investigate how various properties of networks may be affected by randomization. Specifically, we focus on the spectrum, since the eigenvalues of a network are intimately connected to many important topological features. We also conduct theoretical analysis on the extent to which edge anonymity can be achieved. A spectrum-preserving graph randomization method, which can better preserve network properties while protecting edge anonymity, is then presented and empirically evaluated.
Dynamics of Large Networks
2008
Cited by 8 (0 self)
A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time-evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain processes which govern the network evolution, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure, and run simulations of network evolution. Another important aspect of our research is the study of “local” patterns and structures of propagation in networks. We aim to identify building blocks of the networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large person-to-person
Graph generation with prescribed feature constraints
In Proc. of the 9th SIAM Conference on Data Mining, 2009
Cited by 6 (4 self)
In this paper, we study the problem of how to generate synthetic graphs matching various properties of a real social network, with two applications: privacy-preserving social network publishing and significance testing of network analysis results. We present a simple switching-based graph generation approach to generate graphs preserving features of a real graph. We then investigate potential disclosures of sensitive links due to the preserved features. Our algorithms for graph generation with feature range and feature distribution constraints are based on Metropolis-Hastings sampling. This is of importance for significance testing of network analysis results.
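The abstract above does not say which switching step it uses. As a hedged illustration, the sketch below implements the common degree-preserving edge switch, in which two edges (a, b) and (c, d) are rewired to (a, d) and (c, b) whenever that creates no self-loop or duplicate edge. The names `switch_edges` and `degrees`, the retry cap, and the ring graph are assumptions made for this example; the feature-range and Metropolis-Hastings acceptance steps mentioned in the abstract are not modeled here.

```python
import random


def switch_edges(edges, n_switches, seed=0):
    """Apply up to `n_switches` degree-preserving edge switches.

    `edges` is a list of node pairs of a simple undirected graph
    (no self-loops, no repeated edges); this is an assumed input format.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_switches  # hypothetical cap so the loop ends
    while done < n_switches and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        # Reject switches that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


def degrees(edges):
    """Degree of every node that appears in `edges`."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Toy input (hypothetical): a ring on 8 nodes.
ring = [(i, (i + 1) % 8) for i in range(8)]
shuffled = switch_edges(ring, n_switches=20)
```

Every accepted switch leaves each node's degree unchanged, so the output shares the degree sequence of the input while the wiring differs.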
A Bibliometric and Network Analysis of the field of Computational Linguistics
2009
Cited by 4 (0 self)
The ACL Anthology is a large collection of research papers in computational linguistics. Citation data was obtained using text extraction from a collection of PDF files, with significant manual post-processing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. We also propose an improved method for comparing different measures of impact based on correlation. The analysis includes general network statistics, PageRank, metrics across publication years and venues, impact factor and h-index, as well as other measures.
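Of the impact measures listed above, the h-index has a compact standard definition: the largest h such that an author has h papers with at least h citations each. The sketch below computes it from a list of per-paper citation counts; the function name `h_index` and the sample counts are illustrative assumptions, not data from the paper.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here on
    return h


# Hypothetical citation counts for one author's five papers.
example = h_index([10, 8, 5, 4, 3])
```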
Autonomously controlled processes characterisation of complex production systems
Proceedings of 3rd CIRP Conference in Digital Enterprise Technology, Setubal, 2006
Cited by 3 (1 self)
Due to the increasing complexity of today’s logistic systems, new planning and control methods are necessary. Autonomously controlled processes are a possible solution to cope with these new requirements. In order to verify this thesis, it is necessary to develop an evaluation system that measures the logistic objective achievement, the level of autonomous control, and the level of complexity. This paper aims at an adequate operationalisation of complexity in production systems. For this purpose a complexity cube is derived in order to characterize production systems regarding their level of complexity. The different types of complexity in this cube are represented by vectors, which allow measurement and comparison of different types of complexity for different production systems. The application of the complexity cube is illustrated using an exemplary job shop manufacturing scenario.

