Results 1–10 of 20
Resisting Structural Reidentification in Anonymized Social Networks
, 2008
"... We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked dat ..."
Abstract

Cited by 60 (7 self)
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
A Sequential Algorithm for Generating Random Graphs
, 2006
"... Abstract. We present a nearlylinear time algorithm for counting and randomly generating simple graphs with a given degree sequence in a certain range. For degree sequence (di) n i=1 with maximum degree dmax = O(m 1/4−τ), our algorithm generates almost uniform random graphs with that degree sequence ..."
Abstract

Cited by 10 (0 self)
Abstract. We present a nearly-linear-time algorithm for counting and randomly generating simple graphs with a given degree sequence in a certain range. For a degree sequence (d_i)_{i=1}^n with maximum degree d_max = O(m^{1/4−τ}), our algorithm generates almost-uniform random graphs with that degree sequence in time O(m·d_max), where m = (1/2)·∑_i d_i is the number of edges in the graph and τ is any positive constant. The fastest previously known algorithm for uniform generation of these graphs [35] has a running time of O(m²·d_max²). Our method also gives an independent proof of McKay’s estimate [34] for the number of such graphs. We also use sequential importance sampling to derive fully polynomial-time randomized approximation schemes (FPRAS) for counting and uniformly generating random graphs for the same range of d_max = O(m^{1/4−τ}). Moreover, we show that for d = O(n^{1/2−τ}), our algorithm can generate an asymptotically uniform d-regular graph. Our results improve the previous bound of d = O(n^{1/3−τ}) due to Kim and Vu [31] for regular graphs.
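As an illustration of the sequential flavor of such generators, here is a minimal Python sketch that pairs vertices with probability proportional to residual degrees and restarts on dead ends. The function name and the restart strategy are ours; the paper's actual algorithm uses a more refined acceptance rule to achieve near-uniformity, so this is a sketch of the idea, not the method itself.

```python
import random

def sequential_simple_graph(degrees, max_tries=100):
    """Sketch: sequentially add edges, choosing each endpoint pair with
    probability proportional to the product of residual degrees, rejecting
    self-loops and repeated edges, and restarting on dead ends.
    Illustrative only -- not the paper's exact acceptance rule."""
    n = len(degrees)
    assert sum(degrees) % 2 == 0, "degree sum must be even"
    for _ in range(max_tries):
        residual = list(degrees)
        edges = set()
        ok = True
        while sum(residual) > 0:
            # candidate edges: unused pairs whose endpoints still need degree
            candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                          if residual[u] and residual[v] and (u, v) not in edges]
            if not candidates:
                ok = False  # dead end: restart from scratch
                break
            weights = [residual[u] * residual[v] for u, v in candidates]
            u, v = random.choices(candidates, weights=weights, k=1)[0]
            edges.add((u, v))
            residual[u] -= 1
            residual[v] -= 1
        if ok:
            return edges
    return None
```

A quadratic candidate scan is used for clarity; the paper's near-linear running time comes from avoiding exactly this kind of enumeration.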
Negative Examples for Sequential Importance Sampling of Binary Contingency Tables. Submitted; available from arXiv: math.ST/0606650
"... Abstract. The sequential importance sampling (SIS) algorithm has gained considerable popularity for its empirical success. One of its noted applications is to the binary contingency tables problem, an important problem in statistics, where the goal is to estimate the number of 0/1 matrices with pres ..."
Abstract

Cited by 5 (0 self)
Abstract. The sequential importance sampling (SIS) algorithm has gained considerable popularity for its empirical success. One of its noted applications is to the binary contingency tables problem, an important problem in statistics, where the goal is to estimate the number of 0/1 matrices with prescribed row and column sums. We give a family of examples in which the SIS procedure, if run for any subexponential number of trials, will underestimate the number of tables by an exponential factor. This result holds for any of the usual design choices in the SIS algorithm, namely the ordering of the columns and rows. These are apparently the first theoretical results on the efficiency of the SIS algorithm for binary contingency tables. Finally, we present experimental evidence that the SIS algorithm is efficient for row and column sums that are regular. Our work is a first step in determining rigorously the class of inputs for which SIS is effective.
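To make the SIS setup concrete, here is a minimal Python sketch of a column-by-column SIS estimator for the number of 0/1 matrices with prescribed margins. The uniform-subset proposal is a simplification chosen for clarity (the schemes analyzed in this literature use smarter, degree-proportional proposals), and the function name is ours.

```python
import math
import random

def sis_count_tables(row_sums, col_sums, trials=1000):
    """Sketch of a basic SIS estimator: fill columns left to right, placing
    each column's ones uniformly among rows with residual capacity, and
    reweight by the number of available choices. Unbiased, but with the
    naive proposal the variance can be huge (the abstract's point)."""
    total = 0.0
    for _ in range(trials):
        residual = list(row_sums)
        weight = 1.0
        ok = True
        for c in col_sums:
            avail = [i for i, r in enumerate(residual) if r > 0]
            if len(avail) < c:
                ok = False  # cannot place this column's ones
                break
            # importance weight: number of equally likely subsets
            weight *= math.comb(len(avail), c)
            for i in random.sample(avail, c):
                residual[i] -= 1
        if ok and all(r == 0 for r in residual):
            total += weight  # valid completion contributes its weight
    return total / trials
```

Each completed table is sampled with probability 1/weight along its path, so the weighted indicator averages to the true count.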
Efficient Importance Sampling for Binary Contingency Tables, preprint
, 2007
"... Importance sampling has been reported to produce algorithms with excellent empirical performance in counting problems. However, the theoretical support for its efficiency in these applications has been very limited. In this paper, we propose a methodology that can be used to design efficient importa ..."
Abstract

Cited by 5 (1 self)
Importance sampling has been reported to produce algorithms with excellent empirical performance in counting problems. However, the theoretical support for its efficiency in these applications has been very limited. In this paper, we propose a methodology that can be used to design efficient importance sampling algorithms for counting and to test their efficiency rigorously. We apply our techniques after transforming the problem into a rare-event simulation problem, thereby connecting complexity analysis of counting problems with efficiency in the context of rare-event simulation. As an illustration of our approach, we consider the problem of counting the number of binary tables with fixed column and row sums, c_j’s and r_i’s respectively, and total marginal sum d = ∑_j c_j. Assuming that max_j c_j = o(d^{1/2}), ∑_j c_j² = O(d), and the r_i’s are bounded, we show that a suitable importance sampling algorithm, proposed by Chen et al. [J. Amer. Statist. Assoc. 100 (2005) 109–120], requires O(d³ε⁻²δ⁻¹) operations to produce an estimate that has ε-relative error with probability 1 − δ. In addition, if max_j c_j = o(d^{1/4−δ₀}) for some δ₀ > 0, the same coverage can be guaranteed with O(d³ε⁻²·log(δ⁻¹)) operations.
Constructing and Sampling Graphs with a Prescribed Joint Degree Distribution
"... One of the most influential recent results in network analysis is that many natural networks exhibit a powerlaw or lognormal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have th ..."
Abstract

Cited by 4 (0 self)
One of the most influential recent results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good models for real-life networks due to their differences on other important metrics like conductance. We believe this is, in part, because many of these real-world networks have very different joint degree distributions, i.e. the probability that a randomly selected edge will be between nodes of degree k and l. Assortativity is a sufficient statistic of the joint degree distribution, and it has been previously noted that social networks tend to be assortative, while biological and technological networks tend to be disassortative. We suggest that understanding the relationship between network structure and the joint degree distribution of graphs is an interesting avenue of further research. An important tool for such studies is an algorithm that can generate random instances of graphs with the same joint degree distribution. This is the main topic of this paper, and we study the problem from both a theoretical and a practical perspective. We provide an algorithm for constructing simple graphs from a given joint degree distribution, and a Markov chain Monte Carlo method for sampling them. We also show that the state space of simple graphs with a fixed degree distribution is connected via endpoint switches. We empirically evaluate the mixing time of this Markov chain using experiments based on the autocorrelation of each edge. These experiments show that our Markov chain mixes quickly on these real graphs, allowing our techniques to be used in practice.
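For concreteness, here is a small Python sketch (names ours, not the paper's) of tabulating the joint degree distribution of an undirected edge list: for each edge, count the sorted degree pair of its endpoints.

```python
from collections import Counter

def joint_degree_counts(edges):
    """Sketch: tabulate the joint degree distribution of an undirected
    edge list. Returns a Counter mapping sorted degree pairs (k, l) to the
    number of edges joining a degree-k node to a degree-l node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    jdd = Counter()
    for u, v in edges:
        k, l = sorted((deg[u], deg[v]))
        jdd[(k, l)] += 1
    return jdd
```

On the path 0–1–2, both edges join a degree-1 node to the degree-2 center, so the tabulation is {(1, 2): 2}.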
Generating Random Graphs with Large Girth
"... We present a simple and efficient algorithm for randomly generating simple graphs without small cycles. These graphs can be used to design high performance LowDensity ParityCheck (LDPC) codes. For any constant k, α ≤ 1/2k(k + 3) and m = O(n 1+α), our algorithm generates an asymptotically uniform r ..."
Abstract

Cited by 3 (1 self)
We present a simple and efficient algorithm for randomly generating simple graphs without small cycles. These graphs can be used to design high-performance Low-Density Parity-Check (LDPC) codes. For any constant k, α ≤ 1/(2k(k + 3)), and m = O(n^{1+α}), our algorithm generates an asymptotically uniform random graph with n vertices, m edges, and girth larger than k in polynomial time. To the best of our knowledge this is the first polynomial-time algorithm for the problem. Our algorithm generates a graph by sequentially adding m edges to an empty graph with n vertices. Recently, these types of sequential methods for counting and random generation have been very successful [35, 18, 11, 7, 5, 6].
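A minimal Python sketch of the girth-constrained sequential idea: reject a candidate edge if its endpoints are already within distance k − 1, since the new edge would then close a cycle of length at most k. The names and the plain rejection strategy are ours; this mimics only the sequential flavor of the abstract's algorithm and makes no claim to its uniformity guarantees.

```python
import random
from collections import deque

def short_path_exists(adj, u, v, k):
    """BFS: is there a u-v path of length at most k - 1? If so, adding the
    edge (u, v) would close a cycle of length at most k."""
    dist = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        if dist[x] >= k - 1:
            continue  # longer paths cannot create a cycle of length <= k
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return True
                q.append(y)
    return False

def random_graph_girth(n, m, k, max_tries=10000):
    """Sketch: sequentially add up to m random edges, rejecting any edge
    that would create a self-loop, a repeat, or a cycle of length <= k.
    The resulting graph always has girth larger than k."""
    adj = {i: set() for i in range(n)}
    edges = set()
    tries = 0
    while len(edges) < m and tries < max_tries:
        tries += 1
        u, v = random.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e in edges or short_path_exists(adj, u, v, k):
            continue
        edges.add(e)
        adj[u].add(v)
        adj[v].add(u)
    return edges
```

The girth guarantee holds because any short cycle in the final graph would have been detected when its last edge was proposed.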
On the Synthetic Generation of Semantic Web Schemas
 in Proceedings of the Joint ODBIS & SWDB Workshop on Semantic Web, Ontologies, Databases; co-located with VLDB 2007
, 2007
"... Abstract. In order to cope with the expected size of the Semantic Web (SW) in the coming years, we need to benchmark existing SW tools (e.g., query language interpreters) in a credible manner. In this paper we present the first RDFS schema generator, termed PoweRGen, which takes into account the mor ..."
Abstract

Cited by 2 (1 self)
Abstract. In order to cope with the expected size of the Semantic Web (SW) in the coming years, we need to benchmark existing SW tools (e.g., query language interpreters) in a credible manner. In this paper we present the first RDFS schema generator, termed PoweRGen, which takes into account the morphological features that schemas frequently exhibit in reality. In particular, we are interested in synthetically generating the two core components of an RDFS schema, namely the property graph (relationships between classes or attributes) and the subsumption graph (subsumption relationships among classes). The total-degree distribution of the former, as well as the out-degree distribution of the transitive closure (TC) of the latter, usually follow a power law. PoweRGen produces synthetic property and subsumption graphs whose distributions respect the power-law exponents given as input with a confidence ranging between 90–98%.
Sampling Graphs with a Prescribed Joint Degree Distribution Using Markov Chains
"... One of the most influential results in network analysis is that many natural networks exhibit a powerlaw or lognormal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right ..."
Abstract

Cited by 2 (0 self)
One of the most influential results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good models for real-life networks due to their differences on other important metrics like conductance. We believe this is, in part, because many of these real-world networks have very different joint degree distributions, i.e. the probability that a randomly selected edge will be between nodes of degree k and l. Assortativity is a sufficient statistic of the joint degree distribution, and it has been previously noted that social networks tend to be assortative, while biological and technological networks tend to be disassortative. We suggest that the joint degree distribution of graphs is an interesting avenue for further research into network structure. We provide a simple greedy algorithm for constructing simple graphs from a given joint degree distribution, and a Markov chain Monte Carlo method for sampling them. We also show that the state space of simple graphs with a fixed degree distribution is connected via endpoint switches. We empirically evaluate the mixing time of this Markov chain using experiments based on the autocorrelation of each edge.
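As an illustration of the endpoint-switch move, here is a minimal Python sketch of one double-edge-swap step. Note that this simplified version preserves only the plain degree sequence, whereas the paper's chain targets the joint degree distribution; the function name is ours.

```python
import random

def endpoint_switch_step(edges):
    """Sketch of one endpoint switch (double edge swap): pick two edges
    (u, v) and (x, y) and rewire them to (u, y) and (x, v), rejecting the
    move if it would create a self-loop or a duplicate edge. Edges are
    stored as sorted tuples in a set. Each accepted move preserves every
    vertex degree."""
    edge_list = list(edges)
    (u, v), (x, y) = random.sample(edge_list, 2)
    if u == y or x == v:
        return edges  # rewiring would create a self-loop: reject
    new1 = tuple(sorted((u, y)))
    new2 = tuple(sorted((x, v)))
    if new1 in edges or new2 in edges or new1 == new2:
        return edges  # rewiring would create a multi-edge: reject
    edges = set(edges)  # copy, then swap
    edges.discard(tuple(sorted((u, v))))
    edges.discard(tuple(sorted((x, y))))
    edges.update([new1, new2])
    return edges
```

Iterating this step is the standard way to walk the space of simple graphs with a fixed degree sequence.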
Maximum Likelihood Estimation in Network Models
"... We study maximum likelihood estimation for the statistical model for both directed and undirected random graph models in which the degree sequences are minimal sufficient statistics. In the undirected case, the model is known as the beta model. We derive necessary and sufficient conditions for the e ..."
Abstract

Cited by 1 (1 self)
We study maximum likelihood estimation for both directed and undirected random graph models in which the degree sequences are minimal sufficient statistics. In the undirected case, the model is known as the beta model. We derive necessary and sufficient conditions for the existence of the MLE that are based on the polytope of degree sequences, and we characterize in a combinatorial fashion the sample points leading to a nonexistent MLE and the nonestimability of the probability parameters under a nonexistent MLE. We formulate conditions that guarantee that the MLE exists with probability tending to one as the number of nodes increases. By reparametrizing the beta model as a log-linear model under a product multinomial sampling scheme, we are able to provide usable algorithms for detecting nonexistence of the MLE and for identifying nonestimable parameters. We illustrate our approach on other random graph models for networks, such as the Rasch model, the Bradley-Terry model and the more general p_1 model of Holland and Leinhardt (1981).
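For the undirected beta model, the MLE can be computed by a well-known fixed-point iteration (due to Chatterjee, Diaconis and Sly): with x_i = exp(β_i) and edge probabilities p_ij = x_i·x_j / (1 + x_i·x_j), iterate x_i ← d_i / ∑_{j≠i} x_j / (1 + x_i·x_j). A minimal Python sketch follows (function name ours), with the caveat from the abstract that the MLE need not exist for every degree sequence, in which case the iteration fails to converge or blows up.

```python
import math

def beta_model_mle(degrees, iters=200, tol=1e-10):
    """Sketch of the fixed-point iteration for the beta model MLE.
    Iterates x_i <- d_i / sum_{j != i} x_j / (1 + x_i * x_j) until the
    expected degrees under p_ij = x_i x_j / (1 + x_i x_j) match the
    observed degrees. Returns the beta_i = log(x_i). No existence check
    is performed -- for degree sequences outside the polytope's interior
    the MLE does not exist and this loop will not settle."""
    n = len(degrees)
    x = [1.0] * n
    for _ in range(iters):
        new_x = []
        for i in range(n):
            s = sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
            new_x.append(degrees[i] / s)
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            x = new_x
            break
        x = new_x
    return [math.log(v) for v in x]
```

For a regular sequence the solution is symmetric: with n = 4 and every degree 1, each p_ij = 1/3 and the expected degree of each node is exactly 1.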