Results 1  10
of
48
Resisting Structural Reidentification in Anonymized Social Networks
, 2008
"... We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked dat ..."
Abstract

Cited by 105 (6 self)
 Add to MetaCart
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
No Free Lunch in Data Privacy
"... Differential privacy is a powerful tool for providing privacypreserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that i ..."
Abstract

Cited by 78 (6 self)
 Add to MetaCart
(Show Context)
Differential privacy is a powerful tool for providing privacypreserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a nofreelunch theorem, which defines nonprivacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob’s participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled as “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated – this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.
Boosting the accuracy of differentially private histograms through consistency
 Proc. VLDB Endow
, 2010
"... We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post ..."
Abstract

Cited by 67 (5 self)
 Add to MetaCart
We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a postprocessing phase, we compute the consistent input most likely to have produced the noisy output. The final output is differentiallyprivate and consistent, but in addition, it is often much more accurate. We show, both theoretically and experimentally, that these techniques can be used for estimating the degree sequence of a graph very precisely, and for computing a histogram that can support arbitrary range queries accurately. 1.
Coevolution of social and affiliation networks
 In 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD
, 2009
"... In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We ..."
Abstract

Cited by 38 (2 self)
 Add to MetaCart
(Show Context)
In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of realworld networks related only to nodetonode link formation. We propose a novel model which captures the coevolution of social and affiliation networks. We provide surprising insights into group formation based on observations in several realworld networks, showing that users often join groups for reasons other than their friends. Our experiments show that the model is able to capture both the newly observed and previously studied network properties. This work is the first to propose a generative model which captures the statistical properties of these complex networks. The proposed model facilitates controlled experiments which study the effect of actors ’ behavior on the network evolution, and it allows the generation of realistic synthetic datasets.
Private analysis of graph structure
 In VLDB
, 2011
"... We present efficient algorithms for releasing useful statistics about graph data while providing rigorous privacy guarantees. Our algorithms work on data sets that consist of relationships between individuals, such as social ties or email communication. The algorithms satisfy edge differential priva ..."
Abstract

Cited by 21 (4 self)
 Add to MetaCart
(Show Context)
We present efficient algorithms for releasing useful statistics about graph data while providing rigorous privacy guarantees. Our algorithms work on data sets that consist of relationships between individuals, such as social ties or email communication. The algorithms satisfy edge differential privacy, which essentially requires that the presence or absence of any particular relationship be hidden. Our algorithms output approximate answers to subgraph counting queries. Given a query graph H, e.g., a triangle, kstar or ktriangle, the goal is to return the number of edgeinduced isomorphic copies of H in the input graph. The special case of triangles was considered by Nissim, Raskhodnikova and Smith (STOC 2007), and a more general investigation of arbitrary query graphs was initiated by Rastogi, Hay, Miklau and Suciu (PODS 2009). We extend the approach of [NRS] to a new class of statistics, namely, kstar queries. We also give algorithms for ktriangle queries using a different approach, based on the higherorder local sensitivity. For the specific graph statistics we consider (i.e., kstars and ktriangles), we significantly improve on the work of [RHMS]: our algorithms satisfy a stronger notion of privacy, which does not rely on the adversary having a particular prior distribution on the data, and add less noise to the answers before releasing them. We evaluate the accuracy of our algorithms both theoretically and empirically, using a variety of real and synthetic data sets. We give explicit, simple conditions under which these algorithms add a small amount of noise. We also provide the averagecase analysis in the ErdősRényiGilbert G(n, p) random graph model. Finally, we give hardness results indicating that the approach NRS used for triangles cannot easily be extended to ktriangles (and hence justifying our development of a new algorithmic approach). 1.
Personalized Social Recommendations Accurate or Private?
"... With the recent surge of social networks like Facebook, new forms of recommendations have become possible – personalized recommendations of ads, content, and even new friend and product connections based on one’s social interactions. Since recommendations may use sensitive social information, it is ..."
Abstract

Cited by 21 (3 self)
 Add to MetaCart
With the recent surge of social networks like Facebook, new forms of recommendations have become possible – personalized recommendations of ads, content, and even new friend and product connections based on one’s social interactions. Since recommendations may use sensitive social information, it is speculated that these recommendations are associated with privacy risks. The main contribution of this work is in formalizing these expected tradeoffs between the accuracy and privacy of personalized social recommendations. In this paper, we study whether“social recommendations”, or recommendations that are solely based on a user’s social network, can be made without disclosing sensitive links in the social graph. More precisely, we quantify the loss in utility when existing recommendation algorithms are modified to satisfy a strong notion of privacy, called differential privacy. We prove lower bounds on the minimum loss in utility for any recommendation algorithm that is differentially private. We adapt two privacy preserving algorithms from the differential privacy literature to the problem of social recommendations, and analyze their performance in comparison to the lower bounds, both analytically and experimentally. We show that good private social recommendations are feasible only for a small subset of the users in the social network or for a lenient setting of privacy parameters. 1.
Sharing Graphs using Differentially Private Graph Models
"... Continuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techn ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
(Show Context)
Continuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techniques to improve privacy on graphs only target specific attacks, and have been proven to be vulnerable against powerful deanonymization attacks. Our work seeks a solution to share meaningful graph datasets while preserving privacy. We observe a clear tension between strength of privacy protection and maintaining structural similarity to the original graph. To navigate the tradeoff, we develop a differentiallyprivate graph model we call Pygmalion. Given a graph G and a desired level of ǫdifferential privacy guarantee, Pygmalion extracts
The JohnsonLindenstrauss Transform itself preserves differential privacy
 In IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS
, 2012
"... ar ..."
(Show Context)
Differentially private histogram publishing through lossy compression
 In ICDM
, 2012
"... Abstract—Differential privacy has emerged as one of the most promising privacy models for private data release. It can be used to release different types of data, and, in particular, histograms, which provide useful summaries of a dataset. Several differentially private histogram releasing schemes h ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Differential privacy has emerged as one of the most promising privacy models for private data release. It can be used to release different types of data, and, in particular, histograms, which provide useful summaries of a dataset. Several differentially private histogram releasing schemes have been proposed recently. However, most of them directly add noise to the histogram counts, resulting in undesirable accuracy. In this paper, we propose two sanitization techniques that exploit the inherent redundancy of reallife datasets in order to boost the accuracy of histograms. They lossily compress the data and sanitize the compressed data. Our first scheme is an optimization of the Fourier Perturbation Algorithm (FPA) presented in [13]. It improves the accuracy of the initial FPA by a factor of 10. The other scheme relies on clustering and exploits the redundancy between bins. Our extensive experimental evaluation over various reallife and synthetic datasets demonstrates that our techniques preserve very accurate distributions and considerably improve the accuracy of range queries over attributed histograms. KeywordsDifferential privacy, histogram, lossy compression, Fourier transform, clustering I.
Analyzing graphs with node differential privacy
 In TCC
, 2013
"... Abstract. We develop algorithms for the private analysis of network data that provide accurate analysis of realistic networks while satisfying stronger privacy guarantees than those of previous work. We present several techniques for designing node differentially private algorithms, that is, algori ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
(Show Context)
Abstract. We develop algorithms for the private analysis of network data that provide accurate analysis of realistic networks while satisfying stronger privacy guarantees than those of previous work. We present several techniques for designing node differentially private algorithms, that is, algorithms whose output distribution does not change significantly when a node and all its adjacent edges are added to a graph. We also develop methodology for analyzing the accuracy of such algorithms on realistic networks. The main idea behind our techniques is to "project" (in one of several senses) the input graph onto the set of graphs with maximum degree below a certain threshold. We design projection operators, tailored to specific statistics that have low sensitivity and preserve information about the original statistic. These operators can be viewed as giving a fractional (lowdegree) graph that is a solution to an optimization problem described as a maximum flow instance, linear program, or convex program. In addition, we derive a generic, efficient reduction that allows us to apply any differentially private algorithm for boundeddegree graphs to an arbitrary graph. This reduction is based on analyzing the smooth sensitivity of the "naive" truncation that simply discards nodes of high degree.