Results 1–10 of 53
Resisting Structural Reidentification in Anonymized Social Networks
, 2008
Abstract

Cited by 105 (6 self)
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
Differentially Private Aggregation of Distributed Time-Series with Transformation and Encryption
Abstract

Cited by 88 (3 self)
We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where (i) individual users wish to publish temporally correlated time-series data (such as location traces, web history, personal health data), and (ii) an untrusted third-party aggregator wishes to run aggregate queries on the data. To ensure differential privacy for time-series data despite the presence of temporal correlation, we propose the Fourier Perturbation Algorithm (FPA_k). Standard differential privacy techniques perform poorly for time-series data: to answer n queries, such techniques can add noise of magnitude Θ(n) to each query answer, making the answers practically useless if n is large. Our FPA_k algorithm perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPA_k improves the expected error from Θ(n) to roughly Θ(k), where k is the number of Fourier coefficients that can (approximately) reconstruct all n query answers. Our experiments show that k ≪ n for many real-life datasets, resulting in a huge error improvement for FPA_k. To deal with the absence of a trusted central server, we propose the Distributed Laplace Perturbation Algorithm (DLPA), which adds noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale to a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far by reducing the computational load per user from O(U) to O(1), where U is the number of users.
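The Fourier-perturbation idea described above can be sketched in a few lines: transform the query-answer series, keep the first k DFT coefficients, add Laplace noise to them, and reconstruct. This is a minimal illustrative sketch, not the paper's algorithm; the function name `fpa_k` and the noise scale used here are assumptions, and the paper derives the exact calibration needed for ε-differential privacy.

```python
import numpy as np

rng = np.random.default_rng(0)

def fpa_k(answers, k, epsilon, sensitivity=1.0):
    """Illustrative sketch of a Fourier-perturbation mechanism:
    keep the first k DFT coefficients of the answer series,
    perturb them with Laplace noise, and inverse-transform."""
    n = len(answers)
    coeffs = np.fft.rfft(answers)          # DFT of the real-valued series
    # Noise scale is illustrative only; the paper calibrates it
    # precisely to achieve epsilon-differential privacy.
    scale = sensitivity * np.sqrt(k) / epsilon
    noisy = coeffs[:k] + rng.laplace(0.0, scale, k) \
                       + 1j * rng.laplace(0.0, scale, k)
    padded = np.zeros_like(coeffs)         # drop coefficients beyond k
    padded[:k] = noisy
    return np.fft.irfft(padded, n)         # reconstructed noisy series

# Toy usage: a smooth series is well captured by few coefficients.
series = 10.0 * np.sin(np.linspace(0, 4 * np.pi, 256))
private = fpa_k(series, k=8, epsilon=1.0)
```

Because a smooth, temporally correlated series concentrates its energy in few Fourier coefficients, perturbing only those k coefficients adds far less total noise than perturbing all n answers independently.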
No Free Lunch in Data Privacy
Abstract

Cited by 78 (6 self)
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a no-free-lunch theorem, which defines non-privacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob’s participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled as “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated – this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.
Accurate Estimation of the Degree Distribution of Private Networks
Abstract

Cited by 48 (6 self)
We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical analysis shows that the error scales linearly with the number of unique degrees, whereas the error of conventional techniques scales linearly with the number of nodes. We complement the theoretical analysis with a thorough empirical analysis on real and synthetic graphs, showing that the algorithm’s variance and bias are low, that the error diminishes as the size of the input graph increases, and that common analyses like fitting a power law can be carried out very accurately. Keywords: privacy; social networks; privacy-preserving data mining; differential privacy.
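The baseline the abstract compares against can be sketched as follows: add Laplace noise to every cell of the degree histogram. This is a hedged sketch of the conventional technique only, not the paper's algorithm; the function name and the sensitivity value are illustrative assumptions, and the paper's method additionally exploits constraints (such as sortedness of the distribution) to reduce the error from the number of nodes to the number of unique degrees.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)

def noisy_degree_hist(degrees, epsilon, sensitivity=4.0):
    """Baseline sketch: per-cell Laplace noise on the degree histogram.
    The sensitivity value here is an illustrative assumption; the
    correct bound depends on the privacy model (node vs. edge)."""
    hist = Counter(degrees)
    max_d = max(degrees)
    counts = np.array([hist.get(d, 0) for d in range(max_d + 1)], float)
    noisy = counts + rng.laplace(0.0, sensitivity / epsilon, len(counts))
    # Simple post-processing: counts cannot be negative.
    return np.clip(noisy, 0.0, None)

degs = [1, 1, 2, 3, 3, 3]
hist = noisy_degree_hist(degs, epsilon=1.0)
```

Under this baseline, total error grows with the histogram length (one noisy cell per possible degree), which is exactly the linear-in-nodes behavior the paper improves upon.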
Differential privacy for statistics: What we know and what we want to learn
, 2009
Abstract

Cited by 44 (1 self)
We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.
A Statistical Framework for Differential Privacy
Abstract

Cited by 39 (4 self)
One goal of statistical privacy research is to construct a data-release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(· | X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(· | X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
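The exponential mechanism studied in this paper can be sketched in a few lines: sample an output with probability proportional to exp(ε · quality / (2 · sensitivity)). This is a minimal sketch over a finite candidate set; the toy median-selection use case and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def exponential_mechanism(candidates, quality, epsilon, sensitivity=1.0):
    """Sketch of the McSherry–Talwar exponential mechanism over a
    finite candidate set: higher-quality outputs are exponentially
    more likely, with the tilt controlled by epsilon."""
    scores = np.array([quality(c) for c in candidates], float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy usage (illustrative): privately pick an approximate median.
cands = list(range(15))
pick = exponential_mechanism(cands, lambda c: -abs(c - 7), epsilon=2.0)
```

Note the paper's statistical angle: how well such a mechanism performs depends on how tightly the empirical distribution of the data concentrates, which is exactly the link the abstract describes.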
Towards an axiomatization of statistical privacy and utility
 In PODS
, 2010
Abstract

Cited by 25 (10 self)
“Privacy” and “utility” are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amenable to mathematical analysis, are needed. In this paper we present our initial work on an axiomatization of privacy and utility. In particular, we study how these concepts are affected by randomized algorithms. Our analysis yields new insights into the construction of both privacy definitions and mechanisms that generate data according to such definitions. In particular, it characterizes a class of relaxations of differential privacy and shows that desirable outputs of a differentially private mechanism are best interpreted as certain graphs rather than query answers or synthetic data.
Private analysis of graph structure
 In VLDB
, 2011
Abstract

Cited by 21 (4 self)
We present efficient algorithms for releasing useful statistics about graph data while providing rigorous privacy guarantees. Our algorithms work on data sets that consist of relationships between individuals, such as social ties or email communication. The algorithms satisfy edge differential privacy, which essentially requires that the presence or absence of any particular relationship be hidden. Our algorithms output approximate answers to subgraph counting queries. Given a query graph H, e.g., a triangle, k-star, or k-triangle, the goal is to return the number of edge-induced isomorphic copies of H in the input graph. The special case of triangles was considered by Nissim, Raskhodnikova and Smith (STOC 2007), and a more general investigation of arbitrary query graphs was initiated by Rastogi, Hay, Miklau and Suciu (PODS 2009). We extend the approach of [NRS] to a new class of statistics, namely, k-star queries. We also give algorithms for k-triangle queries using a different approach, based on the higher-order local sensitivity. For the specific graph statistics we consider (i.e., k-stars and k-triangles), we significantly improve on the work of [RHMS]: our algorithms satisfy a stronger notion of privacy, which does not rely on the adversary having a particular prior distribution on the data, and add less noise to the answers before releasing them. We evaluate the accuracy of our algorithms both theoretically and empirically, using a variety of real and synthetic data sets. We give explicit, simple conditions under which these algorithms add a small amount of noise. We also provide an average-case analysis in the Erdős–Rényi–Gilbert G(n, p) random graph model. Finally, we give hardness results indicating that the approach NRS used for triangles cannot easily be extended to k-triangles, hence justifying our development of a new algorithmic approach.
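The subgraph-counting setting above can be made concrete with the triangle case: count the subgraph exactly, then release the count with Laplace noise. This is a hedged sketch only; the fixed `sensitivity` parameter is an illustrative assumption, whereas [NRS] and this paper calibrate noise to (smooth or higher-order local) sensitivity precisely because the worst-case sensitivity of triangle counts under edge changes is large.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def triangles(adj):
    """Count triangles in an undirected graph given as a dict
    mapping each node to the set of its neighbors."""
    count = 0
    for u, v, w in itertools.combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            count += 1
    return count

def noisy_triangle_count(adj, epsilon, sensitivity):
    """Illustrative Laplace release of the triangle count; the
    sensitivity bound supplied here is an assumption, not the
    paper's smooth/local-sensitivity calibration."""
    return triangles(adj) + rng.laplace(0.0, sensitivity / epsilon)

# Toy usage: the complete graph K4 contains exactly 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
exact = triangles(k4)
noisy = noisy_triangle_count(k4, epsilon=1.0, sensitivity=2.0)
```

The gap between this naive release and the paper's approach is the whole point: a single added edge can complete many triangles at once, so data-dependent sensitivity notions are needed to keep the noise small.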
Provenance views for module privacy
 In: Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
, 2011
Abstract

Cited by 19 (3 self)
Scientific workflow systems increasingly store provenance information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions. However, authors/owners of workflows may wish to keep some of this information confidential. In particular, a module may be proprietary, and users should not be able to infer its behavior by seeing mappings between all data inputs and outputs. The problem we address in this paper is the following: given a workflow abstractly modeled by a relation R, a privacy requirement Γ, and costs associated with the data, the owner of the workflow decides which data (attributes) to hide and provides the user with a view R′ that is the projection of R over the attributes that have not been hidden. The goal is to minimize the cost of the hidden data while guaranteeing that individual modules are Γ-private. We call this the SecureView problem. We formally define the problem, study its complexity, and offer algorithmic solutions.
On Provenance and Privacy
Abstract

Cited by 18 (2 self)
Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows (data, module, and structural privacy) and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited/bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.