Results 1-10 of 197
Differential privacy: A survey of results
In Theory and Applications of Models of Computation, 2008
Cited by 249 (0 self)
Abstract. Over the past five years a new approach to privacy-preserving
A learning theory approach to non-interactive database privacy
In Proceedings of the 40th annual ACM symposium on Theory of computing, 2008
Cited by 222 (25 self)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
A firm foundation for private data analysis
Commun. ACM
Cited by 134 (3 self)
In the information realm, loss of privacy is usually associated with failure to control access to information, to control the flow of information, or to control the purposes for which information is employed. Differential privacy arose in a context in which ensuring privacy is a challenge even if all these control problems are solved: privacy-preserving statistical analysis of data. The problem of statistical disclosure control – revealing accurate statistics about a set of respondents while preserving the privacy of individuals – has a venerable history, with an extensive literature spanning statistics, theoretical computer science, security, databases, and cryptography (see, for example, the excellent survey [1], the discussion of related work in [2] and the Journal of Official Statistics 9 (2), dedicated to confidentiality and disclosure control). This long history
What Can We Learn Privately?
49th Annual IEEE Symposium on Foundations of Computer Science, 2008
Cited by 99 (10 self)
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in the contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
Universally Utility-Maximizing Privacy Mechanisms
Cited by 94 (2 self)
A mechanism for releasing information about a statistical database with sensitive data must resolve a tradeoff between utility and privacy. Publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism’s output distribution is nearly the same whether or not a given database row is included or excluded. The goal of this paper is strong and general utility guarantees, subject to differential privacy. We pursue mechanisms that guarantee near-optimal utility to every potential user, independent of its side information (modeled as a prior distribution over query results) and preferences (modeled via a loss function). Our main result is: for each fixed count query and differential privacy level, there is a geometric mechanism M* (a discrete variant of the simple and well-studied Laplace mechanism) that is simultaneously expected loss-minimizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user u, no matter what its side information and preferences, derives as much utility from M* as from interacting with a differentially private mechanism M_u that is optimally tailored to u. More precisely, for every user u there is an optimal mecha
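To make the geometric mechanism named in this abstract more concrete, here is a minimal Python sketch under stated assumptions. It assumes a count query of sensitivity 1 and the common parameterization alpha = exp(-epsilon); the function names `sample_geometric` and `geometric_mechanism` are hypothetical and do not come from the paper. It is an illustration only and ignores floating-point subtleties that a production implementation would need to address.

```python
import math
import random


def sample_geometric(alpha, rng):
    """Draw k >= 0 with Pr[k] = (1 - alpha) * alpha**k (inverse-CDF method)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return int(math.floor(math.log(u) / math.log(alpha)))


def geometric_mechanism(true_count, epsilon, rng=None):
    """Return true_count plus two-sided geometric noise (hypothetical helper).

    Assumption: sensitivity-1 count query and alpha = exp(-epsilon).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    alpha = math.exp(-epsilon)
    # The difference of two i.i.d. geometric draws is two-sided geometric:
    # Pr[noise = z] = (1 - alpha) / (1 + alpha) * alpha**abs(z).
    noise = sample_geometric(alpha, rng) - sample_geometric(alpha, rng)
    return true_count + noise
```

A call such as `geometric_mechanism(100, 1.0)` returns an integer near 100. Because the noise is integer-valued, the released count stays in the same domain as the true count, which is the sense in which this is a discrete variant of the Laplace mechanism.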
Differential privacy and robust statistics
Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC
Cited by 93 (3 self)
In the past several years a promising approach to private data analysis has emerged, based on the notion of differential privacy. Informally, this ensures that any outcome of an analysis is “roughly as likely” to occur independent of whether any individual opts in to, or opts out of, the database. In consequence, the specific data of any one individual can never greatly affect the outcome of the analysis. General techniques for ensuring differential privacy have now been proposed, and many data-mining tasks can be carried out in a differentially private fashion, frequently with very accurate results. In this work we explore privacy-preserving parameter estimation. Privacy-preserving statistics would appear to be connected to robust statistics, the subfield of statistics that attempts to cope with several small errors due, for example, to rounding errors in measurements, as well as a few arbitrarily wild errors occurring, say, from data entry failures. In consequence, in a robust analysis the specific data for any one individual should not greatly affect the outcome of the analysis. It would therefore seem that robust statistical estimators, or procedures, might form a starting point for designing accurate differentially private statistical estimators, and the approach based on influence functions is particularly suggestive of differential privacy. We report here on a successful attempt to instantiate this intuition. We obtain differentially private algorithms for estimating the data scale, median, α-trimmed mean, and linear regression coefficients. Our algorithms always ensure privacy. Under mild statistical assumptions they produce highly accurate outputs, with distortion vanishing in the size of the dataset; however, when the statistical assumptions fail the algorithm may halt with an output “No Reply.” Our algorithms follow a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR). We give general composition theorems for PTR mechanisms.
Differentially Private Aggregation of Distributed Time-Series with Transformation and Encryption
Cited by 87 (3 self)
We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where (i) individual users wish to publish temporally correlated time-series data (such as location traces, web history, personal health data), and (ii) an untrusted third-party aggregator wishes to run aggregate queries on the data. To ensure differential privacy for time-series data despite the presence of temporal correlation, we propose the Fourier Perturbation Algorithm (FPA_k). Standard differential privacy techniques perform poorly for time-series data. To answer n queries, such techniques can result in a noise of Θ(n) to each query answer, making the answers practically useless if n is large. Our FPA_k algorithm perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPA_k improves the expected error from Θ(n) to roughly Θ(k) where k is the number of Fourier coefficients that can (approximately) reconstruct all the n query answers. Our experiments show that k ≪ n for many real-life datasets resulting in a huge error improvement for FPA_k. To deal with the absence of a trusted central server, we propose the Distributed Laplace Perturbation Algorithm (DLPA) to add noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale with a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far, by reducing the computational load per user from O(U) to O(1) where U is the number of users.
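The idea of perturbing the Discrete Fourier Transform of the query answers can be sketched roughly as follows. This is not the authors' exact algorithm: the function names, the `noise_scale` parameter (left to the caller rather than derived from a privacy budget), the choice to add independent Laplace noise to the real and imaginary parts, and the mirroring of kept coefficients to their conjugate positions are all assumptions of this sketch. It uses a naive O(n^2) transform from the standard library only.

```python
import cmath
import math
import random


def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * f * t / n) for t, v in enumerate(values))
        for f in range(n)
    ]


def inverse_dft(coeffs):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * f * t / n) for f, c in enumerate(coeffs)) / n
        for t in range(n)
    ]


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fourier_perturb(answers, k, noise_scale, rng=None):
    """Perturb the first k DFT coefficients, zero the rest, and invert.

    Hypothetical helper; noise_scale is an assumed, caller-supplied value.
    """
    if rng is None:
        rng = random.Random()
    n = len(answers)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(answers)")
    kept = dft(answers)[:k]
    noisy = [
        c + complex(laplace(noise_scale, rng), laplace(noise_scale, rng))
        for c in kept
    ]
    padded = [0j] * n
    for f, c in enumerate(noisy):
        padded[f] = c
        # Assumption of this sketch: mirror each kept coefficient to its
        # conjugate position so that a real-valued series is reconstructed.
        if f > 0 and n - f >= k:
            padded[n - f] = c.conjugate()
    return [x.real for x in inverse_dft(padded)]
```

Only k coefficients receive noise, so when a small k captures most of the series, the total injected noise depends on k rather than on the number of queries n, which is the intuition the abstract describes.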
On the complexity of differentially private data release: efficient algorithms and hardness results
In STOC, 2009
Airavat: Security and Privacy for MapReduce
2009
Cited by 76 (4 self)
The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized information flow control (DIFC) and differential privacy that provides strong security and privacy guarantees for MapReduce computations. Airavat allows users to use arbitrary mappers, prevents unauthorized leakage of sensitive data during the computation, and supports automatic declassification of the results when the latter do not violate individual privacy. Airavat minimizes the amount of trusted code in the system and allows users without security expertise to perform privacy-preserving computations on sensitive data. Our prototype implementation demonstrates the flexibility of Airavat on a wide variety of case studies. The prototype is efficient, with runtimes on Amazon’s cloud computing infrastructure within 25% of a MapReduce system with no security.
No Free Lunch in Data Privacy
Cited by 73 (5 self)
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a no-free-lunch theorem, which defines non-privacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob’s participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled as “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated – this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.
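For reference, the guarantee described informally in this abstract (the output distribution changes very little when any one tuple is added or deleted) is usually written as the standard ε-differential privacy condition. The symbols below are not part of the listing and are introduced here as assumptions: a randomized mechanism \mathcal{M}, neighboring databases D and D' that differ in one tuple, and a privacy parameter ε > 0.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S \subseteq \mathrm{Range}(\mathcal{M}).
```

The condition bounds inference about the presence of a single tuple, which is the notion the abstract contrasts with inference about an individual's participation in the data-generating process.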