Results 1–10 of 100
A learning theory approach to noninteractive database privacy
 In Proceedings of the 40th Annual ACM Symposium on Theory of Computing
, 2008
Abstract

Cited by 121 (13 self)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
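The net-based idea this abstract describes can be illustrated with the exponential mechanism run over an explicit candidate set of synthetic databases. This is a hedged sketch, not the paper's exact construction: the candidate net, the max-error scoring rule, and the sensitivity bound below are our illustrative assumptions.

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity):
    """Pick a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    m = max(scores)  # shift scores for numerical stability
    weights = [math.exp(epsilon * (s - m) / (2 * sensitivity)) for s in scores]
    r = random.random() * sum(weights)
    for c, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

def net_mechanism(db, queries, net, epsilon):
    """Choose the element of a fixed net of candidate synthetic databases
    whose counting-query answers best match the true database."""
    def score(candidate):
        # negative worst-case counting-query error over the query class
        return -max(abs(sum(q(row) for row in db) - sum(q(row) for row in candidate))
                    for q in queries)
    # changing one row of db moves each count by at most 1,
    # so the max-error score has sensitivity 1
    return exponential_mechanism(net, score, epsilon, 1.0)
```

With a large privacy budget the mechanism concentrates on the candidate whose counting-query answers are closest to the true database's.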
Differential privacy: A survey of results
 In Theory and Applications of Models of Computation
, 2008
Abstract

Cited by 120 (0 self)
Abstract. Over the past five years a new approach to privacy-preserving …
What Can We Learn Privately?
 In 49th Annual IEEE Symposium on Foundations of Computer Science
, 2008
Abstract

Cited by 57 (9 self)
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate the general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
A firm foundation for private data analysis
 Commun. ACM
Abstract

Cited by 56 (3 self)
In the information realm, loss of privacy is usually associated with failure to control access to information, to control the flow of information, or to control the purposes for which information is employed. Differential privacy arose in a context in which ensuring privacy is a challenge even if all these control problems are solved: privacy-preserving statistical analysis of data. The problem of statistical disclosure control – revealing accurate statistics about a set of respondents while preserving the privacy of individuals – has a venerable history, with an extensive literature spanning statistics, theoretical computer science, security, databases, and cryptography (see, for example, the excellent survey [1], the discussion of related work in [2], and the Journal of Official Statistics 9(2), dedicated to confidentiality and disclosure control). This long history …
Universally Utility-Maximizing Privacy Mechanisms
Abstract

Cited by 49 (1 self)
A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism’s output distribution is nearly the same whether a given database row is included or excluded. The goal of this paper is strong and general utility guarantees, subject to differential privacy. We pursue mechanisms that guarantee near-optimal utility to every potential user, independent of its side information (modeled as a prior distribution over query results) and preferences (modeled via a loss function). Our main result is: for each fixed count query and differential privacy level, there is a geometric mechanism M∗ – a discrete variant of the simple and well-studied Laplace mechanism – that is simultaneously expected-loss-minimizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user u, no matter what its side information and preferences, derives as much utility from M∗ as from interacting with a differentially private mechanism Mu that is optimally tailored to u. More precisely, for every user u there is an optimal mecha…
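The geometric mechanism this abstract names adds integer noise drawn from the two-sided geometric distribution, whose mass at z is proportional to α^|z| with α = e^(−ε). One standard way to sample it is as the difference of two i.i.d. one-sided geometric variables. A minimal sketch (function names are ours, not the paper's):

```python
import math
import random

def two_sided_geometric(epsilon: float) -> int:
    """Sample Z with P(Z = z) proportional to alpha**abs(z), alpha = exp(-epsilon).
    The difference of two i.i.d. one-sided geometric variables has this law."""
    alpha = math.exp(-epsilon)

    def geom() -> int:
        # inverse-CDF sampling of a geometric on {0, 1, 2, ...}
        # with P(G = k) = (1 - alpha) * alpha**k
        return int(math.floor(math.log(1.0 - random.random()) / math.log(alpha)))

    return geom() - geom()

def geometric_mechanism(true_count: int, epsilon: float) -> int:
    """epsilon-differentially private release of a single count query."""
    return true_count + two_sided_geometric(epsilon)
```

Because the noise is integer-valued, the released answer stays in the same discrete range as the true count, which is what makes the universal-optimality argument for count queries possible.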
Differentially Private Aggregation of Distributed Time-Series with Transformation and Encryption
Abstract

Cited by 41 (2 self)
We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where (i) individual users wish to publish temporally correlated time-series data (such as location traces, web history, personal health data), and (ii) an untrusted third-party aggregator wishes to run aggregate queries on the data. To ensure differential privacy for time-series data despite the presence of temporal correlation, we propose the Fourier Perturbation Algorithm (FPAk). Standard differential privacy techniques perform poorly for time-series data: to answer n queries, such techniques can add noise of Θ(n) to each query answer, making the answers practically useless if n is large. Our FPAk algorithm perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPAk improves the expected error from Θ(n) to roughly Θ(k), where k is the number of Fourier coefficients that can (approximately) reconstruct all n query answers. Our experiments show that k ≪ n for many real-life datasets, resulting in a huge error improvement for FPAk. To deal with the absence of a trusted central server, we propose the Distributed Laplace Perturbation Algorithm (DLPA) to add noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale with a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far by reducing the computational load per user from O(U) to O(1), where U is the number of users.
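The core FPAk idea (keep only the first k Fourier coefficients of the answer vector, perturb them, invert) can be sketched in a few lines. This is an illustrative reading, not the paper's exact algorithm: the noise scale sqrt(k)·sensitivity/ε and the naive O(n²) transforms below are our assumptions for the sketch.

```python
import cmath
import math
import random

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) by inverse-CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]

def fpa_k(answers, k, epsilon, sensitivity=1.0):
    """Sketch of the Fourier Perturbation Algorithm: keep the first k DFT
    coefficients, perturb each with Laplace noise, zero the rest, invert.
    The noise scale here is an illustrative choice, not the paper's proof."""
    n = len(answers)
    X = dft(answers)
    scale = math.sqrt(k) * sensitivity / epsilon
    noisy = [X[f] + complex(laplace(scale), laplace(scale)) if f < k else 0j
             for f in range(n)]
    return idft(noisy)
```

Dropping the high-order coefficients is what trades the Θ(n) per-answer noise of the standard approach for a roughly Θ(k) error plus whatever reconstruction error the truncation introduces.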
Privacy-preserving logistic regression
Abstract

Cited by 35 (2 self)
This paper addresses the important trade-off between privacy and learnability when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression and perturbing the learned classifier with noise proportional to the sensitivity. We show that for certain data distributions, this algorithm has poor learning generalization compared with standard regularized logistic regression. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [7], and we provide learning guarantees. We show that our algorithm performs almost as well as standard regularized logistic regression in terms of generalization error. Experiments demonstrate improved learning performance of our method versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.
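The "sensitivity method" baseline the abstract compares against can be sketched as output perturbation: train regularized logistic regression, then add Laplace noise to the learned weights. The sensitivity bound 2/(n·λ) used below assumes feature vectors of norm at most 1; that bound and all names here are our illustrative assumptions, not the paper's code.

```python
import math
import random

def train_logreg(data, lam, steps=500, lr=0.1):
    """Regularized logistic regression by gradient descent.
    data: list of (x, y) with x a list of floats and y in {-1, +1}."""
    d = len(data[0][0])
    n = len(data)
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wi for wi in w]          # gradient of (lam/2)*||w||^2
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            coef = -y / (1.0 + math.exp(margin))   # d/dm log(1 + e^-m)
            for j in range(d):
                grad[j] += coef * x[j] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def private_logreg(data, lam, epsilon):
    """Output perturbation: add Laplace noise scaled to the (assumed)
    sensitivity bound 2/(n*lam) of the trained weight vector."""
    w = train_logreg(data, lam)
    scale = 2.0 / (len(data) * lam * epsilon)

    def lap(b):
        u = random.random() - 0.5
        return -b * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

    return [wi + lap(scale) for wi in w]
```

The noise scale shrinks as n grows, which is why the sensitivity method degrades mainly on small datasets or weak regularization (small λ).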
On the complexity of differentially private data release: efficient algorithms and hardness results
 In STOC
, 2009
Airavat: Security and Privacy for MapReduce
, 2009
Abstract

Cited by 32 (2 self)
The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized information flow control (DIFC) and differential privacy that provides strong security and privacy guarantees for MapReduce computations. Airavat allows users to use arbitrary mappers, prevents unauthorized leakage of sensitive data during the computation, and supports automatic declassification of the results when the latter do not violate individual privacy. Airavat minimizes the amount of trusted code in the system and allows users without security expertise to perform privacy-preserving computations on sensitive data. Our prototype implementation demonstrates the flexibility of Airavat on a wide variety of case studies. The prototype is efficient, with runtimes on Amazon’s cloud computing infrastructure within 25% of a MapReduce system with no security.
New efficient attacks on statistical disclosure control mechanisms
 In CRYPTO
, 2008
Abstract

Cited by 27 (6 self)
Abstract. The goal of a statistical database is to provide statistics about a population while simultaneously protecting the privacy of the individual records in the database. The tension between privacy and usability of statistical databases has attracted much attention in the statistics, theoretical computer science, security, and database communities in recent years. A line of research initiated by Dinur and Nissim investigates, for a particular type of query, lower bounds on the distortion needed in order to prevent gross violations of privacy. The first result in the current paper simplifies and sharpens the Dinur and Nissim result. The Dinur–Nissim style results are strong because they demonstrate the insecurity of all low-distortion privacy mechanisms. The attacks have an all-or-nothing flavor: letting n denote the size of the database, Ω(n) queries are made before anything is learned, at which point Θ(n) secret bits are revealed. Restricting attention to a wide and realistic subset of possible low-distortion mechanisms, our second result is a more acute attack, requiring only a fixed number of queries for each bit revealed.
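The Dinur–Nissim style of attack on low-distortion subset-sum mechanisms can be illustrated by brute force on a tiny database: if every answer is distorted by less than 1/2, any candidate database consistent with the answers must match the true subset sums exactly. This exponential-time sketch is for illustration only; the actual attacks use efficient linear-algebraic decoding.

```python
import itertools
import random

def noisy_answers(db, queries, max_error):
    """Curator answers each subset-sum query with distortion below max_error."""
    return [sum(db[i] for i in q) + random.uniform(-max_error, max_error)
            for q in queries]

def reconstruct(n, queries, answers, max_error):
    """Brute-force reconstruction: return any candidate database consistent
    with every answer to within max_error. Since the true database is always
    consistent and subset sums are integers, any returned candidate has
    subset sums within 2*max_error of the true ones; with max_error < 1/2
    that forces the sums to agree exactly on every asked query."""
    for cand in itertools.product([0, 1], repeat=n):
        if all(abs(sum(cand[i] for i in q) - a) <= max_error
               for q, a in zip(queries, answers)):
            return list(cand)
    return None
```

With enough random subset queries, matching every subset sum pins the candidate down to (nearly) the true database, which is the "Θ(n) secret bits revealed" phenomenon the abstract describes.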