Optimizing linear counting queries under differential privacy
- In PODS ’10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
, 2010
Cited by 16 (3 self)
Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. But despite much recent work, optimal strategies for answering a collection of related queries are not known. We propose the matrix mechanism, a new algorithm for answering a workload of predicate counting queries. Given a workload, the mechanism requests answers to a different set of queries, called a query strategy, which are answered using the standard Laplace mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries. This two stage process can result in a more complex correlated noise distribution that preserves differential privacy but increases accuracy. We provide a formal analysis of the error of query answers produced by the mechanism and investigate the problem of computing the optimal query strategy in support of a given workload. We show this problem can be formulated as a rank-constrained semidefinite program. Finally, we analyze two seemingly distinct techniques, whose similar behavior is explained by viewing them as instances of the matrix mechanism.
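The two-stage process described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the database is a vector of cell counts `x`, that neighboring databases differ by 1 in a single cell, and that the strategy matrix `A` has full column rank. The helper names (`laplace`, `l1_sensitivity`, `least_squares`, `matrix_mechanism`) are hypothetical, and choosing a good strategy `A`, which is the paper's optimization problem, is not attempted here.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def matvec(M, v):
    # Plain matrix-vector product on lists of lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def l1_sensitivity(A):
    # Largest column L1 norm: the most A x can change (in L1) when one cell
    # count changes by 1 (assumed neighbor model).
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def least_squares(A, y):
    # Solve the normal equations (A^T A) x = A^T y by Gauss-Jordan elimination.
    # Assumes A has full column rank.
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(A, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(G[r][c]))  # partial pivoting
        G[c], G[p] = G[p], G[c]
        piv = G[c][c]
        G[c] = [g / piv for g in G[c]]
        for r in range(n):
            if r != c:
                f = G[r][c]
                G[r] = [gr - f * gc for gr, gc in zip(G[r], G[c])]
    return [G[i][n] for i in range(n)]

def matrix_mechanism(W, A, x, epsilon, rng):
    # Stage 1: answer the strategy queries A x with the Laplace mechanism.
    scale = l1_sensitivity(A) / epsilon
    noisy = [a + laplace(scale, rng) for a in matvec(A, x)]
    # Stage 2: estimate x from the noisy strategy answers, then derive the
    # workload answers W x_hat from that estimate.
    x_hat = least_squares(A, noisy)
    return matvec(W, x_hat)
```

Because the workload answers are all derived from one estimate `x_hat`, their noise is correlated, which is the effect the abstract credits for the accuracy gain.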
The differential privacy frontier (extended abstract)
- In TCC
, 2009
Cited by 15 (0 self)
We review the definition of differential privacy and briefly survey a handful of very recent contributions to the differential privacy frontier. Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions, which we call a privacy mechanism. Informally, the guarantee says that the behavior of the mechanism is essentially unchanged independent of whether any individual opts into or opts out of the data set. Designed for statistical analysis, for example, of health or census data, the definition protects the privacy of individuals, and small groups of individuals, while permitting very different outcomes in the case of very different data sets. We begin by recalling some differential privacy basics. While the frontier of a vibrant area is always in flux, we will endeavor to give an impression of the state of the art by surveying a handful of extremely recent advances.
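For reference, the informal guarantee in this abstract is usually stated as the following inequality. The notation is the standard one and is an assumption here, not taken from this listing: a randomized mechanism M is ε-differentially private if, for all data sets D and D' that differ in one individual's record and for every set S of possible outputs,

```latex
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S].
\]
```

A smaller ε forces the two output distributions closer together, which is the sense in which the mechanism's behavior is "essentially unchanged" by one individual's participation.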
On the Geometry of Differential Privacy
, 2009
Cited by 14 (1 self)
We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℝ^n → ℝ non-adaptively. Here, the database is represented by a vector in ℝ^n and proximity between databases is measured in the ℓ1-metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. We use this connection to give tight upper and lower bounds on the noise complexity for any d ≤ n. We show that for d random linear queries of sensitivity 1, it is necessary and sufficient to add ℓ2-error Θ(min{d√d/ε, d√(log(n/d))/ε}) to achieve ε-differential privacy. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane conjecture, we can extend our results to arbitrary linear queries giving nearly matching upper and lower bounds. Our bound translates to error O(min{d/ε, √(d log(n/d))/ε}) per answer. The best previous upper bound (Laplacian mechanism) gives a bound of O(min{d/ε, √n/ε}) per answer, while the best known lower bound was Ω(√d/ε). In contrast, our lower bound is strong enough to separate the concept of differential privacy from the notion of approximate differential privacy where an upper bound of O(√d/ε) can be achieved.
Interactive Privacy via the Median Mechanism
- In The 42nd ACM Symposium on the Theory of Computing
, 2010
Cited by 9 (2 self)
We define a new interactive differentially private mechanism — the median mechanism — for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a super-polynomial factor, even in the non-interactive setting.
A Statistical Framework for Differential Privacy
Cited by 7 (1 self)
One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·|X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(·|X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
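For readers unfamiliar with the exponential mechanism named in this abstract, a minimal sketch of its usual form is given below. It assumes a finite candidate set, a utility score per candidate, and a known sensitivity for the utility function; the function names and parameters are hypothetical and the code is not drawn from the paper. Each candidate is sampled with probability proportional to exp(ε · utility / (2 · sensitivity)).

```python
import math
import random

def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    # Subtracting the maximum utility leaves the normalized probabilities
    # unchanged and avoids overflow in exp().
    m = max(utilities)
    weights = [math.exp(epsilon * (u - m) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng):
    # Draw one candidate according to the exponential-mechanism distribution.
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Higher-utility candidates are exponentially more likely to be chosen, and a smaller `epsilon` flattens the distribution toward uniform.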
Preserving Module Privacy in Workflow Provenance
, 2010
Cited by 4 (4 self)
We study the problem of providing workflow data provenance without revealing the functionality of any module. We develop a model that formalizes the notion of privacy of modules embedded in a workflow structure as a natural extension of privacy of standalone modules. Our model shows that by hiding a small amount of carefully chosen data, one can ensure privacy of all modules over an unbounded number of executions. The problem of identifying the smallest possible amount of such data is NP-hard, and in the full generality of our model it is in fact even hard to get a good approximation. However, we are able to design good approximation algorithms for optimizing the amount of hidden data when either the privacy model is slightly restricted or there is bounded sharing of data items among various modules.
Differentially Private Data Cubes: Optimizing Noise Sources and Consistency
Cited by 4 (0 self)
Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table’s dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals’ privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln |L| + 1)² of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 − 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
Approximate Privacy: Foundations and Quantification
, 2009
Cited by 3 (1 self)
Increasing use of computers and networks in business, government, recreation, and almost all aspects of daily life has led to a proliferation of online sensitive data about individuals and organizations. Consequently, concern about the privacy of these data has become a top priority, particularly those data that are created and used in electronic commerce. There have been many formulations of privacy and, unfortunately, many negative results about the feasibility of maintaining privacy of sensitive data in realistic networked environments. We formulate communication-complexity-based definitions, both worst-case and average-case, of a problem’s privacy-approximation ratio. We use our definitions to investigate the extent to which approximate privacy is achievable in two standard problems: the 2nd-price Vickrey auction [18] and the millionaires problem of Yao [20]. For both the 2nd-price Vickrey auction and the millionaires problem, we show that not only is perfect privacy impossible or infeasibly costly to achieve, but even close approximations of perfect privacy suffer from the same lower bounds. By contrast, we show that, if the values of the parties are drawn uniformly at random from {0, ..., 2^k − 1}, then, for
Approximate Privacy: Foundations and Quantification (Extended Abstract)
Cited by 3 (0 self)
Increasing use of computers and networks in business, government, recreation, and almost all aspects of daily life has led to a proliferation of online sensitive data about individuals and organizations. Consequently, concern about the privacy of these data has become a top priority, particularly those data that are created and used in electronic commerce. Despite many careful formulations and extensive study, there are still open questions about the feasibility of maintaining meaningful privacy in realistic networked environments. We formulate communication-complexity-based definitions, both worst-case and average-case, of a problem’s privacy-approximation ratio. We use our definitions to investigate the extent to which approximate privacy is achievable in many well-studied contexts: the 2nd-price Vickrey auction [20], the millionaires problem of Yao [22], the provisioning of a public good, and also set disjointness and set intersection. We present both positive and negative results and many interesting directions for future research.
iReduct: Differential privacy with reduced relative errors
- In SIGMOD
, 2011
Cited by 3 (0 self)
Prior work in differential privacy has produced techniques for answering aggregate queries over sensitive data in a privacy-preserving way. These techniques achieve privacy by adding noise to the query answers. Their objective is typically to minimize absolute errors while satisfying differential privacy. Thus, query answers are injected with noise whose scale is independent of whether the answers are large or small. The noisy results for queries whose true answers are small therefore tend to be dominated by noise, which leads to inferior data utility. This paper introduces iReduct, a differentially private algorithm for computing answers with reduced relative errors. The basic idea of iReduct is to inject different amounts of noise to different query results, so that smaller (larger) values are more likely to be injected with less (more) noise. The algorithm is based on a novel resampling technique that employs correlated noise to improve data utility. Performance is evaluated on an instantiation of iReduct that generates marginals, i.e., projections of multi-dimensional histograms onto subsets of their attributes. Experiments on real data demonstrate the effectiveness of our solution.
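The motivation in this abstract, that a uniform noise scale hurts small answers most, can be checked with a little arithmetic. The sketch below is an illustration only and does not implement iReduct. It uses the fact that Laplace noise with scale b has expected absolute value b, and it assumes a relative-error measure that divides by max(true answer, sanity bound); the `sanity_bound` parameter and the function name are assumptions made here to avoid division by very small answers.

```python
def expected_relative_error(true_answer, noise_scale, sanity_bound=1.0):
    # E|Laplace(0, b)| = b, so the expected absolute error equals noise_scale.
    # Dividing by max(true_answer, sanity_bound) gives the assumed relative error.
    return noise_scale / max(true_answer, sanity_bound)
```

With the same noise scale of 5, a true count of 10 has expected relative error 0.5 while a true count of 1000 has 0.005, which is why giving less noise to smaller answers can reduce overall relative error.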

