Results 1  10
of
52
Optimizing linear counting queries under differential privacy
 In PODS ’10: Proceedings of the twentyninth ACM SIGMODSIGACTSIGART symposium on Principles of database systems of data
, 2010
"... Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. But despite much recent work, optimal strategies for answering a collection of related queries are not known. We propose the matrix mechanism, a new algorithm for answering a workl ..."
Abstract

Cited by 42 (6 self)
 Add to MetaCart
Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. But despite much recent work, optimal strategies for answering a collection of related queries are not known. We propose the matrix mechanism, a new algorithm for answering a workload of predicate counting queries. Given a workload, the mechanism requests answers to a different set of queries, called a query strategy, which are answered using the standard Laplace mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries. This two stage process can result in a more complex correlated noise distribution that preserves differential privacy but increases accuracy. We provide a formal analysis of the error of query answers produced by the mechanism and investigate the problem of computing the optimal query strategy in support of a given workload. We show this problem can be formulated as a rankconstrained semidefinite program. Finally, we analyze two seemingly distinct techniques, whose similar behavior is explained by viewing them as instances of the matrix mechanism.
On the Geometry of Differential Privacy
, 2009
"... We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℜ n → ℜ nonadaptively. Here, the database is represented by a vector in ℜ n and proximity between databases is measured in the ℓ1metric. We show that the noise complexity is ..."
Abstract

Cited by 38 (2 self)
 Add to MetaCart
We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℜ n → ℜ nonadaptively. Here, the database is represented by a vector in ℜ n and proximity between databases is measured in the ℓ1metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. We use this connection to give tight upper and lower bounds on the noise complexity for any d � n. We show that for d random linear queries of sensitivity 1, it is necessary and sufficient to add ℓ2error Θ(min{d √ d/ε, d √ log(n/d)/ε}) to achieve εdifferential privacy. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane conjecture, we can extend our results to arbitrary linear queries giving nearly matching upper and lower bounds. Our bound translates to error O(min{d/ε, √ d log(n/d)/ε}) per answer. The best previous upper bound (Laplacian mechanism) gives a bound of O(min{d/ε, √ n/ε}) per answer, while the best known lower bound was Ω ( √ d/ε). In contrast, our lower bound is strong enough to separate the concept of differential privacy from the notion of approximate differential privacy where an upper bound of O ( √ d/ε) can be achieved.
The differential privacy frontier (extended abstract
 In TCC
, 2009
"... Abstract. We review the definition of differential privacy and briefly survey a handful of very recent contributions to the differential privacy frontier. 1 Background Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions, wh ..."
Abstract

Cited by 24 (0 self)
 Add to MetaCart
Abstract. We review the definition of differential privacy and briefly survey a handful of very recent contributions to the differential privacy frontier. 1 Background Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions, which we call a privacy mechanism. Informally, the guarantee says that the behavior of the mechanism is essentially unchanged independent of whether any individual opts into or opts out of the data set. Designed for statistical analysis, for example, of health or census data, the definition protects the privacy of individuals, and small groups of individuals, while permitting very different outcomes in the case of very different data sets. We begin by recalling some differential privacy basics. While the frontier of a vibrant area is always in flux, we will endeavor to give an impression of the state of the art by surveying a handful of extremely recent advances
Interactive Privacy via the Median Mechanism
 In The 42nd ACM Symposium on the Theory of Computing
, 2010
"... We define a new interactive differentially private mechanism — the median mechanism — for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy me ..."
Abstract

Cited by 23 (4 self)
 Add to MetaCart
We define a new interactive differentially private mechanism — the median mechanism — for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for noninteractive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a superpolynomial factor, even in the noninteractive setting.
The Price of Privately Releasing Contingency Tables and the Spectra of Random Matrices with Correlated Rows
"... Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we consider lower bounds on how much distortion (noise) is necessary in these tables to provide privacy guarantees when the data being summarized is sens ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we consider lower bounds on how much distortion (noise) is necessary in these tables to provide privacy guarantees when the data being summarized is sensitive. We extend a line of recent work on lower bounds on noise for private data analysis [9, 14, 15, 16] to a natural and important class of functionalities. Our investigation also leads to new results on the spectra of random matrices with correlated rows. Consider a database D consisting of n rows (one per individual), each row comprising d binary attributes. For any subset of T attributes of size T  = k, the marginal table for T has 2 k entries; each entry counts how many times in the database a particular setting of these attributes occurs. We provide lower bounds for releasing kattribute marginal tables under (i) minimal privacy, a general privacy notion which captures a large class of privacy definitions, and (ii) differential privacy, a rigorous notion of privacy that has received extensive recent study. Our main contributions are: • We give efficient polynomial time attacks which allow an adversary to reconstruct sensitive information given insufficiently perturbed marginal table releases. Using these reconstruction attacks,
A Statistical Framework for Differential Privacy
"... One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·X). Differential p ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(·X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several datarelease mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the probability that the empirical distribution concentrates in a small ball around the true distribution.
Differentially Private Data Cubes: Optimizing Noise Sources and Consistency
"... Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table’s dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishi ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table’s dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals ’ privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NPhard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln L  +1) 2 of the optimal, where L  is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 − 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
Towards an axiomatization of statistical privacy and utility
 In PODS
, 2010
"... “Privacy ” and “utility ” are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amena ..."
Abstract

Cited by 13 (8 self)
 Add to MetaCart
“Privacy ” and “utility ” are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amenable to mathematical analysis, are needed. In this paper we present our initial work on an axiomatization of privacy and utility. In particular, we study how these concepts are affected by randomized algorithms. Our analysis yields new insights into the construction of both privacy definitions and mechanisms that generate data according to such definitions. In particular, it characterizes a class of relaxations of differential privacy and shows that desirable outputs of a differentially private mechanism are best interpreted as certain graphs rather than query answers or synthetic data.
Differentially private spatial decompositions
 In ICDE
, 2012
"... Abstract — Differential privacy has recently emerged as the de facto standard for private data release. This makes it possible to provide strong theoretical guarantees on the privacy and utility of released data. While it is wellunderstood how to release data based on counts and simple functions un ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
Abstract — Differential privacy has recently emerged as the de facto standard for private data release. This makes it possible to provide strong theoretical guarantees on the privacy and utility of released data. While it is wellunderstood how to release data based on counts and simple functions under this guarantee, it remains to provide general purpose techniques that are useful for a wider variety of queries. In this paper, we focus on spatial data, i.e., any multidimensional data that can be indexed by a tree structure. Directly applying existing differential privacy methods to this type of data simply generates noise. We propose instead the class of “private spatial decompositions”: these adapt standard spatial indexing methods such as quadtrees and kdtrees to provide a private description of the data distribution. Equipping such structures with differential privacy requires several steps to ensure that they provide meaningful privacy guarantees. Various basic steps, such as choosing splitting points and describing the distribution of points within a region, must be done privately, and the guarantees of the different building blocks must be composed into an overall guarantee. Consequently, we expose the design space for private spatial decompositions, and analyze some key examples. A major contribution of our work is to provide new techniques for parameter setting and postprocessing of the output to improve the accuracy of query answers. Our experimental study demonstrates that it is possible to build such decompositions efficiently, and use them to answer a variety of queries privately and with high accuracy. I.
iReduct: Differential privacy with reduced relative errors
 In SIGMOD
, 2011
"... Prior work in differential privacy has produced techniques for answering aggregate queries over sensitive data in a privacypreserving way. These techniques achieve privacy by adding noise to the query answers. Their objective is typically to minimize absolute errors while satisfying differential pri ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
Prior work in differential privacy has produced techniques for answering aggregate queries over sensitive data in a privacypreserving way. These techniques achieve privacy by adding noise to the query answers. Their objective is typically to minimize absolute errors while satisfying differential privacy. Thus, query answers are injected with noise whose scale is independent of whether the answers are large or small. The noisy results for queries whose true answers are small therefore tend to be dominated by noise, which leads to inferior data utility. This paper introduces iReduct, a differentially private algorithm for computing answers with reduced relative errors. The basic idea of iReduct is to inject different amounts of noise to different query results, so that smaller (larger) values are more likely to be injected with less (more) noise. The algorithm is based on a novel resampling technique that employs correlated noise to improve data utility. Performance is evaluated on an instantiation of iReduct that generates marginals, i.e., projections of multidimensional histograms onto subsets of their attributes. Experiments on real data demonstrate the effectiveness of our solution. Categories and Subject Descriptors H.2.0 [DATABASE MANAGEMENT]: Security, integrity, and