Results 1  10
of
220
ℓdiversity: Privacy beyond kanonymity
 IN ICDE
, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract

Cited by 672 (13 self)
 Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying ” attributes. In this paper we show using two simple attacks that a kanonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that kanonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓdiversity that can defend against such attacks. In addition to building a formal foundation for ℓdiversity, we show in an experimental evaluation that ℓdiversity is practical and can be implemented efficiently.
Calibrating noise to sensitivity in private data analysis
 In Proceedings of the 3rd Theory of Cryptography Conference
, 2006
"... Abstract. We continue a line of research initiated in [10, 11] on privacypreserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the socalled true answer is the result of applying f to the datab ..."
Abstract

Cited by 649 (60 self)
 Add to MetaCart
(Show Context)
Abstract. We continue a line of research initiated in [10, 11] on privacypreserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the socalled true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f =P i g(xi), where xi denotes the ith row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over noninteractive. 1 Introduction We continue a line of research initiated in [10, 11] on privacy in statistical databases. A statistic is a quantity computed from a sample. Intuitively, if the database is a representative sample of an underlying population, the goal ofa privacypreserving statistical database is to enable the user to learn properties of the population as a whole while protecting the privacy of the individualcontributors.
Robust Deanonymization of Large Sparse Datasets.
 IEEE Symposium on Security and Privacy
, 2008
"... ..."
Differential privacy: A survey of results
 In Theory and Applications of Models of Computation
, 2008
"... Abstract. Over the past five years a new approach to privacypreserving ..."
Abstract

Cited by 258 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Over the past five years a new approach to privacypreserving
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
, 2007
"... In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a singl ..."
Abstract

Cited by 220 (2 self)
 Add to MetaCart
(Show Context)
In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.
A learning theory approach to noninteractive database privacy
 In Proceedings of the 40th annual ACM symposium on Theory of computing
, 2008
"... In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data usefu ..."
Abstract

Cited by 220 (25 self)
 Add to MetaCart
(Show Context)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VCdimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worstcase utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axisaligned rectangles of constant dimension over discrete domains. 1.
PrivacyPreserving Data Publishing: A Survey on Recent Developments
"... The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge and informationbased decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange an ..."
Abstract

Cited by 219 (16 self)
 Add to MetaCart
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge and informationbased decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacypreserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Mechanism design via differential privacy
 Proceedings of the 48th Annual Symposium on Foundations of Computer Science
, 2007
"... We study the role that privacypreserving algorithms, which prevent the leakage of specific information about participants, can play in the design of mechanisms for strategic agents, which must encourage players to honestly report information. Specifically, we show that the recent notion of differen ..."
Abstract

Cited by 212 (3 self)
 Add to MetaCart
(Show Context)
We study the role that privacypreserving algorithms, which prevent the leakage of specific information about participants, can play in the design of mechanisms for strategic agents, which must encourage players to honestly report information. Specifically, we show that the recent notion of differential privacy [15, 14], in addition to its own intrinsic virtue, can ensure that participants have limited effect on the outcome of the mechanism, and as a consequence have limited incentive to lie. More precisely, mechanisms with differential privacy are approximate dominant strategy under arbitrary player utility functions, are automatically resilient to coalitions, and easily allow repeatability. We study several special cases of the unlimited supply auction problem, providing new results for digital goods auctions, attribute auctions, and auctions with arbitrary structural constraints on the prices. As an important prelude to developing a privacypreserving auction mechanism, we introduce and study a generalization of previous privacy work that accommodates the high sensitivity of the auction setting, where a single participant may dramatically alter the optimal fixed price, and a slight change in the offered price may take the revenue from optimal to zero. 1
Smooth sensitivity and sampling in private data analysis
 In STOC
, 2007
"... We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data ..."
Abstract

Cited by 173 (16 self)
 Add to MetaCart
(Show Context)
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instancebased additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x — a measure of variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals ’ privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instancebased noise in the context of data privacy. Our framework raises many interesting algorithmic questions. Namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to kSED (kmeans) clustering and learning mixtures of Gaussians.
Our Data, Ourselves: Privacy via Distributed Noise Generation
 In EUROCRYPT
, 2006
"... Abstract. In this work we provide efficient distributed protocols for generating shares of random noise, secure against malicious participants. The purpose of the noise generation is to create a distributed implementation of the privacypreserving statistical databases described in recent papers [14 ..."
Abstract

Cited by 152 (15 self)
 Add to MetaCart
(Show Context)
Abstract. In this work we provide efficient distributed protocols for generating shares of random noise, secure against malicious participants. The purpose of the noise generation is to create a distributed implementation of the privacypreserving statistical databases described in recent papers [14,4,13]. In these databases, privacy is obtained by perturbing the true answer to a database query by the addition of a small amount of Gaussian or exponentially distributed random noise. The computational power of evenasimple form of these databases, when the queryis just of the form È i f(di), that is, the sum over all rows i in the database of a function f applied to the data in row i, has been demonstrated in [4]. A distributed implementation eliminates the need for a trusted database administrator. The results for noise generation are of independent interest. The generation of Gaussian noise introduces a technique for distributing shares of many unbiased coins with fewer executions of verifiable secret sharing than would be needed using previous approaches (reduced by afactorofn). The generation of exponentially distributed noise uses two shallow circuits: one for generating many arbitrarily but identically biased coins at an amortized cost of two unbiased random bits apiece, independent of the bias, and the other to combine bits of appropriate biases to obtain an exponential distribution. 1