Results 1  10
of
57
PrivacyPreserving Data Mining
, 2000
"... A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models with ..."
Abstract

Cited by 608 (3 self)
 Add to MetaCart
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decisiontree classifier from tredning data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose anovel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
ℓdiversity: Privacy beyond kanonymity
 In ICDE
, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract

Cited by 445 (12 self)
 Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying ” attributes. In this paper we show using two simple attacks that a kanonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that kanonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓdiversity that can defend against such attacks. In addition to building a formal foundation for ℓdiversity, we show in an experimental evaluation that ℓdiversity is practical and can be implemented efficiently. 1.
Securitycontrol methods for statistical databases: a comparative study
 ACM Computing Surveys
, 1989
"... This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Securitycontrol methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. ..."
Abstract

Cited by 315 (0 self)
 Add to MetaCart
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Securitycontrol methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various securitycontrol methods are identified. Securitycontrol methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamiconline statistical databases is also presented. To date no single securitycontrol method prevents both exact and partial disclosures. There are, however, a few perturbationbased methods that prevent exact disclosure and enable the database administrator to exercise “statistical disclosure control. ” Some of these methods, however introduce bias into query responses or suffer from the O/l querysetsize problem (i.e., partial disclosure is possible in case of null query set or a query set of size 1). We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statisticaldisclosure control, while at the same time do not suffer from the bias problem and the O/l querysetsize problem. Furthermore, efforts directed toward developing a biascorrection mechanism and solving the general problem of small querysetsize would help salvage a few of the current perturbationbased methods.
Calibrating noise to sensitivity in private data analysis
 In Proceedings of the 3rd Theory of Cryptography Conference
, 2006
"... Abstract. We continue a line of research initiated in [10, 11] on privacypreserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the socalled true answer is the result of applying f to the datab ..."
Abstract

Cited by 309 (47 self)
 Add to MetaCart
Abstract. We continue a line of research initiated in [10, 11] on privacypreserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the socalled true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f =P i g(xi), where xi denotes the ith row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over noninteractive. 1 Introduction We continue a line of research initiated in [10, 11] on privacy in statistical databases. A statistic is a quantity computed from a sample. Intuitively, if the database is a representative sample of an underlying population, the goal ofa privacypreserving statistical database is to enable the user to learn properties of the population as a whole while protecting the privacy of the individualcontributors.
Boosting and differential privacy
, 2010
"... Abstract—Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacypreserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of ..."
Abstract

Cited by 293 (8 self)
 Add to MetaCart
Abstract—Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacypreserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of the responses to every query in Q, even when the number of queries is much larger than the number of rows in the database. Given a base synopsis generator that takes a distribution on Q and produces a “weak ” synopsis that yields “good ” answers for a majority of the weight in Q, our Boosting for Queries algorithm obtains a synopsis that is good for all of Q. We ensure privacy for the rows of the database, but the boosting is performed on the queries. We also provide the first synopsis generators for arbitrary sets of arbitrary lowsensitivity
Revealing information while preserving privacy
 In PODS
, 2003
"... We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an nbit string d1,.., dn, with a query being a subset q ⊆ [n] to be answered by � i∈q di. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset ..."
Abstract

Cited by 199 (10 self)
 Add to MetaCart
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an nbit string d1,.., dn, with a query being a subset q ⊆ [n] to be answered by � i∈q di. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to statistical databases we show that in order to achieve privacy one has to add perturbation of magnitude Ω ( √ n). That is, smaller perturbation always results in a strong violation of privacy. We show that this result is tight by exemplifying access algorithms for statistical databases that preserve privacy while adding perturbation of magnitude Õ(√n). For timeT bounded adversaries we demonstrate a privacypreserving access algorithm whose perturbation magnitude is ≈ √ T. 1
Hippocratic databases
 In 28th Int’l Conference on Very Large Databases, Hong Kong
, 2002
"... The Hippocratic Oath has guided the conduct of physicians for centuries. Inspired by its tenet of preserving privacy, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key privacy principles for such Hippocrati ..."
Abstract

Cited by 190 (17 self)
 Add to MetaCart
The Hippocratic Oath has guided the conduct of physicians for centuries. Inspired by its tenet of preserving privacy, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key privacy principles for such Hippocratic database systems. We propose a strawman design for Hippocratic databases, identify the technical challenges and problems in designing such databases, and suggest some approaches that may lead to solutions. Our hope is that this paper will serve to catalyze a fruitful and exciting direction for future database research. 1
Practical privacy: the sulq framework
 In PODS ’05: Proceedings of the twentyfourth ACM SIGMODSIGACTSIGART symposium on Principles of database systems
, 2005
"... We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping ..."
Abstract

Cited by 158 (34 self)
 Add to MetaCart
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is P i∈S f(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise – much less than the sampling error – provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (SubLinear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. We extend this work in two ways. First, we modify the privacy analysis to realvalued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model [11].
Differential privacy: A survey of results
 In Theory and Applications of Models of Computation
, 2008
"... Abstract. Over the past five years a new approach to privacypreserving ..."
Abstract

Cited by 120 (0 self)
 Add to MetaCart
Abstract. Over the past five years a new approach to privacypreserving