Results 1 - 7 of 7
Privacy-Preserving Data Mining
, 2000
Cited by 483 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
Hippocratic databases
- In 28th Int’l Conference on Very Large Databases, Hong Kong
, 2002
Cited by 156 (17 self)
The Hippocratic Oath has guided the conduct of physicians for centuries. Inspired by its tenet of preserving privacy, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key privacy principles for such Hippocratic database systems. We propose a strawman design for Hippocratic databases, identify the technical challenges and problems in designing such databases, and suggest some approaches that may lead to solutions. Our hope is that this paper will serve to catalyze a fruitful and exciting direction for future database research.
Revealing information while preserving privacy
- In PODS
, 2003
Cited by 141 (8 self)
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string $d_1, \ldots, d_n$, with a query being a subset $q \subseteq [n]$ to be answered by $\sum_{i \in q} d_i$. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to statistical databases we show that in order to achieve privacy one has to add perturbation of magnitude $\Omega(\sqrt{n})$. That is, smaller perturbation always results in a strong violation of privacy. We show that this result is tight by exemplifying access algorithms for statistical databases that preserve privacy while adding perturbation of magnitude $\tilde{O}(\sqrt{n})$. For time-T bounded adversaries we demonstrate a privacy-preserving access algorithm whose perturbation magnitude is $\approx \sqrt{T}$.
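The query model in this abstract can be illustrated with a minimal Python sketch. It only shows what a perturbed subset-sum answer looks like. The function name `noisy_subset_sum`, the uniform bounded integer noise, and the sizes used are assumptions made for illustration; this is not the paper's access algorithm, nor its reconstruction attack.

```python
import random

def noisy_subset_sum(d, q, magnitude, rng):
    """Answer the subset-sum query q over the bit list d with bounded noise.

    Assumption: noise is a uniform integer in [-magnitude, magnitude].
    The abstract only bounds the perturbation magnitude; it does not
    prescribe this particular noise distribution.
    """
    true_sum = sum(d[i] for i in q)
    return true_sum + rng.randint(-magnitude, magnitude)

rng = random.Random(0)                      # fixed seed for repeatability
n = 100
d = [rng.randint(0, 1) for _ in range(n)]   # the n-bit database d_1, ..., d_n
q = set(range(0, n, 2))                     # a query: a subset of the indices
E = int(n ** 0.5)                           # perturbation on the order of sqrt(n)

answer = noisy_subset_sum(d, q, E, rng)
true_sum = sum(d[i] for i in q)
```

Here `answer` differs from `true_sum` by at most `E`, which is the sense of "perturbation magnitude" used in the abstract.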
Differential privacy: A survey of results
- In Theory and Applications of Models of Computation
, 2008
Cited by 65 (0 self)
Abstract. Over the past five years a new approach to privacy-preserving
An ad omnia approach to defining and achieving private data analysis
- In Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
, 2007
Cited by 8 (2 self)
Abstract. We briefly survey several privacy compromises in published datasets, some historical and some on paper. An inspection of these suggests that the problem lies with the nature of the privacy-motivated promises in question. These are typically syntactic, rather than semantic. They are also ad hoc, with insufficient argument that fulfilling these syntactic and ad hoc conditions yields anything like what most people would regard as privacy. We examine two comprehensive, or ad omnia, guarantees for privacy in statistical databases discussed in the literature, note that one is unachievable, and describe implementations of the other. In this note we survey a body of work, developed over the past five years, addressing the problem known variously as statistical disclosure control, inference control, privacy-preserving datamining, and private data analysis. Our principal motivating scenario is a statistical database. A statistic is a quantity computed from a sample. Suppose a trusted and trustworthy curator gathers sensitive information from a large number of respondents (the sample), with the goal of learning (and releasing to the public) statistical facts about the underlying population. The problem is to release statistical information without compromising the privacy of the individual respondents. There are two settings: in the noninteractive setting the curator computes and publishes some statistics, and the data are not used further. Privacy concerns may affect the precise answers released by the curator, or even the set of statistics released. Note that since the data will never be used again the curator can destroy the data (and himself) once the statistics have been published. In the interactive setting the curator sits between the users and the database. Queries posed by the users, and/or the responses to these queries, may be modified by the curator in order to protect the privacy of the respondents. 
The data cannot be destroyed, and the curator must remain present throughout the lifetime of the database. There is a rich literature on this problem, principally from the statistics community
State-Of-The-Art in Refurbishing-Based Approach for Privacy Preserving Data Mining
Abstract—Data mining, with its pledge to competently discern valuable, non-obvious information from bulky databases, is principally defenseless to misuse. In this paper we present an outline of the new and rapidly emerging research area of Privacy Preserving Data Mining and the technical likelihood of realizing Privacy Preserving Data Mining. We also propose a classification hierarchy for refurbishing based techniques for privacy preservation that will lay down the origin for analyzing the toil which has been performed in this environment. A comprehensive appraisal of the toil accomplished in this vicinity is also given, along with the coordinates of each exertion to the classification hierarchy. For the purposes of this paper, we take for granted that suitable right of entry controls and security procedures are in position and effectual in preventing unconstitutional access to the system. Whenever sensitive information is exchanged, it must be transmitted in excess of a secure channel and stored securely. A short and snappy appraisal is performed, and some preliminary conclusions are made along with the metrics used for it.
Statistical Databases: Query Restriction
, 2004
Introduction A statistical database typically contains information about n individuals where n is very large. A statistical database system gives users the ability to both obtain statistical information (like average, median, count) and preserve the privacy of any individual. Examples include census and medical databases.

