Results 1–10 of 60
Privacy-Preserving Data Mining, 2000
Cited by 665 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records, and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
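The perturb-then-reconstruct idea can be sketched in a few lines. The discretized iterative update below is in the spirit of the reconstruction procedure the abstract describes; the uniform noise model, bin layout, sample size, and iteration count are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def uniform_noise_pdf(d, r):
    """Density of uniform noise on [-r, r]."""
    return 1.0 / (2 * r) if -r <= d <= r else 0.0

def reconstruct_distribution(perturbed, bin_centers, r, iters=30):
    """Iteratively estimate the original value distribution over the
    given bins from values perturbed by uniform [-r, r] noise."""
    k = len(bin_centers)
    fx = [1.0 / k] * k                      # start from a uniform prior
    for _ in range(iters):
        nxt = [0.0] * k
        for w in perturbed:
            # posterior over bins given one perturbed observation w
            weights = [uniform_noise_pdf(w - c, r) * fx[a]
                       for a, c in enumerate(bin_centers)]
            total = sum(weights)
            if total > 0:
                for a in range(k):
                    nxt[a] += weights[a] / total
        s = sum(nxt)
        fx = [v / s for v in nxt]
    return fx

random.seed(0)
originals = [random.gauss(30, 5) for _ in range(2000)]   # hidden true values
perturbed = [x + random.uniform(-20, 20) for x in originals]
bins = list(range(0, 61, 5))                              # bin centers 0, 5, ..., 60
est = reconstruct_distribution(perturbed, bins, r=20)
```

Although no individual original value can be recovered, `est` concentrates near the true distribution's mode, which is what the decision-tree construction needs.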
Security-control methods for statistical databases: a comparative study
ACM Computing Surveys, 1989
Cited by 353 (0 self)
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation. Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic online statistical databases is also presented. To date, no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1). We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control, while at the same time not suffering from the bias problem or the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set size would help salvage a few of the current perturbation-based methods.
Hippocratic databases
In 28th Int’l Conference on Very Large Databases, Hong Kong, 2002
Cited by 219 (17 self)
The Hippocratic Oath has guided the conduct of physicians for centuries. Inspired by its tenet of preserving privacy, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key privacy principles for such Hippocratic database systems. We propose a strawman design for Hippocratic databases, identify the technical challenges and problems in designing such databases, and suggest some approaches that may lead to solutions. Our hope is that this paper will serve to catalyze a fruitful and exciting direction for future database research.
Revealing information while preserving privacy
In PODS, 2003
Cited by 211 (9 self)
We examine the tradeoff between privacy and usability of statistical databases. We model a statistical database by an n-bit string d_1, …, d_n, with a query being a subset q ⊆ [n] to be answered by ∑_{i∈q} d_i. Our main result is a polynomial reconstruction algorithm of data from noisy (perturbed) subset sums. Applying this reconstruction algorithm to statistical databases, we show that in order to achieve privacy one has to add perturbation of magnitude Ω(√n). That is, smaller perturbation always results in a strong violation of privacy. We show that this result is tight by exemplifying access algorithms for statistical databases that preserve privacy while adding perturbation of magnitude Õ(√n). For time-T bounded adversaries we demonstrate a privacy-preserving access algorithm whose perturbation magnitude is ≈ √T.
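The Ω(√n) bound can be illustrated at toy scale: when each answer is perturbed by at most a small E, even a brute-force search over candidate databases narrows down the secret bits. The exhaustive search below is an exponential stand-in for the paper's polynomial-time reconstruction; the database size, query count, and noise bound are illustrative.

```python
import itertools
import random

random.seed(1)
n = 8
secret = [random.randint(0, 1) for _ in range(n)]   # hidden bit string d_1..d_n
E = 1                                               # per-answer perturbation bound

# answer 40 random subset-sum queries, each perturbed by at most E
queries = [frozenset(random.sample(range(n), random.randint(1, n)))
           for _ in range(40)]
answers = [(q, sum(secret[i] for i in q) + random.randint(-E, E))
           for q in queries]

def consistent(candidate):
    """A candidate database survives if it matches every noisy answer to within E."""
    return all(abs(sum(candidate[i] for i in q) - a) <= E for q, a in answers)

# every surviving candidate is close to the secret in Hamming distance, so
# perturbation well below sqrt(n) fails to hide the data; the paper proves
# this in general with a polynomial-time attack
survivors = [c for c in itertools.product([0, 1], repeat=n) if consistent(c)]
```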
Differential privacy: A survey of results
In Theory and Applications of Models of Computation, 2008
Cited by 132 (0 self)
Abstract. Over the past five years a new approach to privacy-preserving ...
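The survey's text is truncated above, but the mechanism most closely associated with differential privacy, Laplace noise calibrated to query sensitivity, can be sketched independently of it; the function names and parameter choices here are illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) by inverse-CDF transform."""
    u = rng.random() - 0.5          # uniform in [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(data, predicate, epsilon, rng):
    """epsilon-differentially-private count: a count query has
    sensitivity 1, so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
data = [rng.randint(0, 99) for _ in range(10000)]
noisy = dp_count(data, lambda x: x < 50, epsilon=0.5, rng=rng)
```

With 10,000 records and ε = 0.5, the noise (scale 2) is tiny relative to the true count of roughly 5,000, so utility survives while any single record's influence is masked.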
Privacy-Preserving Data Publishing: A Survey on Recent Developments
Cited by 100 (7 self)
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms
IEEE Transactions on Mobile Computing, 2008
Cited by 71 (4 self)
Continued advances in mobile networks and positioning technologies have created a strong market push for location-based applications. Examples include location-aware emergency response, location-based advertisement, and location-based entertainment. An important challenge in the wide deployment of location-based services (LBSs) is the privacy-aware management of location information, providing safeguards for the location privacy of mobile clients against vulnerabilities for abuse. This paper describes a scalable architecture for protecting location privacy from various privacy threats resulting from uncontrolled usage of LBSs. This architecture includes the development of a personalized location anonymization model and a suite of location perturbation algorithms. A unique characteristic of our location privacy architecture is the use of a flexible privacy personalization framework to support location k-anonymity for a wide range of mobile clients with context-sensitive privacy requirements. This framework enables each mobile client to specify the minimum level of anonymity that it desires and the maximum temporal and spatial tolerances that it is willing to accept when requesting k-anonymity-preserving LBSs. We devise an efficient message perturbation engine to implement the proposed location privacy framework. The prototype that we develop is designed to be run by the anonymity server on a trusted platform and performs location anonymization on LBS request messages of mobile clients, such as identity removal and spatio-temporal cloaking of the location information. We study the effectiveness of our location cloaking algorithms under various conditions by using realistic location data that is synthetically generated from real road maps and traffic volume data. Our experiments show that the personalized location k-anonymity model, together with our location perturbation engine, can achieve high resilience to location privacy threats without introducing any significant performance penalty.
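Spatial cloaking of the kind described can be sketched as a box that grows around the requester until it covers at least k users or exceeds the user's spatial tolerance. This toy version ignores the temporal tolerance and the paper's efficiency machinery; all names, coordinates, and thresholds are illustrative.

```python
def cloak(user_xy, all_xy, k, max_half_width):
    """Grow a square box around the requesting user until it contains at
    least k users (including the requester); return None if the box would
    have to exceed the user's spatial tolerance, suppressing the request."""
    x, y = user_xy
    half = 0.0
    step = 0.5
    while half <= max_half_width:
        inside = [(px, py) for px, py in all_xy
                  if abs(px - x) <= half and abs(py - y) <= half]
        if len(inside) >= k:
            return (x - half, y - half, x + half, y + half)  # cloaked region
        half += step
    return None   # tolerance exceeded: request cannot be anonymized

users = [(0, 0), (1, 0), (0, 2), (5, 5), (9, 9)]
region = cloak((0, 0), users, k=3, max_half_width=4)   # succeeds: 3 users fit
lonely = cloak((9, 9), users, k=3, max_half_width=2)   # suppressed: too isolated
```

The LBS then receives only the region, not the exact coordinates, so the requester is indistinguishable from the other users inside the box.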
Auditing Boolean Attributes
Journal of Computer and System Sciences, 2000
Cited by 62 (0 self)
We study the problem of auditing databases which support statistical sum queries to protect the security of sensitive information; we focus on the special case in which the sensitive information is Boolean. Principles and techniques developed for the security of statistical databases in the case of continuous attributes do not apply here. We prove certain strong complexity results suggesting that there is no general efficient solution for the auditing problem in this case. We propose two efficient algorithms: The first is applicable when the sum queries are one-dimensional range queries (we prove that the problem is NP-hard even in the two-dimensional case). The second is an approximate algorithm that maintains security, although it may be too restrictive. Finally, we consider a "dual" variant, with continuous data but an aggregate function that is combinatorial in nature. Specifically, we provide algorithms for two natural definitions of the auditing condition when the aggregate function is max.
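The auditing condition can be made concrete at small scale: a Boolean value is disclosed exactly when it takes the same value in every 0/1 assignment consistent with the answered sum queries. The brute-force check below enumerates all assignments, in line with the paper's point that no general efficient solution is expected; the example queries are illustrative.

```python
import itertools

def compromised_bits(n, answered):
    """answered: list of (query_index_set, exact_sum) pairs.
    Return the indices whose value is identical in every consistent
    0/1 assignment, i.e. the values disclosed by the answers."""
    consistent = [c for c in itertools.product([0, 1], repeat=n)
                  if all(sum(c[i] for i in q) == s for q, s in answered)]
    if not consistent:
        return []            # contradictory answers: nothing to audit
    return [i for i in range(n)
            if len({c[i] for c in consistent}) == 1]

# answering sum(x0, x1) = 2 forces x0 = x1 = 1, while x2 stays protected
leaks = compromised_bits(3, [({0, 1}, 2)])
```

An auditor would refuse any further query whose answer would add new indices to `leaks`.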
A data distortion by probability distribution
ACM Transactions on Database Systems, 1985
Cited by 62 (0 self)
This paper introduces data distortion by probability distribution, a probability distortion that involves three steps. The first step is to identify the underlying density function of the original series and to estimate the parameters of this density function. The second step is to generate a series of data from the estimated density function. And the final step is to map and replace the generated series for the original one. Because it is replaced by the distorted data set, probability distortion guards the privacy of an individual belonging to the original data set. At the same time, the probability distorted series provides asymptotically the same statistical properties as those of the original series, since both are under the same distribution. Unlike conventional point distortion, probability distortion is difficult to compromise by repeated queries, and provides a maximum exposure for statistical analysis.
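The three steps translate almost directly into code. The sketch below fits a normal density, which is an illustrative choice; the paper's method works with whatever density family fits the original series.

```python
import random
import statistics

def probability_distort(series, rng):
    """Steps from the paper: (1) estimate the density parameters of the
    original series, (2) generate a new series from the fitted density,
    (3) release the generated series in place of the original."""
    mu = statistics.fmean(series)       # step 1: fit N(mu, sigma)
    sigma = statistics.stdev(series)
    return [rng.gauss(mu, sigma) for _ in series]   # steps 2 and 3

rng = random.Random(42)
original = [rng.gauss(50, 10) for _ in range(5000)]
released = probability_distort(original, rng)
```

No released value corresponds to any individual record, yet the released series has asymptotically the same mean and variance as the original, which is exactly the trade-off the abstract claims.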
Workload-aware Anonymization, 2006
Cited by 61 (2 self)
Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality of the resulting data. While the bulk of previous work has measured quality through one-size-fits-all measures, we argue that quality is best judged with respect to the workload for which the data will ultimately be used. This paper provides a suite of anonymization algorithms that produce an anonymous view based on a target class of workloads, consisting of one or more data mining tasks, as well as selection predicates. An extensive experimental evaluation indicates that this approach is often more effective than previous anonymization techniques.