Results 1 - 10 of 39
Privacy-Preserving Data Publishing: A Survey on Recent Developments
Cited by 219 (16 self)
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
, 2008
Cited by 21 (0 self)
We identify proximity breach as a privacy threat specific to numerical sensitive attributes in anonymized data publication. Such a breach occurs when an adversary concludes with high confidence that the sensitive value of a victim individual must fall in a short interval, even though the adversary may have low confidence about the victim’s actual value. None of the existing anonymization principles (e.g., k-anonymity, l-diversity, etc.) can effectively prevent proximity breach. We remedy the problem by introducing a novel principle called (ε, m)-anonymity. Intuitively, the principle demands that, given a QI-group G, for every sensitive value x in G, at most 1/m of the tuples in G can have sensitive values “similar” to x, where the similarity is controlled by ε. We provide a careful analytical study of the theoretical characteristics of (ε, m)-anonymity and the corresponding generalization algorithm. Our findings are verified by experiments with real data.
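The condition above can be sketched as a simple checker. This is a minimal illustration, not the paper's algorithm: it assumes an absolute-distance reading of “similar” (the paper also considers relative distance) and represents QI-groups as plain lists of numerical sensitive values.

```python
def eps_m_anonymous(groups, eps, m):
    """Check the (eps, m)-anonymity condition on already-formed QI-groups.

    For every sensitive value x in a group G, at most |G|/m of the
    tuples in G may carry a value within eps of x.  Note that x counts
    as similar to itself, so each value contributes at least one hit.
    """
    for group in groups:
        for x in group:
            close = sum(1 for y in group if abs(y - x) <= eps)
            if close > len(group) / m:
                return False
    return True
```

With m = 4 and a group of four tuples, each value may be similar only to itself; two salaries within eps of each other would already violate the condition.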
Privacy-preserving data mashup
- In Proc. of the 12th International Conference on Extending Database Technology (EDBT). Saint-Petersburg
, 2009
Cited by 16 (9 self)
Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden, and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate private data from different data providers, while ensuring that the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that our proposed method is effective at simultaneously preserving both privacy and information usefulness, and is scalable to large volumes of data.
Distributed Anonymization: Achieving Privacy for Both Data Subjects and Data Providers
Cited by 13 (6 self)
There is an increasing need for sharing data repositories containing personal information across multiple distributed and private databases. However, such data sharing is subject to constraints imposed by the privacy of individuals (data subjects) as well as the data confidentiality of institutions (data providers). Concretely, given a query spanning multiple databases, query results should not contain individually identifiable information. In addition, institutions should not reveal their databases to each other apart from the query results. In this paper, we develop a set of decentralized protocols that enable data sharing for horizontally partitioned databases under these constraints. Our approach includes a new notion, l-site-diversity, for data anonymization to ensure the anonymity of data providers in addition to that of data subjects, and a distributed anonymization protocol that allows independent data providers to build a virtual anonymized database while maintaining both privacy constraints.
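A rough sketch of how the l-site-diversity condition might be checked, under the assumption (not spelled out in the abstract) that it requires every anonymized equivalence group to contain records from at least l distinct providers:

```python
def l_site_diverse(groups, l):
    """Hypothetical l-site-diversity check.

    `groups` is a list of equivalence groups, each a list of
    (site_id, record) pairs.  A group passes when its records come
    from at least l distinct sites, so an observer cannot pin a
    record to a single contributing provider.
    """
    return all(len({site for site, _ in group}) >= l for group in groups)
```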
m-Privacy for Collaborative Data Publishing
Cited by 9 (4 self)
In this paper, we consider the collaborative data publishing problem of anonymizing horizontally partitioned data held by multiple data providers. We consider a new type of “insider attack” by colluding data providers who may use their own data records (a subset of the overall data) in addition to external background knowledge to infer the data records contributed by other data providers. The paper addresses this new threat and makes several contributions. First, we introduce the notion of m-privacy, which guarantees that the anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. Second, we present heuristic algorithms that exploit the equivalence-group monotonicity of privacy constraints and adaptive ordering techniques to efficiently check m-privacy for a set of records. Finally, we present a provider-aware anonymization algorithm with adaptive m-privacy checking strategies to ensure high utility and m-privacy of the anonymized data with efficiency. Experiments on real-life datasets suggest that our approach achieves utility and efficiency better than or comparable to existing and baseline algorithms while providing the m-privacy guarantee.
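The m-privacy guarantee can be illustrated with a naive checker that enumerates coalitions directly. The data model and the distinct-diversity constraint below are illustrative assumptions, and this brute-force enumeration is exactly what the paper's heuristic algorithms are designed to avoid:

```python
from itertools import combinations

def m_private(groups, providers, m, constraint):
    """Naive m-privacy check (exponential in the number of providers).

    `groups`: anonymized equivalence groups, each a list of
    (provider_id, sensitive_value) pairs.  For every coalition of up
    to m providers, the records the coalition cannot explain away
    (those contributed by other providers) must still satisfy
    `constraint`, a predicate on a list of sensitive values.
    """
    for size in range(m + 1):
        for coalition in combinations(providers, size):
            for group in groups:
                remaining = [v for p, v in group if p not in coalition]
                if not constraint(remaining):
                    return False
    return True
```

For example, with a distinct-2-diversity constraint, a group drawing one record each from three providers tolerates any single colluder, but two colluders leave only one record behind and the constraint fails.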
Reasoning about the Appropriate Use of Private Data through Computational Workflows
, 2010
Cited by 9 (0 self)
While there is a plethora of mechanisms to ensure lawful access to privacy-protected data, additional research is required in order to reassure individuals that their personal data is being used for the purpose that they consented to. This is particularly important in the context of new data mining approaches, as used, for instance, in biomedical research and commercial data mining. We argue for the use of computational workflows to ensure and enforce appropriate use of sensitive personal data. Computational workflows describe in a declarative manner the data processing steps and the expected results of complex data analysis processes such as data mining (Gil et al. 2007b; Taylor et al. 2006). We see workflows as an artifact that captures, among other things, how data is being used and for what purpose. Existing frameworks for computational workflows need to be extended to incorporate privacy policies that can govern the use of data.
The Hardness and Approximation Algorithms for L-Diversity
Cited by 8 (1 self)
The existing solutions to privacy-preserving publication can be classified into theoretical and heuristic categories. The former guarantees provably low information loss, whereas the latter incurs gigantic loss in the worst case but is shown empirically to perform well on many real inputs. While numerous heuristic algorithms have been developed to satisfy advanced privacy principles such as l-diversity, t-closeness, etc., the theoretical category is currently limited to k-anonymity, the earliest principle, which is known to have severe vulnerability to privacy attacks. Motivated by this, we present the first theoretical study of l-diversity, a popular principle that is widely adopted in the literature. First, we show that optimal l-diverse generalization is NP-hard even when there are only 3 distinct sensitive values in the microdata. Then, an (l · d)-approximation algorithm is developed, where d is the dimensionality of the underlying dataset. This is the first known algorithm with a non-trivial bound on information loss. Extensive experiments with real datasets validate the effectiveness and efficiency of the proposed solution.
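For reference, the simplest instantiation of the principle studied here, distinct l-diversity, is easy to state in code (the grouped-data representation is an assumption; the hard part the paper addresses is choosing a low-loss generalization that produces such groups, not checking them):

```python
def l_diverse(groups, l):
    """Distinct l-diversity: every QI-group of the published table
    must contain at least l distinct sensitive values, so an adversary
    who locates a victim's group cannot narrow the sensitive value
    down to fewer than l candidates."""
    return all(len(set(group)) >= l for group in groups)
```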
ANGEL: Enhancing the utility of generalization for privacy preserving publication
Cited by 5 (0 self)
Generalization is a well-known method for privacy-preserving data publication. Despite its vast popularity, it has several drawbacks, such as heavy information loss and difficulty supporting marginal publication. To overcome these drawbacks, we develop ANGEL, a new anonymization technique that is as effective as generalization in privacy protection but is able to retain significantly more information in the microdata. ANGEL is applicable to any monotonic principle (e.g., l-diversity, t-closeness, etc.), and its superiority (in correlation preservation) is especially obvious when tight privacy control must be enforced. We show that ANGEL lends itself elegantly to the hard problem of marginal publication. In particular, unlike generalization, which can release only restricted marginals, our technique can easily publish any marginals with strong privacy guarantees.
Butterfly: Privacy Preserving Publishing on Multiple Quasi-Identifiers
, 2009
Cited by 4 (0 self)
Recently, privacy-preserving data publishing has attracted significant research interest. Most existing studies focus only on situations where the data in question is published using one quasi-identifier. However, in several important applications, a practical demand is to publish a data set on multiple quasi-identifiers for multiple users simultaneously, which poses several challenges. How can we generate one anonymized version of the data so that a privacy preservation requirement like k-anonymity is satisfied for all users? Moreover, how can we reduce the information loss as much as possible while the privacy preservation requirements are met? In this paper, we identify and tackle the novel problem of privacy-preserving publishing on multiple quasi-identifiers. A naïve solution that publishes a separate version for each quasi-identifier unfortunately suffers from the possibility that those releases can be joined to breach privacy. Interestingly, we show that it is possible to generate only one anonymized table that satisfies k-anonymity on all quasi-identifiers for all users without significant information loss. We systematically develop an effective method for privacy-preserving publishing for multiple users, and report an empirical study using real data to verify the feasibility and effectiveness of our method.
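The multi-quasi-identifier requirement discussed above amounts to a single released table being k-anonymous under each user's quasi-identifier. A minimal sketch of that check (the dict-based row representation and attribute names are assumptions, not the paper's notation):

```python
from collections import Counter

def k_anonymous_all(table, quasi_identifiers, k):
    """Check one anonymized table against several quasi-identifiers.

    `table`: list of rows as attribute->value dicts.
    `quasi_identifiers`: one tuple of attribute names per user.
    The table passes only if, under every quasi-identifier, each
    combination of values is shared by at least k rows.
    """
    for qi in quasi_identifiers:
        counts = Counter(tuple(row[attr] for attr in qi) for row in table)
        if min(counts.values()) < k:
            return False
    return True
```

Checking each quasi-identifier separately is easy; the paper's contribution is producing one anonymization that passes all such checks at once while also resisting joins across hypothetical per-user releases.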
Secure Mining of Association Rules in Horizontally Distributed Databases
- Sateesh Reddy et al, / (IJCSIT) International Journal of Computer Science and Information Technologies