A secure distributed framework for achieving k-anonymity (2006)

by W Jiang, C Clifton
Venue: VLDB J.
Results 1 - 10 of 39

Privacy-Preserving Data Publishing: A Survey on Recent Developments

by Benjamin C. M. Fung, Ke Wang, Rui Chen, Philip S. Yu
Abstract - Cited by 219 (16 self)
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
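The k-anonymity condition underlying the surveyed PPDP methods (and the Jiang-Clifton framework this page indexes) can be sketched as follows. This is an illustrative check only; the table layout and attribute names are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, qi_attrs, k):
    """Return True if every quasi-identifier combination occurs
    in at least k records (the classic k-anonymity condition)."""
    counts = Counter(tuple(r[a] for a in qi_attrs) for r in records)
    return all(c >= k for c in counts.values())

# Toy microdata: age band and ZIP prefix are the quasi-identifiers.
table = [
    {"age": "30-39", "zip": "478**", "disease": "flu"},
    {"age": "30-39", "zip": "478**", "disease": "cold"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))  # the lone 40-49 record breaks 2-anonymity
```

Real anonymizers, of course, generalize or suppress values until this check passes, trading off information loss as the survey discusses.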

Preservation of Proximity Privacy in Publishing Numerical Sensitive Data

by Jiexing Li, Yufei Tao, Xiaokui Xiao, 2008
Abstract - Cited by 21 (0 self)
We identify proximity breach as a privacy threat specific to numerical sensitive attributes in anonymized data publication. Such a breach occurs when an adversary concludes with high confidence that the sensitive value of a victim individual must fall in a short interval — even though the adversary may have low confidence about the victim’s actual value. None of the existing anonymization principles (e.g., k-anonymity, l-diversity, etc.) can effectively prevent proximity breach. We remedy the problem by introducing a novel principle called (ε, m)-anonymity. Intuitively, the principle demands that, given a QI-group G, for every sensitive value x in G, at most 1/m of the tuples in G can have sensitive values “similar” to x, where the similarity is controlled by ε. We provide a careful analytical study of the theoretical characteristics of (ε, m)-anonymity, and the corresponding generalization algorithm. Our findings are verified by experiments with real data.
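The (ε, m)-anonymity condition quoted above can be sketched for a single QI-group. One plausible reading is used here — similarity as absolute difference within ε, with the count including x itself; the paper leaves the similarity notion parameterized, so treat this as an assumption:

```python
def satisfies_em_anonymity(sensitive_values, eps, m):
    """Check the (eps, m)-anonymity condition for one QI-group:
    for every sensitive value x in the group, at most a 1/m fraction
    of the tuples may carry a value 'similar' to x (here: within eps)."""
    n = len(sensitive_values)
    for x in sensitive_values:
        similar = sum(1 for y in sensitive_values if abs(y - x) <= eps)
        if similar > n / m:
            return False
    return True

salaries = [10_000, 11_000, 50_000, 90_000]
print(satisfies_em_anonymity(salaries, eps=2_000, m=2))  # passes: no value has more than 2 close tuples
print(satisfies_em_anonymity(salaries, eps=2_000, m=3))  # fails: 10k and 11k cluster too tightly
```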

Citation Context

... Finally, besides data publication, anonymity issues arise in many other environments. Some examples include anonymized surveying [5, 15], statistical databases [10, 14, 30], cryptographic computing [21, 34, 37], access control [4, 8, 9], and so on. 8. CONCLUSIONS Although proximity breach is a natural privacy threat to numerical sensitive data, it has not received dedicated attention in the literature. We e...

Privacy-preserving data mashup

by Noman Mohammed, Benjamin C. M. Fung, Ke Wang, Patrick C. K. Hung - In Proc. of the 12th International Conference on Extending Database Technology (EDBT), Saint Petersburg, 2009
Abstract - Cited by 16 (9 self)
Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden, and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate private data from different data providers, while the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that our proposed method is effective for simultaneously preserving both privacy and information usefulness, and is scalable for handling large volumes of data.

Citation Context

... data integration is not an issue. In the case of multiple private databases, joining all private databases and applying a single table method would violate the privacy requirement. Jiang and Clifton [20, 21] proposed a cryptographic approach to securely integrate two distributed data tables to a k-anonymous table without considering a data mining task. First, each party determines a locally k-anonymous t...

Distributed Anonymization: Achieving Privacy for Both Data Subjects and Data Providers

by Pawel Jurczyk, Li Xiong
Abstract - Cited by 13 (6 self)
Abstract. There is an increasing need for sharing data repositories containing personal information across multiple distributed and private databases. However, such data sharing is subject to constraints imposed by privacy of individuals or data subjects as well as data confidentiality of institutions or data providers. Concretely, given a query spanning multiple databases, query results should not contain individually identifiable information. In addition, institutions should not reveal their databases to each other apart from the query results. In this paper, we develop a set of decentralized protocols that enable data sharing for horizontally partitioned databases given these constraints. Our approach includes a new notion, l-site-diversity, for data anonymization to ensure anonymity of data providers in addition to that of data subjects, and a distributed anonymization protocol that allows independent data providers to build a virtual anonymized database while maintaining both privacy constraints.
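The l-site-diversity notion described above — provider anonymity on top of subject anonymity — can be sketched as a simple check over the virtual anonymized database. The record layout and the `site` field are hypothetical; the reading assumed here is that every QI equivalence group must draw records from at least l distinct providers:

```python
from collections import defaultdict

def satisfies_l_site_diversity(records, qi_attrs, l):
    """Check that every QI equivalence group contains records from
    at least l distinct data providers, so a group cannot be traced
    back to fewer than l sites."""
    sites_per_group = defaultdict(set)
    for r in records:
        key = tuple(r[a] for a in qi_attrs)
        sites_per_group[key].add(r["site"])
    return all(len(s) >= l for s in sites_per_group.values())

shared = [
    {"age": "30-39", "site": "hospital_A", "diagnosis": "flu"},
    {"age": "30-39", "site": "hospital_B", "diagnosis": "cold"},
    {"age": "40-49", "site": "hospital_A", "diagnosis": "flu"},
    {"age": "40-49", "site": "hospital_A", "diagnosis": "cold"},
]
print(satisfies_l_site_diversity(shared, ["age"], 2))  # 40-49 group comes from one site only
```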

Citation Context

...the k-anonymity and l-diversity principles and the greedy top-down Mondrian multidimensional k-anonymization algorithm [4]. There are some works focused on data anonymization of distributed databases. [5] presented a two-party framework along with an application that generates k-anonymous data from two vertically partitioned sources without disclosing data from one site to the other. [6] proposed prov...

m-Privacy for Collaborative Data Publishing

by Slawomir Goryczka, Li Xiong, Benjamin C. M. Fung
Abstract - Cited by 9 (4 self)
Abstract—In this paper, we consider the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. We consider a new type of “insider attack” by colluding data providers who may use their own data records (a subset of the overall data) in addition to the external background knowledge to infer the data records contributed by other data providers. The paper addresses this new threat and makes several contributions. First, we introduce the notion of m-privacy, which guarantees that the anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. Second, we present heuristic algorithms exploiting the equivalence group monotonicity of privacy constraints and adaptive ordering techniques for efficiently checking m-privacy given a set of records. Finally, we present a data provider-aware anonymization algorithm with adaptive m-privacy checking strategies to ensure high utility and m-privacy of anonymized data with efficiency. Experiments on real-life datasets suggest that our approach achieves better or comparable utility and efficiency than existing and baseline algorithms while providing an m-privacy guarantee.
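The m-privacy guarantee described above can be sketched as a brute-force check: colluders know their own records, so the published data must still satisfy the base privacy constraint after any coalition of up to m providers subtracts its contribution. The record layout and the toy constraint are hypothetical, and the paper's actual algorithms prune this exponential search via monotonicity, which this sketch does not:

```python
from itertools import combinations

def is_m_private(records, providers, m, constraint):
    """Check that the records satisfy `constraint` even after removing
    the data of any coalition of up to m colluding providers."""
    for size in range(m + 1):
        for coalition in combinations(providers, size):
            remaining = [r for r in records if r["provider"] not in coalition]
            if not constraint(remaining):
                return False
    return True

# Hypothetical base constraint: at least 2 records must remain.
records = [
    {"provider": "P1", "disease": "flu"},
    {"provider": "P2", "disease": "cold"},
    {"provider": "P2", "disease": "flu"},
    {"provider": "P3", "disease": "cold"},
]
at_least_two = lambda recs: len(recs) >= 2
print(is_m_private(records, ["P1", "P2", "P3"], 1, at_least_two))  # any single colluder still leaves 2+
print(is_m_private(records, ["P1", "P2", "P3"], 2, at_least_two))  # P2+P3 together leave only P1's record
```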

Citation Context

...r to anonymize the data independently (anonymize-and-aggregate, Figure 1A), which results in potential loss of integrated data utility. A more desirable approach is collaborative data publishing [5], [6], [2], [4], which anonymizes data from all providers as if they would come from one source (aggregate-and-anonymize, Figure 1B), using either a trusted third party (TTP) or Secure Multi-party Computati...

Reasoning about the Appropriate Use of Private Data through Computational Workflows

by Yolanda Gil, Christian Fritz, 2010
Abstract - Cited by 9 (0 self)
While there is a plethora of mechanisms to ensure lawful access to privacy-protected data, additional research is required in order to reassure individuals that their personal data is being used for the purpose that they consented to. This is particularly important in the context of new data mining approaches, as used, for instance, in biomedical research and commercial data mining. We argue for the use of computational workflows to ensure and enforce appropriate use of sensitive personal data. Computational workflows describe in a declarative manner the data processing steps and the expected results of complex data analysis processes such as data mining (Gil et al. 2007b; Taylor et al. 2006). We see workflows as an artifact that captures, among other things, how data is being used and for what purpose. Existing frameworks for computational workflows need to be extended to incorporate privacy policies that can govern the use of data.

The Hardness and Approximation Algorithms for L-Diversity

by Xiaokui Xiao, Ke Yi, Yufei Tao
Abstract - Cited by 8 (1 self)
The existing solutions to privacy preserving publication can be classified into the theoretical and heuristic categories. The former guarantees provably low information loss, whereas the latter incurs gigantic loss in the worst case, but is shown empirically to perform well on many real inputs. While numerous heuristic algorithms have been developed to satisfy advanced privacy principles such as l-diversity, t-closeness, etc., the theoretical category is currently limited to k-anonymity, which is the earliest principle known to have severe vulnerability to privacy attacks. Motivated by this, we present the first theoretical study on l-diversity, a popular principle that is widely adopted in the literature. First, we show that optimal l-diverse generalization is NP-hard even when there are only 3 distinct sensitive values in the microdata. Then, an (l · d)-approximation algorithm is developed, where d is the dimensionality of the underlying dataset. This is the first known algorithm with a non-trivial bound on information loss. Extensive experiments with real datasets validate the effectiveness and efficiency of the proposed solution.
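The l-diversity principle studied above, in its simplest (distinct) instantiation, can be sketched as a group check over the microdata table T with QI attributes and a sensitive attribute. The record layout here is hypothetical:

```python
from collections import defaultdict

def is_l_diverse(records, qi_attrs, sa_attr, l):
    """Check distinct l-diversity: every QI equivalence group must
    contain at least l distinct sensitive values."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[a] for a in qi_attrs)].add(r[sa_attr])
    return all(len(v) >= l for v in values.values())

table = [
    {"age": "30-39", "disease": "flu"},
    {"age": "30-39", "disease": "cold"},
    {"age": "40-49", "disease": "flu"},
    {"age": "40-49", "disease": "flu"},
]
print(is_l_diverse(table, ["age"], "disease", 2))  # 40-49 group has only one distinct value
```

The paper's hardness result concerns finding the minimum-loss generalization that makes this check pass, not the check itself, which is straightforward.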

Citation Context

...iscussion focuses on data publication, whereas anonymity issues arise in many other environments. Some examples include anonymized surveying [6, 14], statistical databases [9], cryptographic computing [21], access control [8], and so on. 3. PROBLEM DEFINITIONS Let T be the raw microdata table, which has d quasi-identifier (QI) attributes A1, ..., Ad, and a sensitive attribute (SA) B. Here, d is the dim...

ANGEL: Enhancing the utility of generalization for privacy preserving publication

by Yufei Tao, et al.
Abstract - Cited by 5 (0 self)
Generalization is a well-known method for privacy preserving data publication. Despite its vast popularity, it has several drawbacks such as heavy information loss, difficulty of supporting marginal publication, and so on. To overcome these drawbacks, we develop ANGEL, a new anonymization technique that is as effective as generalization in privacy protection, but is able to retain significantly more information in the microdata. ANGEL is applicable to any monotonic principle (e.g., l-diversity, t-closeness, etc.), with its superiority (in correlation preservation) especially obvious when tight privacy control must be enforced. We show that ANGEL lends itself elegantly to the hard problem of marginal publication. In particular, unlike generalization that can release only restricted marginals, our technique can be easily used to publish any marginals with strong privacy guarantees.

Citation Context

... Finally, besides data publication, anonymity issues arise in many other environments. Some examples include anonymized surveying [6, 14], statistical databases [10, 13, 28], cryptographic computing [19, 32, 36], access control [5, 8, 9], and so on. 8 Conclusions This paper proposes angelization as a new anonymization technique for privacy preserving publication, which is applicable to any monotonic anonymiz...

Butterfly: Privacy Preserving Publishing on Multiple Quasi-Identifiers

by Jian Pei, Yufei Tao, Jiexing Li, Xiaokui Xiao, 2009
Abstract - Cited by 4 (0 self)
Recently, privacy preserving data publishing has attracted significant interest in research. Most of the existing studies focus only on situations where the data in question is published using one quasi-identifier. However, in a few important applications, a practical demand is to publish a data set on multiple quasi-identifiers for multiple users simultaneously, which poses several challenges. How can we generate one anonymized version of the data so that a privacy preservation requirement like k-anonymity is satisfied for all users? Moreover, how can we reduce the information loss as much as possible while the privacy preservation requirements are met? In this paper, we identify and tackle the novel problem of privacy preserving publishing on multiple quasi-identifiers. A naïve solution of publishing separate versions for different quasi-identifiers unfortunately suffers from the possibility that those releases can be joined to intrude on privacy. Interestingly, we show that it is possible to generate only one anonymized table that satisfies k-anonymity on all quasi-identifiers for all users without significant information loss. We systematically develop an effective method for privacy preserving publishing for multiple users, and report an empirical study using real data to verify the feasibility and effectiveness of our method.
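The requirement this abstract targets — one published table that is k-anonymous under every user's quasi-identifier set at once — can be sketched as follows. Attribute names are hypothetical, and this only verifies the requirement; the paper's contribution is constructing a low-loss table that meets it:

```python
from collections import Counter

def k_anonymous_on_all_qis(records, qi_sets, k):
    """Check that a single table is simultaneously k-anonymous
    with respect to every quasi-identifier set in qi_sets."""
    for qi in qi_sets:
        counts = Counter(tuple(r[a] for a in qi) for r in records)
        if any(c < k for c in counts.values()):
            return False
    return True

table = [
    {"age": "30-39", "zip": "478**", "disease": "flu"},
    {"age": "30-39", "zip": "478**", "disease": "cold"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cold"},
]
print(k_anonymous_on_all_qis(table, [["age"], ["zip"]], 2))  # holds for each user's QI separately
print(k_anonymous_on_all_qis(table, [["age", "zip"]], 3))    # each (age, zip) pair has only 2 records
```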

Citation Context

...ymous table. Finally, it is worth mentioning that data anonymity is a general concern in several other applications as well, including association rule hiding [2, 13, 40, 42], multi-party computation [21, 34, 39], privacy-aware query processing [11, 15, 30], and access control [6, 9, 10]. Summary. Despite the bulk of literature on privacy preservation, we are not aware of any work on generalization in the pre...

Secure Mining of Association Rules in Horizontally Distributed Databases

by Tamir Tassa - (IJCSIT) International Journal of Computer Science and Information Technologies
Abstract - Cited by 4 (1 self)
Abstract not found

Citation Context

...conclude the paper in Section 6. Like in [12] we assume that the players are semi-honest; namely, they follow the protocol but try to extract as much information as possible from their own view. (See [11, 18, 23] for a discussion and justification of that assumption.) We too, like [12], assume that M > 2. (The case M = 2 is discussed in [12, Section 5]; the conclusion is that the problem of secure computation...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University