## Data privacy through optimal k-anonymization (2005)

Venue: ICDE

Citations: 280 (3 self-citations)

### BibTeX

@INPROCEEDINGS{Bayardo05dataprivacy,

author = {Roberto J. Bayardo},

title = {Data privacy through optimal k-anonymization},

booktitle = {ICDE},

year = {2005},

pages = {217--228}

}

### Abstract

Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k − 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a nontrivial dataset under a general model of the problem.
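To make the k-anonymity property in the abstract concrete, here is a minimal Python sketch. It assumes each record has already been reduced to its quasi-identifier values. It checks whether every record shares those values with at least k − 1 others, suppresses undersized equivalence classes, and scores the result with a discernibility-style cost. The helper names, the sample records, and the generalization rule (10-year age ranges, 3-digit ZIP prefixes) are hypothetical. The cost form (|E|² for each retained class E, |D|·|E| for each suppressed class, where D is the dataset) is an assumed version of one common cost measure, not a statement of exactly how the paper defines its two measures.

```python
from collections import Counter
from collections.abc import Hashable, Sequence

# A record reduced to its quasi-identifier values (e.g. age range, ZIP prefix).
Row = tuple[Hashable, ...]


def is_k_anonymous(rows: Sequence[Row], k: int) -> bool:
    """True if every row shares its quasi-identifier values with at least k - 1 others."""
    return all(size >= k for size in Counter(rows).values())


def suppress(rows: Sequence[Row], k: int) -> list[Row]:
    """Drop every row whose equivalence class has fewer than k members."""
    sizes = Counter(rows)
    return [row for row in rows if sizes[row] >= k]


def discernibility_cost(rows: Sequence[Row], k: int) -> int:
    """Assumed discernibility-style cost: each row is charged the size of its class,
    so a retained class E costs |E|^2; a class smaller than k is suppressed and each
    of its rows is charged |D|, the total number of rows."""
    n = len(rows)
    return sum(size * size if size >= k else n * size for size in Counter(rows).values())


def generalize(age: int, zipcode: str) -> Row:
    """Hypothetical generalization: age -> 10-year range, ZIP -> 3-digit prefix."""
    low = age // 10 * 10
    return (f"{low}-{low + 9}", zipcode[:3] + "**")


raw = [(23, "94305"), (27, "94301"), (25, "94309"), (41, "10001"), (45, "10003"), (62, "60601")]
rows = [generalize(age, zipcode) for age, zipcode in raw]

print(is_k_anonymous(rows, 2))               # False: the 60-69 / 606** class has one row
print(discernibility_cost(rows, 2))          # 3*3 + 2*2 + 6*1 = 19
print(is_k_anonymous(suppress(rows, 2), 2))  # True once that row is suppressed
```

Generalizing further until the last row joins another class would avoid suppression. It would also enlarge the classes and raise this cost. Choosing among such trade-offs is the optimization problem the abstract describes.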

### Citations

883 | K-anonymity: A model for protecting privacy
- Sweeney
- 2002

Citation Context: ...the input dataset as little as is necessary to achieve k-anonymity, where “as little as is necessary” is typically quantified by a given cost metric. Several different cost metrics have been proposed [5,6,10,14], though most aim in one way or another to minimize the amount of information loss resulting from the generalization and suppression operations that are applied to produce the transformed dataset. The...

703 | Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
- Fayyad, Irani
- 1993

Citation Context: ...Linux OS (kernel version 2.4.20) and gcc... Footnote 1: Though we used this simple equidistant pre-partitioning strategy for all experiments, a domain discretization approach that considers the class column [4] might be a better approach when using class-conscious metrics such as the classification metric. [Figure residue: plot over k with series coarse,sup_limit=0; fine,sup_limit=0; coarse,sup_limit=100; fine,sup_limit=100; ...]

364 | Protecting respondents’ identities in microdata release
- Samarati
- 2001

320 | Achieving k-anonymity privacy protection using generalization and suppression
- Sweeney
- 2002

Citation Context: ...algorithms proposed in the literature are suitable only for input datasets with trivially small domains. Section 2, Related Work: While there are several k-anonymization algorithm proposals in the literature [5,6,8,10,12,13,16], only a few are suitable for use in practice. Iyengar [5] shows how to attack a very flexible (and highly combinatorial) formulation of k-anonymity using a genetic algorithm. The algorithm may run f...

227 | On the complexity of optimal k-anonymity
- Meyerson, Williams

Citation Context: ...all records within a k-anonymized dataset remain truthful. De-identifying data through common formulations of k-anonymity is unfortunately NP-hard if one wishes to guarantee an optimal anonymization [8]. Algorithms that are suitable for use in practice typically employ greedy methods [6,13] or incomplete stochastic search [5,16], and do not provide any guarantees on the quality of the resulting anon...

198 | Transforming data to satisfy privacy constraints
- Iyengar
- 2002

Citation Context: ...unfortunately NP-hard if one wishes to guarantee an optimal anonymization [8]. Algorithms that are suitable for use in practice typically employ greedy methods [6,13] or incomplete stochastic search [5,16], and do not provide any guarantees on the quality of the resulting anonymization. We propose a practical method for determining an optimal k-anonymization of a given dataset. An optimal anonymizatio...

158 | Constraint-based rule mining in large, dense databases
- Bayardo, Agrawal, et al.

Citation Context: ...Second, our algorithm uses a tree-search strategy exploiting both cost-based pruning and dynamic search rearrangement. These techniques have proven successful in data mining and machine learning domains [3,9,15], but to our knowledge have not been applied to the problem of k-anonymization. Third, we propose novel data-management strategies to reduce the cost of evaluating a given anonymization. Computing th...

145 | Generalizing data to provide anonymity when disclosing information
- Samarati, Sweeney
- 1998

116 | Search Through Systematic Set Enumeration
- Rymon
- 1992

87 | A condensation approach to privacy preserving data mining
- Aggarwal, Yu
- 2004

Citation Context: ...involves replacing specific values such as a phone number with a more general one, such as the area code alone. Unlike the outcome of other disclosure protection techniques that involve condensation [1], data scrambling and swapping [6,7], or adding noise [2], all records within a k-anonymized dataset remain truthful. De-identifying data through common formulations of k-anonymity is unfortunately...

82 | OPUS: An efficient admissible algorithm for unordered search
- Webb
- 1995

25 | Datafly: A system for providing anonymity in medical data
- Sweeney
- 1998

Citation Context: ...common formulations of k-anonymity is unfortunately NP-hard if one wishes to guarantee an optimal anonymization [8]. Algorithms that are suitable for use in practice typically employ greedy methods [6,13] or incomplete stochastic search [5,16], and do not provide any guarantees on the quality of the resulting anonymization. We propose a practical method for determining an optimal k-anonymization of a...

23 | Masking microdata files
- Kim, Winkler
- 1995

Citation Context: ...values such as a phone number with a more general one, such as the area code alone. Unlike the outcome of other disclosure protection techniques that involve condensation [1], data scrambling and swapping [6,7], or adding noise [2], all records within a k-anonymized dataset remain truthful. De-identifying data through common formulations of k-anonymity is unfortunately NP-hard if one wishes to guarantee a...

16 | Using simulated annealing for k-anonymity
- Winkler
- 2002

10 | Mu- and Tau-Argus: Software for Statistical Disclosure Control
- Hundepool, Willenborg
- 1996

4 | Privacy-preserving data mining
- Agrawal, Srikant

Citation Context: ...number with a more general one, such as the area code alone. Unlike the outcome of other disclosure protection techniques that involve condensation [1], data scrambling and swapping [6,7], or adding noise [2], all records within a k-anonymized dataset remain truthful. De-identifying data through common formulations of k-anonymity is unfortunately NP-hard if one wishes to guarantee an optimal anonymizati...