## Achieving Anonymity via Clustering (2006)

Venue: PODS

Citations: 83 (2 self-citations)

### BibTeX

@INPROCEEDINGS{Aggarwal06achievinganonymity,

author = {Gagan Aggarwal and Samir Khuller and Tomás Feder},

title = {Achieving Anonymity via Clustering},

booktitle = {PODS},

year = {2006},

pages = {153--162}

}

### Abstract

Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach to de-identifying records is to remove identifying fields such as social security number and name. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymity model for privacy, in which non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint...
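The k-anonymity condition stated above can be checked mechanically by grouping records on their quasi-identifier values. The following is a minimal Python sketch, not taken from the paper: it assumes each record is a dict, that the quasi-identifier columns are listed explicitly, and that suppression or generalization has already been applied. The function name, column names, and values are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(rows: Sequence[Mapping[str, object]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """Return True if every row shares its quasi-identifier values
    with at least k - 1 other rows."""
    # Count how many rows carry each combination of quasi-identifier values.
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    # Every equivalence class must contain at least k rows.
    return all(size >= k for size in groups.values())


# Hypothetical, already-generalized table (zip codes truncated, ages bucketed).
table = [
    {"zip": "940**", "age": "20-29", "gender": "F", "disease": "flu"},
    {"zip": "940**", "age": "20-29", "gender": "F", "disease": "cold"},
    {"zip": "941**", "age": "30-39", "gender": "M", "disease": "flu"},
    {"zip": "941**", "age": "30-39", "gender": "M", "disease": "asthma"},
]
qi = ("zip", "age", "gender")
print(is_k_anonymous(table, qi, 2))  # True
print(is_k_anonymous(table, qi, 3))  # False
```

The sensitive column (`disease`) is deliberately ignored, since k-anonymity constrains only the quasi-identifiers.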

### Citations

11502 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...on for NP-completeness and hardness proofs. 2.1 Lower Bound. We show that this problem is NP-complete by a reduction from the 3-Satisfiability problem, where each literal belongs to at most 3 clauses [6]. Suppose that we have a Boolean formula $F$ in 3-CNF form with $m$ clauses and $n$ variables. Let $F = C_1 \wedge \dots \wedge C_m$ be a formula composed of variables $x_i$, $i = 1, \dots, n$, and their complements $\bar{x}_i$. From th...

887 | k-anonymity: a model for protecting privacy
- Sweeney
Citation Context ...r, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymity model for privacy where non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other r...

484 | l-diversity: Privacy beyond k-anonymity
- Machanavajjhala, Kifer, et al.
- 2007
Citation Context ...y with this generalization. We note that, as in k-Anonymity, the objective function is oblivious to the sensitive attribute labels. Extensions to the k-Anonymity model, like the notion of l-diversity [13], can be applied independently to our clustering formulation. We provide constant-factor approximation algorithms for both the r-Gather and r-Cellular Clustering problems. In particular, we first show...

326 | Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation
- Jain, Vazirani
Citation Context ... 3 = 624. As this cellular clustering objective could be relevant even in contexts other than anonymity, we study a slightly different version of the problem: similar to the Facility Location problem [9], we add an additional setup cost for each potential cluster center, associated with opening a cluster centered at that point, but we don’t have the lower bound on number of points per cluster. We cal...
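To make the setup-cost variant concrete, the sketch below evaluates the cost of a given clustering. It assumes, as we read the cellular clustering objective, that each cluster costs its number of points times its radius, and it adds a per-center setup cost as described in this context; no lower bound on cluster size is enforced. The function name, the dict-of-clusters representation, and the 1-D example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

P = TypeVar("P", bound=Hashable)


def cellular_cost(clusters: Dict[P, List[P]],
                  dist: Callable[[P, P], float],
                  setup_cost: Callable[[P], float]) -> float:
    """Total cost of a clustering under an assumed cost model:
    sum over clusters of (number of points * radius) + setup cost of the center."""
    total = 0.0
    for center, members in clusters.items():
        # Radius = distance from the center to its farthest member.
        radius = max((dist(center, p) for p in members), default=0.0)
        total += len(members) * radius + setup_cost(center)
    return total


# Hypothetical 1-D example: points on a line, absolute difference as the metric.
clusters = {1: [0, 1, 2], 10: [10, 11]}
cost = cellular_cost(clusters, lambda a, b: abs(a - b), lambda c: 5.0)
print(cost)  # 15.0
```

Passing a setup cost of zero leaves only the points-times-radius term, so the same helper can score clusterings with or without center-opening costs.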

280 | Data privacy through optimal k-anonymization
- Bayardo, Agrawal
Citation Context ...o provide a k-anonymized version of the table with the minimum amount of suppression or generalization of the table entries. There has been a lot of recent work on k-anonymizing a given database table [3, 12]. An O(k log k) approximation algorithm was first proposed for the problem of k-Anonymity with suppressions only [14]. This was recently improved to an O(k) approximation for the general version of th...

244 | Incognito: Efficient full-domain k-anonymity
- LeFevre, DeWitt, et al.
Citation Context ...o provide a k-anonymized version of the table with the minimum amount of suppression or generalization of the table entries. There has been a lot of recent work on k-anonymizing a given database table [3, 12]. An O(k log k) approximation algorithm was first proposed for the problem of k-Anonymity with suppressions only [14]. This was recently improved to an O(k) approximation for the general version of th...

227 | On the complexity of optimal k-anonymity
- Meyerson, Williams
Citation Context ...ies. There has been a lot of recent work on k-anonymizing a given database table [3, 12]. An O(k log k) approximation algorithm was first proposed for the problem of k-Anonymity with suppressions only [14]. This was recently improved to an O(k) approximation for the general version of the problem [1]. In this paper, instead of generalization and suppression, we propose a new technique for anonymizing t...

99 | Toward privacy in public databases
- Chawla, Dwork, et al.
- 2005
Citation Context ... define a metric space (i.e., pairwise distances satisfying the triangle inequality) over the database records, which are then viewed as points in this space. This is similar to the approach taken in [5], except that we do not restrict ourselves to points in $\mathbb{R}^d$; instead, we allow our points to be in an arbitrary metric space. We then cluster the points and publish only the final cluster centers alo...
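Because the clustering guarantees rely on the distances over records forming a metric, it can help to sanity-check a chosen distance function on a sample of records. The sketch below is a brute-force O(n³) check that is not part of the paper; it tests for a pseudometric, on the assumption that distinct records with identical quasi-identifiers may legitimately sit at distance 0. The function name and example matrices are hypothetical.

```python
from itertools import product
from typing import Sequence


def is_pseudometric(d: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check that a square distance matrix is non-negative, symmetric,
    zero on the diagonal, and satisfies the triangle inequality."""
    n = len(d)
    if any(len(row) != n for row in d):
        return False
    if any(abs(d[i][i]) > tol for i in range(n)):
        return False
    for i, j in product(range(n), repeat=2):
        if d[i][j] < -tol or abs(d[i][j] - d[j][i]) > tol:
            return False
    # Triangle inequality: d(i, j) <= d(i, m) + d(m, j) for every intermediate m.
    for i, j, m in product(range(n), repeat=3):
        if d[i][j] > d[i][m] + d[m][j] + tol:
            return False
    return True


# Records at positions 0, 3, 5 on a line: a valid metric.
good = [[0, 3, 5], [3, 0, 2], [5, 2, 0]]
# d(0, 2) = 5 exceeds d(0, 1) + d(1, 2) = 2, violating the triangle inequality.
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(is_pseudometric(good), is_pseudometric(bad))  # True False
```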

77 | Building Steiner trees with incomplete global knowledge
- Karger, Minkoff
Citation Context ...at least r points. The objective is to minimize the maximum radius among the clusters. We note that the minimum cluster size constraint has been considered earlier in the context of facility location [10]. We first show the reduction for NP-completeness and hardness proofs. 2.1 Lower Bound. We show that this problem is NP-complete by a reduction from the 3-Satisfiability problem, where each literal be...

69 | How to allocate network centers
- Bar-Ilan, Kortsarz, et al.
- 1993
Citation Context ... as an additional constraint to the original k-Center formulation, an upper bound on the cluster size is specified. This is called the Capacitated k-Center problem [11]. Bar-Ilan, Kortsarz, and Peleg [2] gave the first constant-factor approximation, with factor 10, for this problem. The bound was improved subsequently to 5 by Khuller and Sussmann [11]. In this subsection though we only concentrate on the (k, r)...

67 | Hierarchical placement and network design problems
- Guha, Meyerson, et al.
- 2000
Citation Context ...luster size r and solution cost C, a bi-criteria approximation of $(r/2,\ 5.184\,\mathrm{OPT}_r)$ for the facility location problem was achieved independently by Guha, Meyerson and Munagala and by Karger and Minkoff [7, 10]. It is not known whether it is possible to achieve a one-sided approximation on facility location cost alone. In contrast, for the r-Cellular Clustering problem, we provide a one-sided approximation...

56 | Protecting respondent’s privacy in microdata release
- Samarati
Citation Context ...ibutes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k − 1 other records having exactly the same values for quasi-identifiers [15, 17]. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we im...

50 | Approximation algorithms for k-anonymity
- Aggarwal, Feder, et al.
- 2005
Citation Context ...og k) approximation algorithm was first proposed for the problem of k-Anonymity with suppressions only [14]. This was recently improved to an O(k) approximation for the general version of the problem [1]. In this paper, instead of generalization and suppression, we propose a new technique for anonymizing tables before their release. We first use the quasi-identifying attributes to define a metric spa...

37 | A best possible approximation algorithm for the k-center problem
- Hochbaum, Shmoys
- 1985
Citation Context ...thm applied to the quasi-identifiers, shown as points in a metric space in Figure 2(a). Our formulation of the r-Gather problem is related to, but not to be confused with, the classic k-Center problem [8]. The k-Center problem has the same objective of minimizing the maximum radius among the clusters; however, the constraint is that we can have no more than k clusters in total. The r-Gather problem is...
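Since r-Gather and k-Center share the objective and differ only in the constraint, a small evaluator makes the distinction concrete. The following is a minimal sketch that scores a given clustering; it does not search for an optimal one. It assumes clusters are given as a dict from center to member list, and the function name and 1-D example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

P = TypeVar("P", bound=Hashable)


def r_gather_radius(clusters: Dict[P, List[P]],
                    dist: Callable[[P, P], float],
                    r: int) -> float:
    """Return the maximum cluster radius, after checking that every
    cluster contains at least r points (the r-Gather constraint)."""
    for center, members in clusters.items():
        if len(members) < r:
            raise ValueError(
                f"cluster at {center!r} has {len(members)} points, fewer than r={r}")
    # Objective shared with k-Center: largest distance from a point to its center.
    return max((max((dist(c, p) for p in members), default=0.0)
                for c, members in clusters.items()), default=0.0)


clusters = {1: [0, 1, 2], 10: [9, 10, 11, 12]}
print(r_gather_radius(clusters, lambda a, b: abs(a - b), r=3))  # 2
```

Replacing the per-cluster size check with a check that `len(clusters) <= k` would turn this into a feasibility test for k-Center instead, with the same radius objective.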

34 | The capacitated k-center problem
- Khuller, Sussmann
- 1996
Citation Context ... a lower bound r on the cluster size as an additional constraint to the original k-Center formulation, an upper bound on the cluster size is specified. This is called the Capacitated k-Center problem [11]. Bar-Ilan, Kortsarz, and Peleg [2] gave the first constant-factor approximation, with factor 10, for this problem. The bound was improved subsequently to 5 by Khuller and Sussmann [11]. In this subsection thoug...

16 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...r NP-completeness and hardness proofs. 2.1. Lower Bound. We show that this problem is NP-complete by a reduction from the 3-Satisfiability problem, where each literal belongs to at most 3 clauses [6]. Suppose that we have a Boolean formula $F$ in 3-CNF form with $m$ clauses and $n$ variables. Let $F = C_1 \wedge \dots \wedge C_m$ be a formula composed of variables $x_i$, $i = 1, \dots, n$, and their complements $\bar{x}_i$. From th...

14 | Uniqueness of simple demographics in the U.S. population. LIDAP-WP4
- Sweeney
- 2000
Citation Context ...me, etc. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code [15]. Sweeney [16] proposed the k-anonymity model for privacy where non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at lea...

12 | Facility location with outliers
- Charikar, Khuller, et al.
- 2001
Citation Context ... relaxation whereby the clustering solution is allowed to leave an ɛ fraction of the points unclustered, i.e., to delete an ɛ fraction of points from the published k-anonymized table. Charikar et al. [4] studied various facility location problems with this relaxation and gave constant-factor approximation algorithms for them. For the (r, ɛ)-Gather problem, where each cluster is constrained to have at...
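The ɛ relaxation changes only the feasibility test: a bounded fraction of points may be left out of every cluster. The sketch below scores a candidate (r, ɛ)-Gather solution under that reading; it assumes points are distinct hashable values and that clusters are given as a dict from center to members. The function name and example data are hypothetical, and the sketch does not compute an optimal clustering.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

P = TypeVar("P", bound=Hashable)


def r_eps_gather_radius(points: Sequence[P],
                        clusters: Dict[P, List[P]],
                        dist: Callable[[P, P], float],
                        r: int,
                        eps: float) -> float:
    """Max cluster radius for an (r, eps)-Gather solution: every cluster has
    at least r points and at most eps * len(points) points stay unclustered."""
    # Assumes points are distinct, so set membership identifies clustered points.
    clustered = {p for members in clusters.values() for p in members}
    unclustered = [p for p in points if p not in clustered]
    if len(unclustered) > eps * len(points):
        raise ValueError(f"{len(unclustered)} unclustered points exceed eps * n")
    if any(len(members) < r for members in clusters.values()):
        raise ValueError(f"some cluster has fewer than r={r} points")
    return max((max((dist(c, p) for p in members), default=0.0)
                for c, members in clusters.items()), default=0.0)


points = [0, 1, 2, 9, 10, 11, 50]
clusters = {1: [0, 1, 2], 10: [9, 10, 11]}  # the outlier 50 is left unclustered
print(r_eps_gather_radius(points, clusters, lambda a, b: abs(a - b), r=3, eps=0.2))  # 1
```

Dropping the single outlier keeps the radius at 1; forcing 50 into either cluster would push the radius to roughly 40 or more, which is the kind of cost the relaxation is meant to avoid.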

12 | The Death of Privacy
- Time
- 1997
Citation Context ...can be integrated and analyzed digitally, leading to an increased use of data-mining tools to infer trends and patterns. This has raised universal concerns about protecting the privacy of individuals [17]. Combining data tables from multiple data sources allows us to draw inferences that are not possible from a single source. For example, combining patient data from multiple hospitals is useful to pr...