Data privacy through optimal k-anonymization
In ICDE, 2005
Cited by 197 (3 self)
Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a nontrivial dataset under a general model of the problem.
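As a rough illustration of the k-anonymity property stated in this abstract (each record is indistinguishable from at least k - 1 others), the sketch below groups records by their potentially identifying attributes and checks that every group has at least k members. The function name `is_k_anonymous`, the sample records, and the choice of `zip` and `age` as the identifying attributes are illustrative assumptions, not taken from the paper, and the sketch only verifies the property; it does not search for an optimal anonymization.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized records (assumed data for illustration).
records = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))  # True: each group has 2 records
print(is_k_anonymous(records, ["zip", "age"], 3))  # False: no group has 3 records
```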
Transforming Data to Satisfy Privacy Constraints
2002
Cited by 145 (0 self)
Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving the anonymity by the use of generalizations and suppressions on the potentially identifying portions of the data. We extend earlier works in this area along various dimensions. First, satisfying privacy constraints is considered in conjunction with the usage for the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
Top-down specialization for information and privacy preservation
In Proc. of the 21st IEEE ICDE, 2005
Cited by 101 (14 self)
Releasing person-specific data in its most specific state poses a threat to individual privacy. This paper presents a practical and efficient algorithm for determining a generalized version of data that masks sensitive information and remains useful for modelling classification. The generalization of data is implemented by specializing or detailing the level of information in a top-down manner until a minimum privacy requirement is violated. This top-down specialization is natural and efficient for handling both categorical and continuous attributes. Our approach exploits the fact that data usually contains redundant structures for classification. While generalization may eliminate some structures, other structures emerge to help. Our results show that quality of classification can be preserved even for highly restrictive privacy requirements. This work has great applicability to both public and private sectors that share information for mutual benefits and productivity.
Privacy-Preserving Data Publishing: A Survey on Recent Developments
Cited by 31 (0 self)
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published, and agreements on the use of published data. This approach alone may lead to excessive data distortion or insufficient protection. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. Recently, PPDP has received considerable attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this survey, we will systematically summarize and evaluate different approaches to PPDP, study the challenges in practical data publishing, clarify the differences and requirements that distinguish PPDP from other related problems, and propose future research directions.
Anonymizing classification data for privacy preservation
IEEE Transactions on Knowledge and Data Engineering, 2007
Cited by 20 (6 self)
Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of non-identifying attributes such as {Sex, Zip, Birthdate}. A useful approach to combat such linking attacks, called k-anonymization [1], is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal that requires extracting the structure of prediction on the "future" data. In this paper, we propose a k-anonymization solution for classification. Our goal is to find a k-anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on the classification on future data. Experiments on real life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements. Index Terms: Privacy protection, anonymity, security, integrity, data mining, classification, data sharing
Efficient Multi-Dimensional Suppression for K-Anonymity
Cited by 3 (2 self)
Many applications that employ data mining techniques involve mining data that include private and sensitive information about the subjects. One way to enable effective data mining while preserving privacy is to anonymize the dataset that includes private information about subjects before it is released for data mining. One way to anonymize a dataset is to manipulate its content so that the records adhere to k-anonymity. Two common manipulation techniques used to achieve k-anonymity of a dataset are generalization and suppression. Generalization refers to replacing a value with a less specific but semantically consistent value, while suppression refers to not releasing a value at all. Generalization is more commonly applied in this domain since suppression may dramatically reduce the quality of the data mining results if not properly used. However, generalization presents a major drawback as it requires a manually generated domain hierarchy taxonomy for every quasi-identifier in the dataset on which k-anonymity has to be performed. In this paper we propose a new method for achieving k-anonymity named K-anonymity of Classification Trees Using Suppression (kACTUS). In kACTUS efficient multi-dimensional suppression is performed, i.e., values are suppressed only on certain records depending on other attribute values, without the need for manually-produced domain hierarchy trees. Thus, in kACTUS we identify attributes that have less influence on the classification of the data records and we suppress them if needed in order to comply with k-anonymity. The kACTUS method was evaluated on ten separate datasets to evaluate its accuracy as compared to other k-anonymity generalization and suppression-based methods. Encouraging results suggest that kACTUS' predictive performance is better than that of existing k-anonymity
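To make the distinction between the two manipulation techniques named in this abstract concrete, here is a minimal sketch: generalization replaces a value with a less specific one, while suppression withholds the value altogether. The helper names `generalize_zip` and `suppress`, the masking of ZIP digits with `*`, and the sample record are assumptions made for illustration; this is not the kACTUS method, which selects what to suppress using classification trees.

```python
def generalize_zip(zip_code, keep_digits):
    """Generalization: keep the first keep_digits characters and mask the rest,
    giving a less specific but still consistent value."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)

def suppress(record, attribute):
    """Suppression: return a copy of the record with one attribute withheld."""
    masked = dict(record)
    masked[attribute] = None
    return masked

# Hypothetical record (assumed data for illustration).
record = {"zip": "13053", "sex": "F", "diagnosis": "flu"}

generalized = dict(record, zip=generalize_zip(record["zip"], 3))
suppressed = suppress(record, "zip")

print(generalized["zip"])  # 130**
print(suppressed["zip"])   # None
```

Both helpers leave the original record unchanged, so the same input can be transformed in different ways and the results compared.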
Interval Approach to Preserving Privacy in Statistical Databases
Cited by 2 (0 self)
Need for statistical databases. In many practical situations, it is very useful to collect large amounts of data. For example, from the data that we collect during a census, we can extract a lot of information about health, mortality, and employment in different regions, for different age ranges, and for people of different genders and of different ethnic groups. By analyzing these statistics, we can reveal trouble spots and allocate (usually limited) resources so that the help goes first to the social groups that need it most. Similarly, by gathering data about people's health in a large medical database, we can extract a lot of useful information on how geographic location, age, and gender affect a person's health. Thus, we can make measures aimed at improving public health more focused. Finally, a large statistical database of purchases can help find out what people are looking for, make shopping easier for customers and, at the same time, decrease the stores' expenses related to storing unnecessary items. Need for privacy. Privacy is an important issue in the statistical analysis of human-related data. For example, to check whether in a certain geographic area,
MPCS: Mobile-Phone Based Patient Compliance System for Chronic Illness Care
Cited by 1 (0 self)
More than 100 million Americans are currently living with at least one chronic health condition and expenditures on chronic diseases account for more than 75 percent of the $2.3 trillion cost of our healthcare system. To improve chronic illness care, patients must be empowered and engaged in health self-management. However, only half of all patients with chronic illness comply with their treatment regimen. The self-regulation model, while seemingly valuable, needs practical tools to help patients adopt this self-centered approach for long-term care. In this position paper, we propose a Mobile-Phone Based Patient Compliance System (MPCS) that can reduce the time-consuming and error-prone processes of existing self-regulation practice to facilitate self-reporting, non-compliance detection, and compliance reminders. The novelty of this work is to apply social-behavior theories to engineer the MPCS to positively influence patients' compliance behaviors, including mobile-delivered contextual reminders based on association theory; mobile-triggered questionnaires based on self-perception theory; and mobile-enabled social interactions based on social-construction theory. We discuss the architecture and the research challenges to realize the proposed MPCS.
Transforming Data to Satisfy Privacy Constraints
Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving the anonymity by the use of generalizations and suppressions on the potentially identifying portions of the data. We extend earlier works in this area along various dimensions. First, satisfying privacy constraints is considered in conjunction with the usage for the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
Interval Computations Related to Privacy in Statistical Databases
2002
We show that the need to maintain privacy in statistical databases naturally leads to interval computations, and provide feasible algorithms for the corresponding interval computation problems.

