Results 1 - 10
of
21
Protecting respondents’ identities in microdata release
- In IEEE Transactions on Knowledge and Data Engineering (TKDE
, 2001
"... Today’s globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call today for the release of specific data (microdata). In order to protect the anonymity of the ..."
Abstract
-
Cited by 236 (28 self)
- Add to MetaCart
Today’s globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations call today for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. De-identifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to re-identify respondents and inferring information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization that captures the property of the release process not to distort the data more than needed to achieve k-anonymity, and present an algorithm for the computation of such a generalization. We also discuss possible preference policies to choose among different minimal generalizations. Index terms:
Computational Disclosure Control - A Primer on Data Privacy Protection
- Massachusetts Institute of Technology
, 2001
"... Today's globally networked society places great demand on the dissemination and sharing of person-specific data for many new and exciting uses. Even situations where aggregate statistical information was once the reporting norm now rely heavily on the transfer of microscopically detailed transaction ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
Today's globally networked society places great demand on the dissemination and sharing of person-specific data for many new and exciting uses. Even situations where aggregate statistical information was once the reporting norm now rely heavily on the transfer of microscopically detailed transaction and encounter information. This happens at a time when more and more historically public information is also electronically available. When these data are linked together, they provide an electronic shadow of a person or organization that is as identifying and personal as a fingerprint even when the information contains no explicit identifiers, such as name and phone number. Other distinctive data, such as birth date and ZIP code, often combine uniquely and can be linked to publicly available information to re-identify individuals. Producing anonymous data that remains specific enough to be useful is often a very difficult task and practice today tends to either incorrectly believe confidentiality is maintained when it is not or produces data that are practically useless. The goal of the work presented in this book is to explore computational techniques for releasing useful information in such a way that the identity of any individual or entity contained in data cannot be recognized while the data remain practically useful. I begin by demonstrating ways to learn information about entities from publicly available information. I then provide a formal framework for reasoning about disclosure control and the ability to infer the identities of entities contained within the data. I formally define and present null-map, k-map and wrong-map as models of protection. Each model provides protection by ensuring that released information maps to no, k or incorrect entities, respectively. The book ends by examining four computational systems that attempt to maintain privacy while releasing electronic information. These systems are: (1) my Scrub System, which locates personally-identifying information in letters between doctors and notes written by clinicians; (2) my Datafly II System, which generalizes and suppresses values in field-structured data sets; (3) Statistics Netherlands' m-Argus System, which is becoming a European standard for producing public-use data; and, (4) my k-Similar algorithm, which finds optimal solutions such that data are minimally distorted while still providing adequate protection. By introducing anonymity and quality metrics, I show that Datafly II can overprotect data, Scrub and m-Argus can fail to provide adequate protection, but k-similar finds optimal results.
Level Inference Detection Database Systems
"... Existing work on inference detection for database systems mainly employ functional dependencies in the database schema to detect inferences. It has been noticed that analyzing the data stored in the database may help to detect more inferences. In this paper, we describe our e#ort in developing a dat ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Existing work on inference detection for database systems mainly employ functional dependencies in the database schema to detect inferences. It has been noticed that analyzing the data stored in the database may help to detect more inferences. In this paper, we describe our e#ort in developing a data level inference detection system. We have identi#ed #ve inference rules that a user can use to perform inferences. They are `subsume', `unique characteristic', `overlapping ', `complementary', and `functional dependency' inference rules. The existenceofthese inference rules con#rms the inadequacy of detecting inferences using just functional dependencies. The rules can be applied any number of times and in any order. These inference rules are sound. They are not necessarily complete, although we have no example that demonstrates incompleteness. We employ a rule based approach so that future inference rules can be incorporated into the detection system. We have developed a prototype of the inference detection system using Perl on a Sun SPARC20workstation. The preliminary results show that on average it takes seconds to process a query for a database with thousands of records. Thus, our approach to inference detection is best performed o#-line, and would be most useful to detect subtle inference attacks. 1.
Maximizing Sharing of Protected Information
, 2002
"... ... In this paper we address the problem of classifying information by enforcing explicit data classification as well as inference and association constraints. We formulate the problem of determining a classification that ensures satisfaction of the constraints, while at the same time guaranteein ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
... In this paper we address the problem of classifying information by enforcing explicit data classification as well as inference and association constraints. We formulate the problem of determining a classification that ensures satisfaction of the constraints, while at the same time guaranteeing that information will not be overclassified. We present an approach to the solution of this problem and give an algorithm implementing it which is linear in simple cases, and quadratic in the general case. We also analyze a variant of the problem that is NP-complete.
Specification and Enforcement of Classification and Inference Constraints
- IEEE Symposium on Security and Privacy
, 1999
"... Although mandatory access control in database systems has been extensively studied in recent years, and several models and systems have been proposed, capabilities for enforcement of mandatory constraints remain limited. Lack of support for expressing and combating inference channels that improperly ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Although mandatory access control in database systems has been extensively studied in recent years, and several models and systems have been proposed, capabilities for enforcement of mandatory constraints remain limited. Lack of support for expressing and combating inference channels that improperly leak protected information remains a major limitation in today’s multilevel systems. Moreover, the working assumption that data are classified at insertion time makes previous approaches inapplicable to the classification of existing, possibly historical, data repositories that need to be classified for release. Such a capability would be of great benefit to, and appears to be in demand by, governmental, public, and private institutions. We address the problem of classifying existing data
A Practical Formalism for Imprecise Inference Control
- Proceedings of the 8th IFIP WG11.3 Workshop on Database Security
, 1994
"... This paper describes a powerful, yet practical, formalism for modeling and controlling imprecise FD-based inference in relational database systems. The formalism provides a canonical representation of inference which unifies precise inference and the primitive imprecise inference mechanisms of abduc ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
This paper describes a powerful, yet practical, formalism for modeling and controlling imprecise FD-based inference in relational database systems. The formalism provides a canonical representation of inference which unifies precise inference and the primitive imprecise inference mechanisms of abduction and partial deduction. Whereas other imprecise (partial) inference models estimate the probability of making inferences, the formalism supports the analysis of the actual imprecise values inferred in a database extension. Imprecise inference is analyzed by transforming a precise database augmented with additional "catalytic" relations, conveying possibly imprecise a priori knowledge, into an equivalent imprecise database. The analysis of imprecise inference and the related inference control methodology are highly flexible and robust. They can be directly applied to classical, MLS, and imprecise databases. With minimal modifications, they also can be used in knowledge discovery or databa...
The Design And Implementation Of A Data Level Database Inference Detection System
- In Proceedings of the Twelfth Annual IFIP WG 11.3 Working Conference on Database Security, Chalkidiki
, 1998
"... : Inference is a waytosubvert access control mechanisms of database systems. Most existing work on inference detection relies on analyzing functional dependencies in the database schema. This paper is an extension to our earlier e#ort in developing a data level inference detection system #Yip and ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
: Inference is a waytosubvert access control mechanisms of database systems. Most existing work on inference detection relies on analyzing functional dependencies in the database schema. This paper is an extension to our earlier e#ort in developing a data level inference detection system #Yip and Levitt, 1998#. In this paper, weintroduce the split query inference rule, make an extension to the overlapping inference rule, and provide an in depth discussion on the applications of the inference rules on union queries. Data level inference detection is inevitably expensive. Wehave developed a prototype of the inference detection system to evaluate its performance. The result shows that the system performs better with larger number of attributes and records in the database, and smaller number of projected attributes and return tuples of the queries. Therefore, the inference detection system could be practical when users retrieve a small amount of data compare to the size of the database. 1
Web-Based Inference Detection
"... Newly published data, when combined with existing public knowledge, allows for complex and sometimes unintended inferences. We propose semi-automated tools for detecting these inferences prior to releasing data. Our tools give data owners a fuller understanding of the implications of releasing data ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Newly published data, when combined with existing public knowledge, allows for complex and sometimes unintended inferences. We propose semi-automated tools for detecting these inferences prior to releasing data. Our tools give data owners a fuller understanding of the implications of releasing data and help them adjust the amount of data they release to avoid unwanted inferences. Our tools first extract salient keywords from the private data intended for release. Then, they issue search queries for documents that match subsets of these keywords, within a reference corpus (such as the public Web) that encapsulates as much of relevant public knowledge as possible. Finally, our tools parse the documents returned by the search queries for keywords not present in the original private data. These additional keywords allow us to automatically estimate the likelihood of certain inferences. Potentially dangerous inferences are flagged for manual review. We call this new technology Web-based inference control. The paper reports on two experiments which demonstrate early successes of this technology. The first experiment shows the use of our tools to automatically estimate the risk that an anonymous document allows for re-identification of its author. The second experiment shows the use of our tools to detect the risk that a document is linked to a sensitive topic. These experiments, while simple, capture the full complexity of inference detection and illustrate the power of our approach.
Private inference control
- In CCS ’04: Proceedings of the 11th ACM conference on Computer and communications security
, 2004
"... Access control can be used to ensure that database queries pertaining to sensitive information are not answered. This is not enough to prevent users from learning sensitive information though, because users can combine non-sensitive information to discover something sensitive. Inference control prev ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Access control can be used to ensure that database queries pertaining to sensitive information are not answered. This is not enough to prevent users from learning sensitive information though, because users can combine non-sensitive information to discover something sensitive. Inference control prevents users from obtaining sensitive information via such “inference channels”, however, existing inference control techniques are not private- that is, they require the server to learn what queries the user is making in order to deny inference-enabling queries. We propose a new primitive- private inference control (PIC)- which is a means for the server to provide inference control without learning what information is being retrieved. PIC is a generalization of private information retrieval (PIR) and symmetrically-private information retrieval (SPIR). While it is straightforward to implement access control using PIR (simply omit sensitive information from the database), it is nontrivial to implement inference control efficiently. We measure the efficiency of a PIC protocol in terms of its communication complexity, its round complexity, and the work the server performs per query. Under existing cryptographic assumptions, we give a PIC scheme which is simultaneously optimal, up to logarithmic factors, in the work the server performs per query, the total communication complexity, and the number of rounds of interaction. We also present a scheme requiring more communication but sufficient storage of state by the server to facilitate private user revocation. Finally, we present a generic reduction which shows that one can focus on designing PIC schemes for which the inference channels take a particularly simple threshold form.
Analyzing FD Inference in Relational Databases
- Data and Knowledge Engineering
, 1996
"... Imprecise inference models the ability to infer sets of values or information chunks. Imprecise database inference is just as important as precise inference. In fact, it is more prevalent than its precise counterpart even in precise databases. Analyzing the extent of imprecise inference is important ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Imprecise inference models the ability to infer sets of values or information chunks. Imprecise database inference is just as important as precise inference. In fact, it is more prevalent than its precise counterpart even in precise databases. Analyzing the extent of imprecise inference is important in knowledge discovery and database security. Imprecise inference analysis can be used to "mine" rule-based knowledge from database data. In database security, imprecise inference analysis can help determine whether or not a system is safe from imprecise inference attacks. This paper deals with the general problem of analyzing fuzzy inference based on functional dependencies (FDs) in database relations. Fuzzy inference, the ability to infer fuzzy set values, generalizes imprecise (setvalued) inference and precise inference. Likewise, fuzzy relational databases generalize their classical and imprecise counterparts by supporting fuzzy information storage and retrieval. Inference analysis is p...

