Worst-case background knowledge in privacy
- In ICDE, 2007
Cited by 56 (1 self)
Recent work has shown the necessity of considering an attacker’s background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can express any background knowledge about the data. We provide a polynomial time algorithm to measure the amount of disclosure of sensitive information in the worst case, given that the attacker has at most k pieces of information in this language. We also provide a method to efficiently sanitize the data so that the amount of disclosure in the worst case is less than a specified threshold.
Foundations of secure deductive databases
- IEEE Transactions on Knowledge and Data Engineering, 1995
Cited by 33 (5 self)
In this paper, we develop a formal logical foundation for secure deductive databases. This logical foundation is based on an extended logic involving several modal operators. We develop two models of interaction between the user and the database called “yes-no” dialogs and “yes-no-don’t know” dialogs. Both dialog frameworks allow the database to lie to the user. We develop an algorithm for answering queries using yes-no dialogs and prove that secure query processing using yes-no dialogs is NP-complete. Consequently, the degree of computational intractability of query processing with yes-no dialogs is no worse than ... In this paper, we consider the problem of security in deductive databases. A deductive database is a finite set of formulas of ... the space of admissible responses (e.g., yes, no, don’t know, refuse to tell you), and the initial body of knowledge the user may possess.
Computational Disclosure Control - A Primer on Data Privacy Protection
- Massachusetts Institute of Technology, 2001
Cited by 23 (2 self)
Today's globally networked society places great demand on the dissemination and sharing of person-specific data for many new and exciting uses. Even situations where aggregate statistical information was once the reporting norm now rely heavily on the transfer of microscopically detailed transaction and encounter information. This happens at a time when more and more historically public information is also electronically available. When these data are linked together, they provide an electronic shadow of a person or organization that is as identifying and personal as a fingerprint even when the information contains no explicit identifiers, such as name and phone number. Other distinctive data, such as birth date and ZIP code, often combine uniquely and can be linked to publicly available information to re-identify individuals. Producing anonymous data that remains specific enough to be useful is often a very difficult task and practice today tends to either incorrectly believe confidentiality is maintained when it is not or produces data that are practically useless. The goal of the work presented in this book is to explore computational techniques for releasing useful information in such a way that the identity of any individual or entity contained in data cannot be recognized while the data remain practically useful. I begin by demonstrating ways to learn information about entities from publicly available information. I then provide a formal framework for reasoning about disclosure control and the ability to infer the identities of entities contained within the data. I formally define and present null-map, k-map and wrong-map as models of protection. Each model provides protection by ensuring that released information maps to no, k or incorrect entities, respectively. The book ends by examining four computational systems that attempt to maintain privacy while releasing electronic information. 
These systems are: (1) my Scrub System, which locates personally-identifying information in letters between doctors and notes written by clinicians; (2) my Datafly II System, which generalizes and suppresses values in field-structured data sets; (3) Statistics Netherlands' m-Argus System, which is becoming a European standard for producing public-use data; and (4) my k-Similar algorithm, which finds optimal solutions such that data are minimally distorted while still providing adequate protection. By introducing anonymity and quality metrics, I show that Datafly II can overprotect data, Scrub and m-Argus can fail to provide adequate protection, but k-Similar finds optimal results.
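The k-map idea mentioned in this abstract can be illustrated with a small sketch. The abstract only says that released information "maps to no, k or incorrect entities"; the reading used here (every released record must match at least k entities in a reference population on a chosen set of attributes) is an assumption, and the names `match_counts`, `satisfies_k_map`, and the toy records are hypothetical, not taken from the book.

```python
from collections import Counter

def match_counts(released, population, quasi_identifiers):
    """For each released record, count the population entities that share
    its values on the given attributes (assumed linkable attributes)."""
    counts = Counter(
        tuple(person[q] for q in quasi_identifiers) for person in population
    )
    return [counts[tuple(rec[q] for q in quasi_identifiers)] for rec in released]

def satisfies_k_map(released, population, quasi_identifiers, k):
    """Assumed reading of k-map: every released record matches >= k entities."""
    return all(c >= k for c in match_counts(released, population, quasi_identifiers))

# Toy data, invented for illustration only.
population = [
    {"name": "A", "birth_year": 1960, "zip": "02138"},
    {"name": "B", "birth_year": 1960, "zip": "02138"},
    {"name": "C", "birth_year": 1975, "zip": "02139"},
]
released = [
    {"birth_year": 1960, "zip": "02138", "diagnosis": "flu"},
    {"birth_year": 1975, "zip": "02139", "diagnosis": "asthma"},
]
```

In this toy data the second released record matches only one person, so the release would need further generalization or suppression before it could satisfy the assumed condition for k = 2.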
Maximizing Sharing of Protected Information
- 2002
Cited by 10 (7 self)
... In this paper we address the problem of classifying information by enforcing explicit data classification as well as inference and association constraints. We formulate the problem of determining a classification that ensures satisfaction of the constraints, while at the same time guaranteeing that information will not be overclassified. We present an approach to the solution of this problem and give an algorithm implementing it which is linear in simple cases, and quadratic in the general case. We also analyze a variant of the problem that is NP-complete.
Controlled Query Evaluation for Known Policies by Combining Lying and Refusal
- Annals of Mathematics and Artificial Intelligence, 2001
Cited by 10 (2 self)
Controlled query evaluation enforces security policies for confidentiality in information systems. It deals with users who may apply background knowledge to infer additional information from the answers to their queries. For each query the correct answer is first judged by some censor and then, if necessary, appropriately modified to preserve security. In previous approaches, modification has been done uniformly, either by lying or by refusal. A drawback of lying is that all disjunctions of secrets must always be protected. On the other hand, refusal may hide an answer even when the correct answer does not immediately reveal a secret.
Specification and Enforcement of Classification and Inference Constraints
- IEEE Symposium on Security and Privacy, 1999
Cited by 9 (3 self)
Although mandatory access control in database systems has been extensively studied in recent years, and several models and systems have been proposed, capabilities for enforcement of mandatory constraints remain limited. Lack of support for expressing and combating inference channels that improperly leak protected information remains a major limitation in today’s multilevel systems. Moreover, the working assumption that data are classified at insertion time makes previous approaches inapplicable to the classification of existing, possibly historical, data repositories that need to be classified for release. Such a capability would be of great benefit to, and appears to be in demand by, governmental, public, and private institutions. We address the problem of classifying existing data
A Practical Formalism for Imprecise Inference Control
- Proceedings of the 8th IFIP WG11.3 Workshop on Database Security, 1994
Cited by 8 (4 self)
This paper describes a powerful, yet practical, formalism for modeling and controlling imprecise FD-based inference in relational database systems. The formalism provides a canonical representation of inference which unifies precise inference and the primitive imprecise inference mechanisms of abduction and partial deduction. Whereas other imprecise (partial) inference models estimate the probability of making inferences, the formalism supports the analysis of the actual imprecise values inferred in a database extension. Imprecise inference is analyzed by transforming a precise database augmented with additional "catalytic" relations, conveying possibly imprecise a priori knowledge, into an equivalent imprecise database. The analysis of imprecise inference and the related inference control methodology are highly flexible and robust. They can be directly applied to classical, MLS, and imprecise databases. With minimal modifications, they also can be used in knowledge discovery or databa...
Analyzing FD Inference in Relational Databases
- Data and Knowledge Engineering, 1996
Cited by 5 (0 self)
Imprecise inference models the ability to infer sets of values or information chunks. Imprecise database inference is just as important as precise inference. In fact, it is more prevalent than its precise counterpart even in precise databases. Analyzing the extent of imprecise inference is important in knowledge discovery and database security. Imprecise inference analysis can be used to "mine" rule-based knowledge from database data. In database security, imprecise inference analysis can help determine whether or not a system is safe from imprecise inference attacks. This paper deals with the general problem of analyzing fuzzy inference based on functional dependencies (FDs) in database relations. Fuzzy inference, the ability to infer fuzzy set values, generalizes imprecise (set-valued) inference and precise inference. Likewise, fuzzy relational databases generalize their classical and imprecise counterparts by supporting fuzzy information storage and retrieval. Inference analysis is p...
Compromising Privacy with Trail Re-Identification: The REIDIT Algorithms
- 2002
Cited by 5 (3 self)
Re-identification is the process of relating unique and specific entities to seemingly anonymous data, and as such, is an attack on the privacy of a data collection. This work introduces a new re-identification attack, termed the trail problem, for data distributed over multiple locations. Through the use of data trails an adversary can independently reconstruct the trails of locations that identified entities and their un-identified data visited, which can then be employed for re-identification via trail matching. The attack strategy is based on the premise that data collecting institutions partition and release a dataset as multiple subsets, such that one release contains identifying attributes (e.g. name, social security number, phone number) and a second is devoid of these attributes (e.g. DNA sequences). The trail attack is dependent on whether the identified data is always collected with the un-identified data, termed complete, or whether one of the attributes is under-collected, termed incomplete. Both the complete and incomplete trail problems are formalized and several novel algorithms for re-identification are introduced. Examples are drawn from the areas of clickstream, DNA sequence, health, and video data.
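The trail-matching idea in this abstract can be sketched for the complete case, where an identified entity and its un-identified data are assumed to appear at exactly the same set of locations. This is only an illustration of the premise, not the REIDIT algorithms themselves; the function name `match_complete_trails` and the toy trails are hypothetical.

```python
from collections import defaultdict

def match_complete_trails(identified, unidentified):
    """Link names to un-identified records whose location trails are equal
    and unique on both sides (assumed 'complete' collection)."""
    names_by_trail = defaultdict(list)
    for name, trail in identified.items():
        names_by_trail[frozenset(trail)].append(name)

    data_by_trail = defaultdict(list)
    for key, trail in unidentified.items():
        data_by_trail[frozenset(trail)].append(key)

    matches = {}
    for trail, names in names_by_trail.items():
        data = data_by_trail.get(trail, [])
        # Only an unambiguous one-to-one trail match counts as a link.
        if len(names) == 1 and len(data) == 1:
            matches[names[0]] = data[0]
    return matches

# Toy trails, invented for illustration: locations each party released data from.
identified = {"alice": {"H1", "H2"}, "bob": {"H2"}, "carol": {"H3"}, "dave": {"H3"}}
unidentified = {"dna1": {"H2"}, "dna2": {"H1", "H2"}, "dna3": {"H3"}, "dna4": {"H3"}}
links = match_complete_trails(identified, unidentified)
```

Here "carol" and "dave" share a trail, so neither is linked; only entities with a unique trail are exposed under this simplified model.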
Privacy: A Machine Learning View
- 2002
Cited by 3 (0 self)
The problem of disseminating a data-set for machine learning while controlling the disclosure of data source identity is described using a commuting diagram of functions. This formalization is used to present and analyze an optimization problem balancing privacy and data utility requirements. The analysis points to the application of a generalization mechanism for maintaining privacy in view of machine learning needs. We present new proofs of NP-hardness of the problem of minimizing information loss while satisfying a set of privacy requirements, both with and without the addition of a particular uniform coding requirement. As an initial analysis of the approximation properties of the problem, we show that the cell suppression problem with a constant number of attributes can be approximated within a constant. As a side effect, proofs of NP-hardness of the minimum k-union, maximum k-intersection, and parallel versions of these are presented. Bounded versions of these problems are also shown to be approximable within a constant.

