Results 1 - 10 of 49
ℓ-diversity: Privacy beyond k-anonymity
In ICDE, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract
-
Cited by 672 (13 self)
- Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying” attributes. In this paper we show using two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
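To make the definitions concrete, here is a minimal sketch (our own illustration, not code from the paper): group records by their quasi-identifier values, then require every group to hold at least k records (k-anonymity) and at least ℓ distinct sensitive values (the simplest, distinct-values reading of ℓ-diversity). The toy table and attribute names are invented for the example.

```python
from collections import defaultdict

def check_privacy(records, quasi_ids, sensitive, k, l):
    """True iff the table is k-anonymous AND distinct-l-diverse:
    every equivalence class (records agreeing on the quasi-identifiers)
    has >= k members and >= l distinct sensitive values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_ids)].append(r[sensitive])
    return all(len(g) >= k and len(set(g)) >= l for g in groups.values())

# Invented toy data: zip/age are quasi-identifiers, disease is sensitive.
table = [
    {"zip": "130**", "age": "<30", "disease": "heart disease"},
    {"zip": "130**", "age": "<30", "disease": "viral infection"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">40", "disease": "cancer"},
    {"zip": "148**", "age": ">40", "disease": "cancer"},
    {"zip": "148**", "age": ">40", "disease": "cancer"},
]
# The table is 3-anonymous, yet the second group is all "cancer":
# exactly the homogeneity attack the abstract describes. l=2 rejects it.
print(check_privacy(table, ["zip", "age"], "disease", k=3, l=2))  # False
```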
Graphscope: parameter-free mining of large time-evolving graphs
In KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
"... How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells to whom? How can we spot discontinuity time-points in such streams of graphs, in an on-line, any-time fashion? We propose GraphScope, that addresses both prob-lems, using inf ..."
Abstract
-
Cited by 155 (14 self)
- Add to MetaCart
(Show Context)
How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells to whom? How can we spot discontinuity time-points in such streams of graphs, in an on-line, any-time fashion? We propose GraphScope, which addresses both problems using information-theoretic principles. Contrary to the majority of earlier methods, it needs no user-defined parameters. Moreover, it is designed to operate on large graphs, in a streaming fashion. We demonstrate the efficiency and effectiveness of GraphScope on real datasets from several diverse domains. In all cases it produces meaningful time-evolving patterns that agree with human intuition.
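A hedged sketch of the information-theoretic idea, under strong simplifying assumptions: GraphScope proper also partitions nodes into communities, whereas the toy below only decides, MDL-style, when a stream of edge-count snapshots should start a new segment, declaring a discontinuity whenever encoding the new snapshot separately (plus a fixed per-segment overhead) is cheaper than absorbing it. The overhead constant and the synthetic stream are ours.

```python
import math

def entropy_bits(ones, cells):
    """Bits to encode a binary block of `cells` cells containing `ones` ones,
    using the binary entropy of the block's density."""
    if ones == 0 or ones == cells:
        return 0.0
    p = ones / cells
    return cells * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

def segment_stream(edge_counts, n, overhead=64.0):
    """Greedy MDL-style segmentation of a stream of graph snapshots over
    n nodes, each summarized here as just its edge count.  A new segment
    starts when the snapshot is cheaper to encode on its own, i.e. the
    graph's density regime has changed."""
    boundaries = [0]                      # timesteps where segments begin
    seg_ones, seg_len = edge_counts[0], 1
    for t in range(1, len(edge_counts)):
        m = edge_counts[t]
        merged = entropy_bits(seg_ones + m, (seg_len + 1) * n * n)
        split = (entropy_bits(seg_ones, seg_len * n * n)
                 + entropy_bits(m, n * n) + overhead)
        if merged <= split:
            seg_ones, seg_len = seg_ones + m, seg_len + 1
        else:                             # discontinuity time-point
            boundaries.append(t)
            seg_ones, seg_len = m, 1
    return boundaries

# A dense regime followed by a sparse one: the change-point is found.
stream = [400, 410, 395, 405, 400, 40, 35, 45, 38, 42]
print(segment_stream(stream, n=50))       # [0, 5]
```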
A framework for high-accuracy privacy-preserving mining
In Proceedings of the 21st IEEE International Conference on Data Engineering, 2005
"... To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approac ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
(Show Context)
To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric perturbation matrix with minimal condition number can be identified, maximizing the accuracy even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal cost in accuracy. The quantitative utility of FRAPP, which applies to random-perturbation-based privacy-preserving mining in general, is evaluated specifically with regard to frequent-itemset mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, substantially lower errors are incurred, with respect to both itemset identity and itemset support, as compared to the prior techniques.
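As a rough illustration of the matrix view (our own toy, not the paper's code), the sketch below builds a perturbation matrix of the gamma-diagonal form the FRAPP framework analyzes, perturbs categorical data through it, and reconstructs the original distribution by inverting the matrix. The category count, gamma value, and data distribution are invented.

```python
import numpy as np

def gamma_diagonal_matrix(n, gamma):
    """FRAPP-style gamma-diagonal perturbation matrix over n categories:
    a value is retained with probability gamma/(gamma+n-1) and flipped to
    each other category with probability 1/(gamma+n-1); columns sum to 1."""
    A = np.full((n, n), 1.0 / (gamma + n - 1))
    np.fill_diagonal(A, gamma / (gamma + n - 1))
    return A

def perturb(data, A, rng):
    """Randomize each categorical value v by inverse-CDF sampling
    from column v of the perturbation matrix A."""
    cdf = np.cumsum(A, axis=0)                 # column-wise CDFs
    u = rng.random(len(data))
    return (u[:, None] <= cdf.T[data]).argmax(axis=1)

rng = np.random.default_rng(0)
n, gamma = 4, 19.0                             # gamma caps the privacy breach
A = gamma_diagonal_matrix(n, gamma)
true_dist = np.array([0.5, 0.3, 0.15, 0.05])
data = rng.choice(n, size=100_000, p=true_dist)

# Miner-side reconstruction: the observed distribution y satisfies
# y ~ A @ p, so estimate p as A^{-1} y; accuracy degrades with cond(A).
y = np.bincount(perturb(data, A, rng), minlength=n) / len(data)
print(round(np.linalg.cond(A), 3))             # 1.222: small condition number
print(np.round(np.linalg.solve(A, y), 3))      # close to the true distribution
```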
Intrusion Detection in RBAC-administered Databases
In ACSAC ’05: Proceedings of the 21st Annual Computer Security Applications Conference, 2005
"... A considerable effort has been recently devoted to the development of Database Management Systems (DBMS) which guarantee high assurance security and privacy. An important component of any strong security solution is represented by intrusion detection (ID) systems, able to detect anomalous behavior b ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
(Show Context)
Considerable effort has recently been devoted to the development of Database Management Systems (DBMS) which guarantee high-assurance security and privacy. An important component of any strong security solution is represented by intrusion detection (ID) systems, able to detect anomalous behavior by applications and users. To date, however, there have been very few ID mechanisms specifically tailored to database systems. In this paper, we propose such a mechanism. The approach we propose to ID is based on mining database traces stored in log files. The result of the mining process is used to form user profiles that can model normal behavior and identify intruders. An additional feature of our approach is that we couple our mechanism with Role Based Access Control (RBAC). Under an RBAC system, permissions are associated with roles, usually grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently from the normal behavior of the role. An important advantage of providing an ID mechanism specifically tailored to databases is that it can also be used to protect against insider threats. Furthermore, the use of roles makes our approach usable even for databases with large user populations. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.
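A minimal sketch of the profile-based idea, assuming a drastically simplified query summary: the paper works with richer representations of logged queries, while the toy below reduces each log entry to a (command, relation) pair, learns per-role frequencies, and flags a query whose claimed role is not the most likely one under the profiles. All role and table names are invented.

```python
import math
from collections import Counter, defaultdict

class RoleProfiler:
    """Toy role-based anomaly detector: summarize each logged query as a
    coarse (command, relation) feature, learn per-role frequencies, and
    flag queries whose claimed role does not best explain the feature."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # role -> feature counts
        self.totals = Counter()              # role -> number of queries

    def train(self, log):
        for role, command, relation in log:
            self.counts[role][(command, relation)] += 1
            self.totals[role] += 1

    def score(self, role, feature):
        # Laplace-smoothed log-probability of the feature under a role.
        c = self.counts[role][feature]
        return math.log((c + 1) / (self.totals[role] + 1))

    def is_intrusion(self, claimed_role, command, relation):
        feature = (command, relation)
        best = max(self.counts, key=lambda r: self.score(r, feature))
        return best != claimed_role

log = [("nurse", "SELECT", "patients")] * 50 + \
      [("accountant", "SELECT", "billing")] * 50
ids = RoleProfiler()
ids.train(log)
print(ids.is_intrusion("nurse", "SELECT", "billing"))   # True: role misuse
print(ids.is_intrusion("nurse", "SELECT", "patients"))  # False: normal
```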
Towards robustness in query auditing
In VLDB, 2006
"... We consider the online query auditing problem for statistical databases. Given a stream of aggregate queries posed over sensitive data, when should queries be denied in order to protect the privacy of individuals? We construct efficient auditors for max queries and bags of max and min queries in bot ..."
Abstract
-
Cited by 28 (6 self)
- Add to MetaCart
We consider the online query auditing problem for statistical databases. Given a stream of aggregate queries posed over sensitive data, when should queries be denied in order to protect the privacy of individuals? We construct efficient auditors for max queries and bags of max and min queries in both the partial and full disclosure settings. Our algorithm for the partial disclosure setting involves a novel application of probabilistic inference techniques that may be of independent interest. We also study, for the first time, a particular dimension of the utility of an auditing scheme and obtain initial results for the utility of sum auditing when guarding against full disclosure. The result is positive for large databases, indicating that answers to queries will not be riddled with denials.
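To illustrate what full disclosure means for max queries, here is a sound-but-incomplete offline check (our own simplification, not the paper's auditor); the batch and indices are invented.

```python
def audit_max(num_elems, queries):
    """Offline full-disclosure check for a batch of max queries.
    queries: list of (set_of_indices, answer) pairs; returns the
    elements whose exact value is forced by the answers.

    Each element j gets the upper bound u_j = min answer over queries
    containing j.  If a query with answer m has exactly one member
    whose upper bound equals m, every other member is provably below
    m, so that member must equal m: full disclosure."""
    upper = [float("inf")] * num_elems
    for s, m in queries:
        for j in s:
            upper[j] = min(upper[j], m)
    disclosed = set()
    for s, m in queries:
        witnesses = [j for j in s if upper[j] == m]
        if len(witnesses) == 1:
            disclosed.add(witnesses[0])
    return disclosed

# max(x0,x1) = 8 and max(x1,x2) = 5 pin x1 <= 5, so the first answer
# forces x0 = 8; an online auditor guarding against full disclosure
# would therefore deny the second query.
print(audit_max(3, [({0, 1}, 8), ({1, 2}, 5)]))  # {0}
```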
Auditing SQL Queries.
2007
"... Abstract-We study the problem of auditing a batch of SQL queries: Given a forbidden view of a database that should have been kept confidential, a batch of queries that were posed over this database and answered, and a definition of suspiciousness, determine if the query batch is suspicious with res ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
(Show Context)
We study the problem of auditing a batch of SQL queries: given a forbidden view of a database that should have been kept confidential, a batch of queries that were posed over this database and answered, and a definition of suspiciousness, determine if the query batch is suspicious with respect to the forbidden view. We consider several notions of suspiciousness that span a spectrum both in terms of their disclosure detection guarantees and the tractability of auditing under them for different classes of queries. We identify a particular notion of suspiciousness, weak syntactic suspiciousness, that allows for an efficient auditor for a large class of conjunctive queries. The auditor can be used together with a specific set of forbidden views to detect disclosures of the association between individuals and their private attributes. Further, it can also be used to prevent disclosures by auditing queries on the fly in an online setting. Finally, we tie our work in with recent research on query auditing and access control and relate the above definitions of suspiciousness to the notion of unconditional validity of a query introduced in the database access control literature.
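The following is a deliberately coarse stand-in for syntactic suspiciousness (the paper's actual definitions are more refined and operate on query structure): flag a batch if, taken together, its queries reference every column the forbidden view needs to produce its association. The query encoding and column names are invented.

```python
def referenced_columns(query):
    """Columns a toy conjunctive query touches, as 'table.column' strings.
    A query is encoded here as (head_columns, condition_columns)."""
    head, conds = query
    return set(head) | set(conds)

def batch_is_suspicious(batch, forbidden_view):
    """Simplified check: the batch is flagged if its queries jointly
    reference every column of the forbidden view, i.e. together they
    could reassemble the protected association."""
    touched = set().union(*(referenced_columns(q) for q in batch))
    return forbidden_view <= touched

# Forbidden view: the association between patient names and diagnoses.
forbidden = {"patients.name", "patients.diagnosis"}
q1 = (["patients.name"], ["patients.zip"])        # names by zip code
q2 = (["patients.diagnosis"], ["patients.zip"])   # diagnoses by zip code
print(batch_is_suspicious([q1], forbidden))       # False
print(batch_is_suspicious([q1, q2], forbidden))   # True: joint disclosure
```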
Explanation-Based Auditing
"... To comply with emerging privacy laws and regulations, it has become common for applications like electronic health records systems (EHRs) to collect access logs, which record each time a user (e.g., a hospital employee) accesses a piece of sensitive data (e.g., a patient record). Using the access lo ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
(Show Context)
To comply with emerging privacy laws and regulations, it has become common for applications like electronic health records systems (EHRs) to collect access logs, which record each time a user (e.g., a hospital employee) accesses a piece of sensitive data (e.g., a patient record). Using the access log, it is easy to answer simple queries (e.g., Who accessed Alice’s medical record?), but this often does not provide enough information. In addition to learning who accessed their medical records, patients will likely want to understand why each access occurred. In this paper, we introduce the problem of generating explanations for individual records in an access log. The problem is motivated by user-centric auditing applications, and it also provides a novel approach to misuse detection. We develop a framework for modeling explanations which is based on a fundamental observation: for certain classes of databases, including EHRs, the reason for most data accesses can be inferred from data stored elsewhere in the database. For example, if Alice has an appointment with Dr. Dave, this information is stored in the database, and it explains why Dr. Dave looked at Alice’s record. Large numbers of data accesses can be explained using general forms called explanation templates. Rather than requiring an administrator to manually specify explanation templates, we propose a set of algorithms for automatically discovering frequent templates from the database (i.e., those that explain a large number of accesses). We also propose techniques for inferring collaborative user groups, which can be used to enhance the quality of the discovered explanations. Finally, we have evaluated our proposed techniques using an access log and data from the University of Michigan Health System. Our results demonstrate that in practice we can provide explanations for over 94% of data accesses in the log.
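A minimal sketch of explanation templates (our own toy, not the paper's system): a template is a predicate connecting the accessing user to the accessed patient through other data in the database, templates explaining at least a support threshold of accesses are kept, and whatever remains unexplained is surfaced for audit. All tables and names are invented.

```python
# Hypothetical mini-database; table and column names are invented.
appointments = {("dr_dave", "alice"), ("dr_ann", "bob")}
referrals    = {("dr_ann", "alice")}

# An explanation template connects the accessing user to the accessed
# patient through data stored elsewhere in the database.
templates = {
    "has_appointment": lambda u, p: (u, p) in appointments,
    "issued_referral": lambda u, p: (u, p) in referrals,
}

access_log = [("dr_dave", "alice"), ("dr_ann", "bob"),
              ("dr_ann", "alice"), ("dr_dave", "carol")]

def explain(log, templates, min_support=0.25):
    """Keep templates that explain at least min_support of all accesses
    (a stand-in for frequent-template discovery), then report the
    accesses no frequent template explains, for manual audit."""
    frequent = {name: t for name, t in templates.items()
                if sum(t(u, p) for u, p in log) / len(log) >= min_support}
    unexplained = [(u, p) for u, p in log
                   if not any(t(u, p) for t in frequent.values())]
    return list(frequent), unexplained

names, suspicious = explain(access_log, templates)
print(names)       # ['has_appointment', 'issued_referral']
print(suspicious)  # [('dr_dave', 'carol')] -> candidate misuse
```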
Access Control for Smarter Healthcare Using Policy Spaces
2010
"... A fundamental requirement for the healthcare industry is that the delivery of care comes first and nothing should interfere with it. As a consequence, the access control mechanisms used in healthcare to regulate and restrict the disclosure of data are often bypassed in case of emergencies. This phen ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
A fundamental requirement for the healthcare industry is that the delivery of care comes first and nothing should interfere with it. As a consequence, the access control mechanisms used in healthcare to regulate and restrict the disclosure of data are often bypassed in case of emergencies. This phenomenon, called “break the glass”, is a common pattern in healthcare organizations and, though quite useful and mandatory in emergency situations, represents a serious system weakness from a security perspective. Malicious users, in fact, can abuse the system by exploiting the break-the-glass principle to gain unauthorized privileges and accesses. In this paper, we propose an access control solution aimed at better regulating break-the-glass exceptions that occur in healthcare systems. Our solution is based on the definition of different policy spaces, a language, and a composition algebra to regulate access to patient data and to balance the rigorous nature of traditional access control systems with the “delivery of care comes first” principle.
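One way to picture the composition of policy spaces, as a hedged sketch with invented semantics rather than the paper's actual algebra: explicit denials win, explicit authorizations come next, and requests falling in a declared break-the-glass space are permitted with an audit obligation instead of bypassing access control silently.

```python
from enum import Enum

class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"
    BREAK_GLASS = "permit-with-audit"   # logged, justified a posteriori

def evaluate(request, authorized, denied, btg_space):
    """Toy composition of policy spaces: denials take precedence,
    then authorizations, then declared break-the-glass exceptions
    (permitted but audited); anything else is denied outright."""
    if request in denied:
        return Decision.DENY
    if request in authorized:
        return Decision.PERMIT
    if request in btg_space:
        return Decision.BREAK_GLASS
    return Decision.DENY

# (role, record) requests; all names invented for the example.
authorized = {("attending_doctor", "chart")}
denied     = {("receptionist", "psych_notes")}
btg        = {("er_nurse", "chart")}    # emergencies, audited afterwards

print(evaluate(("er_nurse", "chart"), authorized, denied, btg))
# Decision.BREAK_GLASS -> access granted, audit obligation recorded
```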
Matching unstructured product offers to structured product specifications.
In KDD, 2011
"... ABSTRACT An e-commerce catalog is typically comprised of specifications for millions of products. The search engine receives millions of sales offers from thousands of independent merchants that must be matched to the right products. This problem is hard for several reasons. First, unique identifie ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
(Show Context)
An e-commerce catalog typically comprises specifications for millions of products. The search engine receives millions of sales offers from thousands of independent merchants that must be matched to the right products. This problem is hard for several reasons. First, unique identifiers are absent in most offers. Second, although the product specifications are well structured, offers are described in the form of free text. Third, offers mention the values of the attributes without providing the corresponding attribute names. Fourth, values of a large number of attributes are often missing from the offer description. Finally, offers may also contain words other than attribute names and values. We present an automated technique for matching unstructured offers to structured product descriptions. A novel aspect of our approach is the semantic parsing of offer descriptions using dictionaries built from the structured catalog. Another novelty is that the matching function we learn factors in not only matches but also mismatches of attribute values as well as missing attribute values. Our approach has been implemented in an experimental search engine and is used to match all the offers received by Bing Shopping to the Bing product catalog on a daily basis. We present extensive experimental results from this implementation that demonstrate the effectiveness of the proposed approach.
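A compact sketch of the pipeline as we read it (invented data and weights, not the production system): build a value-to-attribute dictionary from the structured catalog, semantically parse the offer's tokens through it, and score candidate products so that matches add, mismatches subtract, and attributes missing from the offer stay neutral.

```python
# Structured catalog: product id -> attribute/value pairs (invented data).
catalog = {
    "P1": {"brand": "acme", "model": "x100", "color": "black"},
    "P2": {"brand": "acme", "model": "x200", "color": "silver"},
}

# Dictionary built from the catalog: value token -> attribute name.
dictionary = {v: a for attrs in catalog.values() for a, v in attrs.items()}

def parse_offer(text):
    """Semantic parsing of a free-text offer: map each token that occurs
    in the catalog dictionary to an (attribute, value) pair; tokens that
    match nothing (prices, filler words) are simply dropped."""
    return {dictionary[t]: t for t in text.lower().split() if t in dictionary}

def match_score(offer_attrs, product, w_match=1.0, w_mismatch=2.0):
    """Score factoring in matches, mismatches, and missing values:
    matched values add, contradictions subtract heavily, and attributes
    absent from the offer contribute nothing (they are not evidence)."""
    score = 0.0
    for attr, value in product.items():
        if attr in offer_attrs:
            score += w_match if offer_attrs[attr] == value else -w_mismatch
    return score

offer = "ACME x100 laptop black 15.6in 599usd"
attrs = parse_offer(offer)                      # {'brand': 'acme', ...}
best = max(catalog, key=lambda p: match_score(attrs, catalog[p]))
print(best)  # P1: brand, model, and color match; P2 mismatches twice
```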
Do you know where your data’s been? — tamper-evident database provenance
In Proceedings of the 6th VLDB Workshop on Secure Data Management (SDM), 2010
"... Abstract. Database provenance chronicles the history of updates and modifications to data, and has received much attention due to its central role in scientific data management. However, the use of provenance information still requires a leap of faith. Without additional protections, provenance reco ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
Database provenance chronicles the history of updates and modifications to data, and has received much attention due to its central role in scientific data management. However, the use of provenance information still requires a leap of faith. Without additional protections, provenance records are vulnerable to accidental corruption, and even malicious forgery, a problem that is most pronounced in the loosely coupled multi-user environments often found in scientific research. This paper investigates the problem of providing integrity and tamper-detection for database provenance. We propose a checksum-based approach, which is well suited to the unique characteristics of database provenance, including non-linear provenance objects and provenance associated with multiple fine granularities of data. We demonstrate that the proposed solution satisfies a set of desirable security properties, and that the additional time and space overhead incurred by the checksum approach is manageable, making the solution feasible in practice.
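A minimal linear sketch of a checksum-based scheme (our own illustration; the paper additionally handles non-linear provenance and multiple granularities): chain a keyed checksum over the provenance records so that any in-place edit, deletion, or reordering breaks verification. Key handling is simplified here.

```python
import hashlib
import hmac

SECRET = b"audit-key"   # in practice, held by a trusted auditor

def chain(records):
    """Tamper-evident checksum chain over provenance records: each link
    binds a record to the checksum of everything before it, so any
    in-place edit, deletion, or reordering breaks verification."""
    tags, prev = [], b""
    for rec in records:
        prev = hmac.new(SECRET, prev + rec.encode(), hashlib.sha256).digest()
        tags.append(prev)
    return tags

def verify(records, tags):
    return chain(records) == tags

history = ["INSERT r1 by alice", "UPDATE r1.dose by bob"]
tags = chain(history)
print(verify(history, tags))                # True: provenance intact
history[1] = "UPDATE r1.dose by carol"      # forged provenance record
print(verify(history, tags))                # False: tampering detected
```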