Privacy-Preserving K-Means Clustering over Vertically Partitioned Data
- In SIGKDD, 2003
Cited by 83 (4 self)
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
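The vertically partitioned setting can be pictured with a short sketch. Squared Euclidean distance is a sum over attributes, so each site can compute its share of every entity-to-centroid distance from its own columns alone. The code below shows only that arithmetic. It uses no cryptography and adds the per-site shares in the clear, which a privacy-preserving protocol would have to avoid. All function names and data are hypothetical and are not taken from the paper.

```python
# Illustrative only: no cryptography. In a real protocol the per-site
# partial distances would not be revealed in the clear.

def partial_sq_dists(site_rows, site_centroids):
    """For one site: squared distance of each entity to each centroid,
    using only the attributes held at that site."""
    return [
        [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in site_centroids]
        for row in site_rows
    ]

def assign_clusters(per_site_partials):
    """Sum the per-site partial distances and pick the nearest centroid."""
    n = len(per_site_partials[0])
    k = len(per_site_partials[0][0])
    labels = []
    for i in range(n):
        totals = [sum(p[i][j] for p in per_site_partials) for j in range(k)]
        labels.append(totals.index(min(totals)))
    return labels

# Hypothetical data: 4 entities; site A holds 1 attribute, site B holds 2.
site_a = [[0.0], [0.1], [5.0], [5.2]]
site_b = [[1.0, 1.0], [0.9, 1.1], [9.0, 9.0], [9.1, 8.8]]
cent_a = [[0.0], [5.0]]              # site A's slice of the 2 centroids
cent_b = [[1.0, 1.0], [9.0, 9.0]]    # site B's slice of the 2 centroids

labels = assign_clusters([
    partial_sq_dists(site_a, cent_a),
    partial_sq_dists(site_b, cent_b),
])
```

Every site ends up with the same cluster label per entity, which matches the output the abstract describes. Hiding the partial distances themselves is the part this sketch leaves out.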
STATE-OF-THE-ART IN PRIVACY PRESERVING DATA MINING
Cited by 81 (4 self)
We provide here an overview of the new and rapidly emerging research area of privacy preserving data mining. We also propose a classification hierarchy that sets the basis for analyzing the work which has been performed in this context. A detailed review of the work accomplished in this area is also given, along with the coordinates of each work to the classification hierarchy. A brief evaluation is performed, and some initial conclusions are made.
A framework for high-accuracy privacy-preserving mining
- In Proceedings of the 21st IEEE International Conference on Data Engineering, 2005
Cited by 36 (0 self)
To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric perturbation matrix with minimal condition number can be identified, maximizing the accuracy even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal cost in accuracy. The quantitative utility of FRAPP, which applies to random-perturbation-based privacy-preserving mining in general, is evaluated specifically with regard to frequent itemset mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, substantially lower errors are incurred, with respect to both itemset identity and itemset support, as compared to the prior techniques.
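To make the matrix view of perturbation concrete, here is a minimal sketch under stated assumptions. A categorical value with n possible states is reported truthfully with probability p and replaced by each other state with probability q = (1 - p)/(n - 1). This is one simple symmetric perturbation matrix chosen for illustration, not necessarily the one the paper identifies. The observed distribution is y = A x, and because A = (p - q) I + q J (J being the all-ones matrix), both the inverse and the condition number have closed forms. All function names are hypothetical.

```python
def perturbation_matrix(n, p):
    """A[v][u] = probability of reporting v when the true value is u."""
    q = (1.0 - p) / (n - 1)
    return [[p if v == u else q for u in range(n)] for v in range(n)]

def expected_observed(A, x):
    """Expected distribution of reported values, y = A x."""
    return [sum(A[v][u] * x[u] for u in range(len(x))) for v in range(len(A))]

def reconstruct(y, n, p):
    """Invert y = A x for this matrix, using sum(x) == 1:
    y_v = p*x_v + q*(1 - x_v), so x_v = (y_v - q) / (p - q)."""
    q = (1.0 - p) / (n - 1)
    return [(yv - q) / (p - q) for yv in y]

def condition_number(n, p):
    """Eigenvalues of A are 1 (once) and p - q (n - 1 times), so the
    2-norm condition number is 1 / |p - q|."""
    q = (1.0 - p) / (n - 1)
    return 1.0 / abs(p - q)

true_dist = [0.5, 0.3, 0.2]   # hypothetical true distribution
A = perturbation_matrix(3, 0.6)
observed = expected_observed(A, true_dist)
estimate = reconstruct(observed, 3, 0.6)
```

As p approaches 1/n the difference p - q approaches zero and the condition number grows without bound, so sampling noise in the observed counts is amplified during reconstruction. That is the privacy-accuracy tension the abstract refers to.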
Privacy Preserving Naive Bayes Classifier for Vertically Partitioned Data
- In Proceedings of the Fourth SIAM Conference on Data Mining, 2004
Cited by 35 (5 self)
Privacy-Preserving Data Mining – developing models without seeing the data – is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model, but must not disclose their data to others. Naïve Bayes is often used as a baseline classifier, consistently providing reasonable classification performance. This paper brings privacy-preservation to Naïve Bayes classification on vertically partitioned data.
Secure set intersection cardinality with application to association rule mining
- Accepted for Publication in the Journal of Computer Security, IOS
Cited by 26 (4 self)
There has been concern over the apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. The process of data mining may use private data, leading to the potential for privacy breaches. Secure Multiparty Computation shows that results can be produced without revealing the data used to generate them. The problem is that general techniques for secure multiparty computation do not scale to data-mining size computations. This paper presents an efficient protocol for securely determining the size of set intersection, and shows how this can be used to generate association rules where multiple parties have different (and private) information about the same set of individuals.
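A common building block for computing the size of a set intersection without exchanging the sets is commutative encryption: if every party applies its own keyed transformation and the order of application does not matter, equal items collide after all layers are applied and unequal items do not. The sketch below is a toy two-party version using modular exponentiation. It is an assumption made for illustration, not the protocol from the paper, and the prime, the hashing into the group, and all function names are hypothetical choices that are not suitable for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, not a recommended parameter

def keygen():
    """Pick a secret exponent coprime to P - 1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def to_group(item):
    """Hash a string to an integer in 1..P-1 (toy encoding)."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return h % (P - 1) + 1

def encrypt(values, key):
    """Apply one encryption layer; (x^a)^b == (x^b)^a mod P, so layers commute."""
    return [pow(v, key, P) for v in values]

def intersection_size(items_a, items_b):
    ka, kb = keygen(), keygen()
    # Each party encrypts its own hashed items with its own key.
    a_once = encrypt([to_group(x) for x in items_a], ka)
    b_once = encrypt([to_group(x) for x in items_b], kb)
    # The lists are swapped (a real protocol would also shuffle them)
    # and each party adds its layer to the other's values.
    a_twice = set(encrypt(a_once, kb))
    b_twice = set(encrypt(b_once, ka))
    # Only doubly encrypted values are compared, so only the count is learned.
    return len(a_twice & b_twice)

size = intersection_size(["alice", "bob", "carol"],
                         ["bob", "carol", "dave", "erin"])
```

Neither party sees the other's items in plain or singly hashed form. What the parties can still infer from the counts and message sizes is the kind of leakage a security proof has to bound.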
Secure Regression on Distributed Databases
- J. Computational and Graphical Statist, 2004
Cited by 17 (12 self)
We present several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in a manner that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics or any other analysis then can be performed on the integrated data.
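One reason regression on the union of databases is feasible without pooling records is that the least-squares fit depends only on a few additive summaries. The sketch below illustrates this for a single predictor. Each database computes five local sums, the sums are added, and the line is fitted from the totals. How the sums would be added securely is outside this sketch, and the function names and data are hypothetical rather than taken from the paper.

```python
def local_stats(xs, ys):
    """Additive summaries for simple linear regression on one database:
    (n, sum x, sum y, sum x^2, sum x*y)."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    )

def combine(stats_list):
    """Add the summaries component-wise across databases."""
    return tuple(sum(col) for col in zip(*stats_list))

def fit_line(stats):
    """Ordinary least squares slope and intercept from the combined summaries."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical records held by two owners; all points lie on y = 2x + 1.
db1 = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
db2 = ([3.0, 4.0], [7.0, 9.0])

slope, intercept = fit_line(combine([local_stats(*db1), local_stats(*db2)]))
```

The fit is identical to the one obtained by concatenating the records first, so only the totals need to be shared, not the individual rows.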
Defining Privacy for Data Mining
- In National Science Foundation Workshop on Next Generation Data Mining, 2002
Cited by 15 (4 self)
Privacy preserving data mining -- getting valid data mining results without learning the underlying data values -- has been receiving attention in the research community and beyond. It is unclear what privacy preserving means. This paper provides a framework and metrics for discussing the meaning of privacy preserving data mining, as a foundation for further research in this field.
Privacy-preserving decision trees over vertically partitioned data
- In the Proceedings of the 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security
Cited by 15 (0 self)
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with the algorithm, we give a complete proof of security that gives a tight bound on the information revealed.
Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data
- 2003
Cited by 14 (0 self)
In many situations, data is split between multiple organizations. These organizations may want to utilize all of the data to create more accurate predictive models while revealing neither their training data / databases nor the instances to be classified. The Naive Bayes Classifier is a simple but efficient baseline classifier. In this paper, we present a privacy preserving Naive Bayes Classifier for horizontally partitioned data.
Privacy-preserving top-k queries
- In ICDE, 2005
Cited by 13 (0 self)
There is an increasing need for ranked and “best match” queries, in particular the top-k query: finding the k closest matches to a query “point”. Top-k queries are particularly relevant when dealing with privacy-sensitive information. For example, profiling has potential anti-terrorism applications (e.g., finding visa applicants who are likely terrorists). Profiles rarely match individuals exactly: a prime example of the need for similarity or best match queries. However, profiles often match innocent people (or transactions), subjecting them to unwarranted scrutiny. By stripping a top-k query to its minimum – disclosing only an unranked set of k matches – we meet the goals of the top-k query while maintaining the privacy concept of k-anonymity [5]. In the context of profiling, k-anonymity requires that any individual that “matches” be indistinguishable

