Results 11 - 20 of 71
A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms
, 2006
Optimal Random Perturbation at Multiple Privacy Levels
"... Random perturbation is a popular method of computing anonymized data for privacy preserving data mining. It is simple to apply, ensures strong privacy protection, and permits effective mining of a large variety of data patterns. However, all the existing studies with good privacy guarantees focus on ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
(Show Context)
Random perturbation is a popular method of computing anonymized data for privacy-preserving data mining. It is simple to apply, ensures strong privacy protection, and permits effective mining of a large variety of data patterns. However, all the existing studies with good privacy guarantees focus on perturbation at a single privacy level: a fixed degree of privacy protection is imposed on all anonymized data released by the data holder. This drawback seriously limits the applicability of random perturbation in scenarios where the holder has numerous recipients to which different privacy levels apply. Motivated by this, we study the problem of multi-level perturbation, whose objective is to release multiple versions of a dataset anonymized at different privacy levels. The challenge is that various recipients may collude by sharing their data to infer privacy beyond their permitted levels. Our solution overcomes this obstacle and achieves two crucial properties. First, collusion is useless, meaning that the colluding recipients cannot learn anything more than what the most trusted recipient among them already knows alone. Second, the data each recipient receives can be regarded (and hence analyzed in the same way) as the output of conventional uniform perturbation. Besides its solid theoretical foundation, the proposed technique is both space-economical and computationally efficient. It requires O(n + m) expected space, and produces a new anonymized version in O(n + log m) expected time, where n is the cardinality of the original dataset and m the number of versions released previously. Both bounds are optimal under the realistic assumption that n ≫ m.
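The abstract does not spell out the construction, but the collusion property suggests nested noise: each less trusted recipient's copy equals a more trusted copy plus independent extra noise, so pooling copies reveals nothing beyond the least noisy one among them. A minimal sketch under that assumption, with Gaussian noise and hypothetical names throughout:

```python
import numpy as np

def multilevel_perturb(data, variances, rng=None):
    """Release one perturbed copy of `data` per privacy level.

    `variances` must be sorted ascending (most trusted level first).
    Each weaker copy adds fresh noise on top of the previous copy, so
    any coalition of recipients learns no more than its most trusted
    member already knows alone.
    """
    rng = rng or np.random.default_rng()
    copies, current, prev_var = [], data.astype(float), 0.0
    for var in variances:
        # Incremental noise: total noise variance at this level is exactly `var`.
        current = current + rng.normal(0.0, np.sqrt(var - prev_var), size=data.shape)
        copies.append(current.copy())
        prev_var = var
    return copies
```

For example, multilevel_perturb(x, [0.5, 1.0, 2.0]) returns three copies with noise variances 0.5, 1.0, and 2.0, each a further degradation of the previous one.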
On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy
- In ASIACCS
, 2012
"... ar ..."
Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking
"... We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
(Show Context)
We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.
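The streaming algorithm is not reproduced in the abstract. As a rough batch-mode illustration of the central idea, that noise shaped to the data's correlation structure is much harder to filter out than i.i.d. noise, consider this hypothetical sketch (names invented):

```python
import numpy as np

def correlated_noise(window, noise_fraction=0.1, rng=None):
    """Perturb a window of multivariate stream values with noise whose
    covariance mirrors the data's own, so that projection-based
    filtering cannot strip it out the way it strips i.i.d. noise."""
    rng = rng or np.random.default_rng()
    cov = np.cov(window, rowvar=False)  # estimated correlation structure
    noise = rng.multivariate_normal(np.zeros(window.shape[1]),
                                    noise_fraction * cov, size=window.shape[0])
    return window + noise
```

The paper's actual method tracks this structure incrementally as the stream evolves, rather than recomputing it per window as this sketch does.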
Privacy-preserving decision tree mining based on random substitutions
- In International Conference on Emerging Trends in Information and Communication Security
, 2005
"... Abstract. Privacy-preserving decision tree mining is an important problem that has yet to be thoroughly understood. In fact, the privacypreserving decision tree mining method explored in the pioneer paper [1] was recently showed to be completely broken, because its data perturbation technique is fun ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
Privacy-preserving decision tree mining is an important problem that has yet to be thoroughly understood. In fact, the privacy-preserving decision tree mining method explored in the pioneering paper [1] was recently shown to be completely broken, because its data perturbation technique is fundamentally flawed [2]. However, since the general framework presented in [1] has some nice and useful features in practice, it is natural to ask whether the framework can be rescued by, say, utilizing a different data perturbation technique. In this paper, we answer this question affirmatively by presenting such a data perturbation technique based on random substitutions. We show that the resulting privacy-preserving decision tree mining method is immune to the seemingly relevant attacks, including the one introduced in [2]. Systematic experiments show that it is also effective. Keywords: privacy preservation, decision tree, data mining, perturbation, matrix.
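The abstract names a substitution matrix without giving its form. One common instantiation from the randomized-response literature, which may or may not match the paper's, keeps a categorical value with high probability and otherwise swaps it uniformly; a sketch with assumed parameter names:

```python
import numpy as np

def substitution_matrix(k, gamma):
    """Build a k x k substitution matrix: value i is kept with
    probability gamma / (gamma + k - 1) and replaced by each other
    value with probability 1 / (gamma + k - 1); gamma trades utility
    (large gamma) against privacy (gamma near 1)."""
    p_swap = 1.0 / (gamma + k - 1)
    return np.full((k, k), p_swap) + np.eye(k) * (gamma * p_swap - p_swap)

def perturb(values, k, gamma, rng=None):
    """Independently replace each value by a draw from its matrix row."""
    rng = rng or np.random.default_rng()
    M = substitution_matrix(k, gamma)
    return np.array([rng.choice(k, p=M[v]) for v in values])
```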
On Static and Dynamic Methods for Condensation-Based Privacy-Preserving Data Mining
"... In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. I ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data that is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. In this paper, we propose a new framework for privacy-preserving data mining of multi-dimensional data. Previous work for privacy-preserving data mining uses a perturbation approach that reconstructs data distributions in order to perform the mining. Such an approach treats each dimension independently and therefore ignores the correlations between the different dimensions. In addition, it requires the development of a new distribution-based algorithm for each data mining problem, since it does not use the multi-dimensional records themselves but aggregate distributions of the data as input. This leads to a fundamental re-design of data mining algorithms. In this paper, we develop a new and flexible approach for privacy-preserving data mining which does not require new problem-specific algorithms, since it maps the original data set into a new anonymized data set. These anonymized data closely match the characteristics of the original data, including the correlations among the different dimensions. We show how to extend the method to the case of data streams. We present empirical results illustrating the effectiveness of the method, and show its efficiency for data streams.
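Condensation is described here only at a high level. A minimal batch sketch of the idea (group records, retain each group's first- and second-order statistics, resample synthetic records from them); the k-means grouping and Gaussian resampling are assumptions, and k-means does not enforce the exact group sizes the real method would:

```python
import numpy as np
from sklearn.cluster import KMeans

def condense(data, group_size, rng=None):
    """Replace records by synthetic ones drawn from per-group means and
    covariances, preserving correlations across dimensions."""
    rng = rng or np.random.default_rng()
    n_groups = max(1, len(data) // group_size)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(data)
    synthetic = []
    for g in range(n_groups):
        group = data[labels == g]
        if len(group) < 2:
            continue  # too small to estimate a covariance
        mean, cov = group.mean(axis=0), np.cov(group, rowvar=False)
        synthetic.append(rng.multivariate_normal(mean, cov, size=len(group)))
    return np.vstack(synthetic)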
Privacy Preserving Market Basket Data Analysis
- In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases
, 2007
"... ..."
(Show Context)
unknown title
"... The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a priv ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
(Show Context)
The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a privacy threat: an adversary having partial information on a person's behavior may confidently associate that person with an item deemed sensitive. Ideally, an anonymization of such data should lead to an inference-proof version that prevents the association of individuals with sensitive items, while otherwise allowing truthful associations to be derived. Original approaches to this problem were based on value perturbation, damaging data integrity. Recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. Yet in reality there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no antecedent method aims at a clear inference-proof privacy guarantee. In this paper, we propose
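For concreteness, the "measurable confidence" of an association rule is just a conditional frequency; a toy computation (data invented) shows why partial knowledge becomes a privacy threat:

```python
# Confidence of rule A -> B: supp(A and B) / supp(A).
baskets = [{"bread", "milk"}, {"bread", "diapers"},
           {"bread", "milk", "beer"}, {"milk"}]
supp_a = sum("bread" in b for b in baskets)             # 3
supp_ab = sum({"bread", "milk"} <= b for b in baskets)  # 2
# An adversary who knows a person bought bread infers milk with 67% confidence.
print(supp_ab / supp_a)  # 0.666...
```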
Privacy-Preserving Data Mining through Knowledge Model Sharing
"... Abstract. Privacy-preserving data mining (PPDM) is an important topic to both industry and academia. In general there are two approaches to tackling PPDM, one is statistics-based and the other is crypto-based. The statistics-based approach has the advantage of being efficient enough to deal with lar ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
Privacy-preserving data mining (PPDM) is an important topic for both industry and academia. In general, there are two approaches to tackling PPDM: one is statistics-based and the other is crypto-based. The statistics-based approach has the advantage of being efficient enough to deal with large volumes of data. The basic idea underlying this approach is to let the data owners publish sanitized versions of their data (e.g., via perturbation, generalization, or ℓ-diversification), which are then used for extracting useful knowledge models such as decision trees. In this paper, we present a new method for statistics-based PPDM. Our method differs from the existing ones in that it lets the data owners share with each other the knowledge models extracted from their own private datasets, rather than publish any of their private datasets (not even in sanitized form). The knowledge models derived from the individual datasets are used to generate pseudo-data that are then used for extracting the desired “global” knowledge models. While instrumental, there are some technical subtleties that need to be carefully addressed. Specifically, we propose an algorithm for generating pseudo-data according to the paths of a decision tree, a method for adapting anonymity measures of datasets to measure the privacy of decision trees, and an algorithm that prunes a decision tree to satisfy a given anonymity requirement. Through an empirical study, we show that predictive models learned using our method are significantly more accurate than those learned using the existing ℓ-diversity method, in both centralized and distributed environments, with different types of datasets, predictive models, and utility measures.
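The pseudo-data generation algorithm is only named in the abstract. One plausible reading, sketched with an invented tree encoding: walk every root-to-leaf path, accumulate the numeric constraints along it, and emit the leaf's record count of random records satisfying them.

```python
import random

# Hypothetical tree encoding (not the paper's):
# internal node: {"feature": i, "threshold": t, "left": ..., "right": ...}
# leaf:          {"label": c, "count": n}

def generate_pseudo_data(node, bounds, out):
    """Emit `count` labelled records per leaf, each drawn uniformly from
    the box of feature bounds accumulated along the leaf's path."""
    if "label" in node:
        for _ in range(node["count"]):
            out.append(([random.uniform(lo, hi) for lo, hi in bounds],
                        node["label"]))
        return
    f, t = node["feature"], node["threshold"]
    lo, hi = bounds[f]
    left, right = list(bounds), list(bounds)
    left[f], right[f] = (lo, min(hi, t)), (max(lo, t), hi)
    generate_pseudo_data(node["left"], left, out)
    generate_pseudo_data(node["right"], right, out)
```

Called with the root and one (low, high) pair per feature, e.g. generate_pseudo_data(tree, [(0.0, 1.0)] * d, records), it yields a dataset that reproduces the tree's decision regions without exposing any original record.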
Publishing Microdata with a Robust Privacy Guarantee
"... Today, the publication of microdata poses a privacy threat. Vast research has striven to define the privacy condition that microdata should satisfy before it is released, and devise algorithms to anonymize the data so as to achieve this condition. Yet, no method proposed to date explicitly bounds th ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
Today, the publication of microdata poses a privacy threat. Vast research has striven to define the privacy condition that microdata should satisfy before release, and to devise algorithms that anonymize the data so as to achieve this condition. Yet no method proposed to date explicitly bounds the percentage of information an adversary gains, for each sensitive value therein, after seeing the published data. This paper introduces β-likeness, an appropriately robust privacy model for microdata anonymization, along with two anonymization schemes designed for it: one based on generalization, the other on perturbation. Our model postulates that an adversary's confidence in the likelihood of a certain sensitive-attribute (SA) value should not increase, in relative-difference terms, by more than a predefined threshold. Our techniques aim to satisfy a given β threshold with little information loss. We experimentally demonstrate that (i) our model provides an effective privacy guarantee in a way that predecessor models cannot, (ii) our generalization scheme is more effective and efficient than methods adapting algorithms for the k-anonymity model, and (iii) our perturbation method outperforms a baseline approach. Moreover, we discuss in detail the resistance of our model and methods to attacks proposed in previous research.
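The abstract states the β-likeness condition in words; encoded literally (all names hypothetical), with the prior taken as a sensitive value's overall frequency and the posterior its frequency within one published group:

```python
def satisfies_beta_likeness(prior, posterior, beta):
    """A group satisfies beta-likeness if no sensitive value's relative
    confidence gain, (posterior - prior) / prior, exceeds beta."""
    return all((q - p) / p <= beta
               for p, q in zip(prior, posterior) if p > 0)

# Toy check (numbers invented): 'flu' is 20% overall but 50% in one
# group, a relative gain of 1.5, so the group violates beta = 1.
print(satisfies_beta_likeness([0.2, 0.8], [0.5, 0.5], beta=1.0))  # False
```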