ℓ-diversity: Privacy beyond k-anonymity
- In ICDE, 2006
- Cited by 294 (8 self)
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying” attributes. In this paper we show, using two simple attacks, that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that k-anonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓ-diversity that can defend against such attacks. In addition to building a formal foundation for ℓ-diversity, we show in an experimental evaluation that ℓ-diversity is practical and can be implemented efficiently.
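The contrast between k-anonymity and the first (low-diversity) attack can be checked mechanically on a small table. The sketch below is illustrative only. It assumes the simplest "distinct" reading of ℓ-diversity, where every group of records sharing the same identifying values must contain at least ℓ different sensitive values; the abstract itself does not spell out the formal definition. The function names, column names, and sample rows are all hypothetical.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_ids):
    """Group records that share the same values on the identifying attributes."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_ids)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_ids, k):
    """True if every group contains at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(records, quasi_ids, sensitive, l):
    """True if every group contains at least l distinct sensitive values.

    Assumption: this is the simple 'distinct' variant, used here only
    to illustrate the idea.
    """
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already generalized table.
records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "heart disease"},
]

three_anonymous = is_k_anonymous(records, ["zip", "age"], k=3)
two_diverse = is_distinct_l_diverse(records, ["zip", "age"], "disease", l=2)
print(three_anonymous, two_diverse)
```

In this sample the table is 3-anonymous, yet the first group holds a single sensitive value, so anyone known to be in that group has their diagnosis revealed.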
Deriving private information from randomized data
- In Proceedings of the ACM SIGMOD Conference on Management of Data, 2005
- Cited by 66 (1 self)
Randomization has emerged as a useful technique for data disguising in privacy-preserving data mining. Its privacy properties have been studied in a number of papers. Kargupta et al. challenged the randomization schemes, and they pointed out that randomization might not be able to preserve privacy. However, it is still unclear what factors cause such a security breach, how they affect the privacy-preserving property of the randomization, and what kinds of data have a higher risk of disclosing their private contents even though they are randomized. We believe that the key factor is the correlations among attributes. We propose two data reconstruction methods that are based on data correlations. One method uses the Principal Component Analysis (PCA) technique, and the other method uses the Bayes Estimate (BE) technique. We have conducted theoretical and experimental analysis on the relationship between data correlations and the amount of private information that can be disclosed based on our proposed data reconstruction schemes. Our studies have shown that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed. To improve privacy, we propose a modified randomization scheme, in which we make the correlations of the random noise “similar” to those of the original data. Our results have shown that the reconstruction accuracy of both PCA-based and BE-based schemes becomes worse as the similarity increases.
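The claim that correlated attributes make randomized data easier to reconstruct can be illustrated with a small PCA experiment. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes additive, independent Gaussian noise with a known variance, uses NumPy, and invents the function name `pca_reconstruct` and the synthetic data for illustration.

```python
import numpy as np


def pca_reconstruct(disguised, noise_variance, n_components):
    """Project disguised data onto its dominant principal components.

    Assumption: disguised = original + noise, where the noise is i.i.d.
    with the given variance, so cov(disguised) = cov(original) + variance * I.
    """
    mean = disguised.mean(axis=0)
    centered = disguised - mean
    cov_disguised = np.cov(centered, rowvar=False)
    # Estimate of the original covariance after removing the noise term.
    cov_original = cov_disguised - noise_variance * np.eye(disguised.shape[1])
    eigenvalues, eigenvectors = np.linalg.eigh(cov_original)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    top = eigenvectors[:, order]
    # Noise spreads over all directions, signal concentrates in a few.
    return mean + centered @ top @ top.T


rng = np.random.default_rng(0)
# Hypothetical, highly correlated data: 8 attributes driven by 1 latent factor.
latent = rng.normal(size=(2000, 1))
loadings = np.array([[1.0, -2.0, 0.5, 1.5, -1.0, 2.0, 0.8, -0.6]])
original = latent @ loadings
disguised = original + rng.normal(scale=1.0, size=original.shape)

reconstructed = pca_reconstruct(disguised, noise_variance=1.0, n_components=1)
error_disguised = np.mean((disguised - original) ** 2)
error_reconstructed = np.mean((reconstructed - original) ** 2)
print(error_disguised, error_reconstructed)
```

Because the synthetic attributes share one latent factor, most of the added noise lies outside the retained component, so the reconstruction is closer to the original than the disguised data is.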
Smooth sensitivity and sampling in private data analysis
- In STOC, 2007
- Cited by 59 (7 self)
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x, a measure of variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals’ privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-based noise in the context of data privacy. Our framework raises many interesting algorithmic questions. Namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.
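For the median, the smooth sensitivity mentioned above can be computed directly with a quadratic-time loop. The sketch below is hedged: the abstract does not state the formula, so the code assumes the closed form commonly given for the median, with values in [0, upper], an odd number of values, and out-of-range positions padded with 0 and upper. The function name and parameters are hypothetical, and the noise-adding step is deliberately omitted.

```python
import math


def smooth_sensitivity_median(values, epsilon, upper):
    """Smooth sensitivity of the median for values assumed to lie in [0, upper].

    Assumed closed form (not taken from the abstract above):
        max over k = 0..n of  exp(-k * epsilon) * A_k
    where A_k is the largest gap x[m + t] - x[m + t - k - 1], t = 0..k+1,
    m is the (1-indexed) median position, x[i] = 0 for i < 1 and
    x[i] = upper for i > n.
    """
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of values")
    m = (n + 1) // 2

    def x(i):
        # 1-indexed access with padding outside the data range.
        if i < 1:
            return 0.0
        if i > n:
            return float(upper)
        return float(xs[i - 1])

    best = 0.0
    for k in range(n + 1):
        # Largest local sensitivity reachable by changing at most k entries.
        widest = max(x(m + t) - x(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-k * epsilon) * widest)
    return best


tight = smooth_sensitivity_median([1, 2, 3, 4, 5], epsilon=1.0, upper=10)
loose = smooth_sensitivity_median([1, 2, 3, 4, 5], epsilon=0.01, upper=10)
print(tight, loose)
```

With a large epsilon the value stays close to the local sensitivity of the median on this data set. With a small epsilon it approaches the worst case, the full range `upper`.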
Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification
- In Proceedings of the 4th SIAM International Conference on Data Mining, 2004
- Cited by 45 (1 self)
analysis technique that has found applications in various areas. In this paper, we study some multivariate statistical analysis methods in the Secure 2-party Computation (S2C) framework, illustrated by the following scenario: two parties, each having a secret data set, want to conduct the statistical analysis on their joint data, but neither party is willing to disclose its private data to the other party or any third party. The current statistical analysis techniques cannot be used directly to support this kind of computation because they require all parties to send the necessary data to a central place. In this paper, we define two Secure 2-party multivariate statistical analysis problems: the Secure 2-party Multivariate Linear Regression problem and the Secure 2-party Multivariate Classification problem. We have developed a practical security model, based on which we have developed a number of building blocks for solving these two problems.
Privacy preserving regression modelling via distributed computation
- In Proc. Tenth ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining, 2004
"... www.niss.org ..."
Privacy preserving crowd monitoring: Counting people without people models or tracking
- CVPR, 2008
- Cited by 7 (3 self)
We present a privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking. First, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic textures motion model. Second, a set of simple holistic features is extracted from each segmented region, and the correspondence between features and the number of people per segment is learned with Gaussian Process regression. We validate both the crowd segmentation algorithm and the crowd counting system on a large pedestrian dataset (2000 frames of video, containing 49,885 total pedestrian instances). Finally, we present results of the system running on a full hour of video.
Secure distributed data-mining and its application to large-scale network measurements
- SIGCOMM Comput. Commun. Rev.
- Cited by 6 (3 self)
The rapid growth of the Internet over the last decade has been startling. However, efforts to track its growth have often fallen afoul of bad data — for instance, how much traffic does the Internet now carry? The problem is not that the data is technically hard to obtain, or that it does not exist, but rather that the data is not shared. Obtaining an overall picture requires data from multiple sources, few of whom are open to sharing such data, either because it violates privacy legislation, or exposes business secrets. Likewise, detection of global Internet health problems is hampered by a lack of data sharing. The approaches used so far in the Internet, e.g. trusted third parties, or data anonymization, have been only partially successful, and are not widely adopted. The paper presents a method for performing computations on shared data without any participants revealing their secret data. For example, one can compute the sum of traffic over a set of service providers without any service provider learning the traffic of another. The method is simple, scalable, and flexible enough to perform a wide range of valuable operations on Internet data.
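The traffic-sum example can be sketched with additive secret sharing, one standard way to compute a sum without any party revealing its input. Whether this matches the paper's actual protocol is an assumption, since the abstract does not name the technique. The code simulates all parties in one process, assumes honest participants, and uses hypothetical names (`make_shares`, `secure_sum`, `MODULUS`).

```python
import secrets

# Assumption: all private values and their total are smaller than MODULUS.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split a private value into n random shares that sum to it modulo modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol: each party only ever sees one share per peer."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one share per party j.
    all_shares = [make_shares(value, n, modulus) for value in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % modulus


# Hypothetical per-provider traffic volumes.
total = secure_sum([120, 45, 300])
print(total)
```

Each individual share is uniformly random, so a single share or partial sum says nothing about one provider's traffic. Only the final total is meaningful.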
Distributed Data Mining and Agents
- In Engineering Applications of Artificial Intelligence, 2005
- Cited by 5 (1 self)
Multi-Agent Systems (MAS) offer an architecture for distributed problem solving. Distributed Data Mining (DDM) algorithms focus on one class of such distributed problem solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and future work accordingly. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
Privacy Preserving Nearest Neighbor Search
- 2006
- Cited by 5 (1 self)
Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this work we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification.
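To show how nearest neighbor search acts as the kernel of kNN classification over horizontally distributed data, the sketch below gives a plain, non-private baseline. It is not the secure algorithm from the paper: distances and labels are exchanged in the clear here, which is exactly the step the secure multiparty primitives would replace. The function names and sample records are hypothetical.

```python
import heapq
import math
from collections import Counter


def local_candidates(site_records, query, k):
    """Each site finds its own k closest records to the query."""
    scored = [(math.dist(point, query), label) for point, label in site_records]
    return heapq.nsmallest(k, scored)


def knn_classify(sites, query, k):
    """Merge the per-site candidates and vote among the k global nearest.

    Non-private baseline: a secure protocol would perform this merge
    without revealing distances or labels between sites.
    """
    candidates = []
    for site_records in sites:
        candidates.extend(local_candidates(site_records, query, k))
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records held by two sites (same attributes, different rows).
site_a = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b")]
site_b = [((0.2, 0.1), "a"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")]

print(knn_classify([site_a, site_b], query=(0.0, 0.1), k=3))
print(knn_classify([site_a, site_b], query=(5.0, 5.0), k=3))
```

The global k nearest neighbors are always contained in the union of the per-site k nearest, so merging local candidates gives the same answer as searching the pooled data.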

