Results 1-10 of 88
Deriving private information from randomized data
In SIGMOD, 2005
Cited by 129 (2 self)
Random projection-based multiplicative data perturbation for privacy preserving distributed data mining
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 95 (6 self)
This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data-mining problems such as clustering, principal component analysis, and classification. This paper makes primary contributions on two different grounds. First, it explores Independent Component Analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
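The claim that a random projection preserves "certain statistical characteristics" can be illustrated with a minimal sketch. The abstract does not give the construction, so this assumes a k x n matrix with independent standard normal entries scaled by 1/sqrt(k); the function names and the sizes n and k are hypothetical. Under that assumption the squared Euclidean distance between two projected records equals the original squared distance in expectation, with a relative error that shrinks as k grows.

```python
import math
import random


def random_projection_matrix(k, n, rng):
    """Return a k x n matrix of i.i.d. N(0, 1) entries scaled by 1/sqrt(k).

    Assumed construction for illustration; not taken from the paper.
    """
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def project(R, x):
    """Multiply the projection matrix R by the record x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


rng = random.Random(0)
n, k = 1200, 600  # original and projected dimensions (k < n, so R is not invertible)

x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]

R = random_projection_matrix(k, n, rng)  # kept secret by the data owner
u, v = project(R, x), project(R, y)      # only u and v would be released

print("original squared distance: ", sq_dist(x, y))
print("projected squared distance:", sq_dist(u, v))
```

Because k is smaller than n, the projection cannot be inverted exactly even by someone who knows R, which is one intuition for why a projection may protect more than an orthogonal transformation. How much protection it gives in practice is the subject of the paper's analysis, not of this sketch.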
Secure Regression on Distributed Databases
J. Computational and Graphical Statist, 2004
Cited by 38 (17 self)
We present several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in a manner that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics or any other analysis then can be performed on the integrated data.
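The abstract details only secure data integration, the weakest of its methods. As a hedged illustration of how a regression on the union of databases can be computed without pooling the records at all, the sketch below assumes each owner shares only aggregate sums. This is an assumption for illustration, not necessarily one of the paper's methods, and all names and data are hypothetical. For a single predictor, five sums per owner are enough to recover exactly the line fitted to the pooled data.

```python
def local_summary(records):
    """Aggregate sums an owner could share instead of its raw (x, y) records."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    return (n, sx, sy, sxx, sxy)


def combine(summaries):
    """Add the per-owner summaries component by component."""
    return tuple(sum(parts) for parts in zip(*summaries))


def fit_line(summary):
    """Ordinary least squares slope and intercept from the five sums."""
    n, sx, sy, sxx, sxy = summary
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical records held by two separate owners.
agency_a = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]
agency_b = [(4.0, 9.0), (5.0, 10.8)]

pooled = fit_line(local_summary(agency_a + agency_b))
shared = fit_line(combine([local_summary(agency_a), local_summary(agency_b)]))

print("fit on pooled records:  ", pooled)
print("fit on combined summaries:", shared)
```

Sharing the sums in the clear still reveals aggregate information about each database, so a real deployment would have to protect the summation step itself; that part is outside this sketch.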
Privacy preserving regression modelling via distributed computation
In Proc. Tenth ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining, 2004
Secure computation of the mean and related statistics
 in Proceedings of the Theory of Cryptography Conference, ser. Lecture Notes in Computer Science
Cited by 24 (0 self)
In recent years there has been massive progress in the development of technologies for storing and processing data. If statistical analysis could be applied to such data when it is distributed between several organisations, there could be huge benefits. Unfortunately, in many cases, for legal or commercial reasons, this is not possible. The idea of using the theory of multiparty computation to analyse efficient algorithms for privacy preserving data mining was proposed by Pinkas and Lindell. The point is that algorithms developed in this way can be used to overcome the apparent impasse described above: the owners of data can, in effect, pool their data while ensuring that privacy is maintained. Motivated by this, we describe how to securely compute the mean of an attribute value in a database that is shared between two parties. We also demonstrate that existing solutions in the literature that could be used to do this leak information, therefore underlining the importance of applying rigorous theoretical analysis rather than settling for ad hoc techniques.
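To make the target of such a protocol concrete, the sketch below computes the two-party mean in the clear. It is not a secure protocol and is not taken from the paper. It assumes each party holds a sum and a record count for the attribute, and the function names are hypothetical. The second function shows one simple way an output can leak: if the record counts were public, the mean alone would let one party recover the other party's sum.

```python
from fractions import Fraction


def joint_mean(sum_a, count_a, sum_b, count_b):
    """Mean of the attribute over both parties' records, computed in the clear."""
    return Fraction(sum_a + sum_b, count_a + count_b)


def recover_other_sum(mean, own_sum, own_count, other_count):
    """What party A could infer about B's sum if both counts were known."""
    return mean * (own_count + other_count) - own_sum


# Hypothetical inputs: A holds 3 records summing to 30, B holds 2 summing to 50.
mean = joint_mean(30, 3, 50, 2)
print("joint mean:", mean)
print("B's sum as inferred by A:", recover_other_sum(mean, 30, 3, 2))
```

A secure protocol would have to return the same mean without either party seeing the other's sum or count, which is why the record counts are treated as private inputs here.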
Cryptographically Private Support Vector Machines
In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
Cited by 20 (1 self)
We study the problem of private classification using kernel methods.
Privacy-aware Regression Modeling of Participatory Sensing Data
Cited by 19 (6 self)
Many participatory sensing applications use data collected by participants to construct a public model of a system or phenomenon. For example, a health application might compute a model relating exercise and diet to amount of weight loss. While the ultimately computed model could be public, the individual input and output data traces used to construct it may be private data of participants (e.g., their individual food intake, lifestyle choices, and resulting weight). This paper proposes and experimentally studies a technique that attempts to keep such input and output data traces private, while allowing accurate model construction. This is significantly different from perturbation-based techniques in that no noise is added. The main contribution of the paper is to show a certain data transformation at the client side that helps keep the client data private while not introducing any additional error to model construction. We particularly focus on linear regression models, which are widely used in participatory sensing applications. We use the data set from a map-based participatory sensing service to evaluate our scheme. The service in question is a green navigation service that constructs regression models from participant data to predict the fuel consumption of vehicles on road segments. We evaluate our proposed mechanism by providing empirical evidence that: i) an individual data trace is generally hard to reconstruct with any reasonable accuracy, and ii) the regression model constructed using the transformed traces has a much smaller error than one based on additive data
Privacy-Preserving Ridge Regression on Hundreds of Millions of Records
Cited by 17 (1 self)
Ridge regression is an algorithm that takes as input a large number of data points and finds the best-fit linear curve through these points. The algorithm is a building block for many machine-learning operations. We present a system for privacy-preserving ridge regression. The system outputs the best-fit curve in the clear, but exposes no other information about the input data. Our approach combines both homomorphic encryption and Yao garbled circuits, where each is used in a different part of the algorithm to obtain the best performance. We implement the complete system and experiment with it on real datasets, and show that it significantly outperforms pure implementations based only on homomorphic encryption or Yao circuits.
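For reference, plain (non-private) ridge regression can be sketched as follows. The code uses the standard closed form: accumulate A = sum of x x^T plus lambda times the identity and b = sum of y x over the records, then solve A beta = b. The function names, the toy data, and the choice to omit an intercept are assumptions for illustration. The abstract does not say which part of the computation is handled by homomorphic encryption and which by garbled circuits, so the sketch takes no position on that.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def ridge(points, lam):
    """Ridge coefficients (no intercept) for points given as (x_vector, y)."""
    d = len(points[0][0])
    # Start from lam * I, then add each record's contribution x x^T.
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, y in points:
        for i in range(d):
            b[i] += y * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return solve(A, b)


# Toy data generated from y = 2*x1 + 3*x2 with no noise.
points = [((1.0, 0.0), 2.0), ((0.0, 1.0), 3.0), ((1.0, 1.0), 5.0), ((2.0, 1.0), 7.0)]

beta_ols = ridge(points, 0.0)    # lam = 0 reduces to ordinary least squares
beta_ridge = ridge(points, 1.0)  # lam > 0 shrinks the coefficients

print("lam = 0:", beta_ols)
print("lam = 1:", beta_ridge)
```

Each record contributes to A and b only through a sum, so the expensive per-record work is additive and only the final solve is a small fixed-size problem. That structure is what makes it natural to treat the two phases differently.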
Privacy-preserving data mashup
In Proc. of the 12th International Conference on Extending Database Technology (EDBT), Saint-Petersburg, 2009
Cited by 15 (9 self)
Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden, and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate private data from different data providers, such that the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that our proposed method is effective for simultaneously preserving both privacy and information usefulness, and is scalable for handling large volumes of data.