Results 1-10 of 18
Privacy preserving regression modelling via distributed computation
In Proc. Tenth ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining, 2004
"... www.niss.org ..."
Data Dissemination and Disclosure Limitation In a World . . .
STATIST. SCI, 2004
Cited by 15 (11 self)
Given the public's ever-increasing concerns about data confidentiality, in the near future statistical agencies may be unable or unwilling, or even may not be legally allowed, to release any genuine microdata, that is, data on individual units such as individuals or establishments. In such a world, an alternative dissemination strategy is remote access analysis servers, to which users submit requests for output from statistical models fit using the data, but are not allowed access to the data themselves. Analysis servers, however, are not free from the risk of disclosure, especially in the face of multiple, interacting queries. We describe these risks and propose quantifiable measures of risk and data utility that can be used to specify which queries can be answered, and with what output. The risk-utility framework is illustrated for regression models.
Privacy preserving analysis of vertically partitioned data using secure matrix products
J. Official, 2004
Cited by 12 (10 self)
Reluctance of statistical agencies and other data owners to share their possibly confidential or proprietary data with others who own related databases is a serious impediment to conducting mutually beneficial analyses. In this paper, we propose a protocol for securely computing matrix products in vertically partitioned data, i.e., the data sets have the same subjects but disjoint attributes. This protocol allows data owners to estimate coefficients and standard errors of linear regressions, and to examine regression model diagnostics, without disclosing the values of their attributes to each other or to third parties. The protocol can be used to perform other procedures for which sample means and covariances are sufficient statistics.
Privacy-preserving SVM classification on vertically partitioned data
In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2006
Cited by 10 (0 self)
Abstract. Classical data mining algorithms implicitly assume complete access to all data, either in centralized or federated form. However, privacy and security concerns often prevent sharing of data, thus derailing data mining projects. Recently, there has been growing focus on finding solutions to this problem. Several algorithms have been proposed that do distributed knowledge discovery, while providing guarantees on the nondisclosure of data. Classification is an important data mining problem applicable in many diverse domains. The goal of classification is to build a model which can predict an attribute (a binary attribute in this work) based on the rest of the attributes. We propose an efficient and secure privacy-preserving algorithm for support vector machine (SVM) classification over vertically partitioned data.
Privacy-preserving record linkage
Privacy in Statistical Databases (PSD 2010), 2010
Cited by 7 (2 self)
Abstract. Record linkage has a long tradition in both the statistical and the computer science literature. We survey current approaches to the record linkage problem in a privacy-aware setting and contrast these with the more traditional literature. We also identify several important open questions that pertain to private record linkage from different perspectives.
Secure Multiple Linear Regression Based on Homomorphic Encryption
2011
Cited by 6 (2 self)
We consider the problem of linear regression where the data are split up and held by different parties. We conceptualize the existence of a single combined database containing all of the information for the individuals in the separate databases and for the union of the variables. We propose an approach that gives full statistical calculation on this combined database without actually combining information sources. We focus on computing linear regression and ridge regression estimates, as well as certain goodness-of-fit statistics. We make use of homomorphic encryption in constructing a protocol for regression analysis which adheres to the definitions of security laid out in the cryptography literature. Our approach provides only the final result of the calculations, in contrast to other methods that share intermediate values and thus present an opportunity for compromise of privacy. We perform an experiment on a dataset extracted from the Current Population Survey, with 51,016 cases and 22 covariates, to show that our approach is practical for moderate-sized problems.
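The abstract above does not say which homomorphic scheme is used. As a minimal sketch only, assuming an additively homomorphic scheme of the Paillier type, the following toy code shows the property such protocols rely on: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The function names and the small Mersenne-prime parameters are illustrative assumptions and offer no real security.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known Mersenne primes (NOT secure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


# Two hypothetical parties encrypt local totals; only the sum is ever decrypted.
n, key = keygen()
total = decrypt(n, key, add_encrypted(n, encrypt(n, 20), encrypt(n, 22)))
```

In a regression setting the encrypted values would be entries of local cross-product matrices rather than single integers; that extension is not shown here.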
Secure, privacy-preserving analysis of distributed databases
Technometrics
Cited by 6 (4 self)
There is clear value, in both industrial and government settings, derived from performing statistical analyses that, in effect, integrate data in multiple, distributed databases. However, the barriers to actually integrating the data can be substantial or even insurmountable. Corporations may be unwilling to share proprietary databases such as chemical databases held by pharmaceutical manufacturers, government agencies are subject to laws protecting confidentiality of data subjects, and even the sheer volume of the data may preclude actual data integration. In this paper, we show how tools from modern information technology (specifically, secure multiparty computation and networking) can be used to perform statistically valid analyses of distributed databases. The common characteristic of the methods we describe is that the owners share sufficient statistics computed on the local databases in a way that protects each owner from the others. That is, while each owner can calculate the “complement” of its contribution to the analysis, it cannot discern which other owners contributed what to that complement. Our focus is on horizontally partitioned data: the data records rather than the data attributes are spread among the owners. We present protocols for secure regression, contingency tables, maximum likelihood and Bayesian analysis. For low-risk situations, we describe a secure data integration protocol that integrates the databases but prevents owners from learning the source of data records other than their own. Finally, we outline three current research directions: a software system implementing the protocols, secure EM algorithms, and partially trusted third parties, which reduce owners' incentives to be dishonest.
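The idea of sharing sufficient statistics so that only their totals become known can be sketched as follows. This is an assumption-laden illustration, not the paper's protocol: it simulates ring-based secure summation with a random mask in a single process, uses simple (one-predictor) regression, and all names (`secure_sum`, `pooled_fit`, `MODULUS`, `SCALE`) are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, far larger than any true total
SCALE = 10**6        # assumed fixed-point scale for encoding reals as integers


def secure_sum(local_values, modulus=MODULUS):
    """Simulate masked summation of one integer per owner around a ring."""
    mask = secrets.randbelow(modulus)            # known only to the first owner
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus    # each owner sees a masked total
    return (running - mask) % modulus            # first owner removes the mask


def local_stats(xs, ys):
    """Sufficient statistics for simple linear regression on one owner's rows."""
    return [len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys))]


def encode(value):
    """Map a real number to a residue modulo MODULUS (fixed point)."""
    return round(value * SCALE) % MODULUS


def decode(residue):
    """Inverse of encode, treating large residues as negative numbers."""
    signed = residue if residue <= MODULUS // 2 else residue - MODULUS
    return signed / SCALE


def pooled_fit(owners):
    """Least-squares intercept and slope from securely summed statistics."""
    stats = [local_stats(xs, ys) for xs, ys in owners]
    n, sx, sy, sxx, sxy = (
        decode(secure_sum([encode(s[i]) for s in stats])) for i in range(5)
    )
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two hypothetical owners hold different records drawn from y = 1 + 2x.
intercept, slope = pooled_fit([([0, 1], [1, 3]), ([2, 3], [5, 7])])
```

Each owner learns the pooled totals, and hence the complement of its own contribution, but not how that complement is divided among the other owners. With only two owners the complement is the other owner's contribution, so the protection is meaningful only for three or more.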
Privacy-Preserving Ridge Regression on Hundreds of Millions of Records
Cited by 4 (1 self)
Abstract. Ridge regression is an algorithm that takes as input a large number of data points and finds the best-fit linear curve through these points. The algorithm is a building block for many machine-learning operations. We present a system for privacy-preserving ridge regression. The system outputs the best-fit curve in the clear, but exposes no other information about the input data. Our approach combines both homomorphic encryption and Yao garbled circuits, where each is used in a different part of the algorithm to obtain the best performance. We implement the complete system and experiment with it on real datasets, and show that it significantly outperforms pure implementations based only on homomorphic encryption or Yao circuits.
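For reference, the ridge estimate that the abstract above calls the best-fit curve is usually written as below. The symbols (design matrix X, response vector y, penalty λ, coefficient vector β) are notation assumed here and do not come from the listing.

```latex
\hat{\beta}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y,
  \qquad \lambda > 0 .
```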
Secure analysis of distributed chemical databases without data integration
J. Computer-Aided Molecular Design, November 2005
Cited by 3 (3 self)
We present a method for performing statistically valid linear regressions on the union of distributed chemical databases that preserves confidentiality of those databases. The method employs secure multiparty computation to share local sufficient statistics necessary to compute least squares estimators of regression coefficients, error variances and other quantities of interest. We illustrate with an example containing four companies' rather different databases. Key words: Chemical database, distributed data, regression model, secure multiparty computation
Achieving Both Valid and Secure Logistic Regression Analysis on Aggregated Data from Different Private Sources
Cited by 3 (2 self)
Abstract. Preserving the privacy of individual databases when carrying out statistical calculations has a relatively long history in statistics and has been the focus of much recent attention in machine learning. In this paper, we present a protocol for fitting a logistic regression when the data are held by separate parties, without actually combining information sources, by exploiting results from the literature on multiparty secure computation. Our protocol provides only the final result of the calculation, in contrast to other methods that share intermediate values and thus present an opportunity for compromise of values in the individual databases. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters, and a demonstration of its performance in practice, and (2) the presentation of an amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties. Throughout, we build our protocol from existing cryptographic primitives; thus the novelty lies in designing a concrete procedure for private computation of the logistic regression MLE rather than in proposing new cryptographic constructions. Keywords: Distributed analysis; Logistic regression; Privacy-preserving computation; Secure multiparty computation.