## Secure Regression on Distributed Databases (2004)

Venue: | J. Computational and Graphical Statist |

Citations: | 28 - 15 self |

### BibTeX

@ARTICLE{Karr04secureregression,

author = {Alan F. Karr and Xiaodong Lin and Ashish P. Sanil and Jerome P. Reiter},

title = {Secure Regression on Distributed Databases},

journal = {J. Computational and Graphical Statist},

year = {2004},

volume = {14},

pages = {263--279}

}

### Abstract

We present several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, the confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowest level of protection, actually integrates the databases, but in such a manner that no database owner can determine the origin of any records other than its own. Regression, associated diagnostics, or any other analysis can then be performed on the integrated data.

### Citations

1435 | Generalized additive models
- Hastie, Tibshirani
- 1990
Citation Context ... the real-data residuals. To determine b j ut for continuous independent variables, each agency fits a smooth curve to the relationship between their r j and x j t using a generalized additive model (Hastie and Tibshirani, 1990). The b j ut equals the value of this curve at x js ut. To determine the v j ut, each agency finds the unit I in its data whose value in the real-data x j t is closest to x js ut; that is, it finds... |

640 | Privacy-preserving data mining
- Agrawal, Srikant
- 2000
Citation Context ...can protect database subjects from identity or attribute disclosure (§2.1). General approaches include building blocks from SMPC (Lindell and Pinkas, 2000) and adding noise to data—jittering in §2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (V... |

458 | How to play any mental game
- Goldreich, Micali, et al.
- 1987
Citation Context ...tical disclosure limitation problems (Duncan et al., 2001; Duncan and Stokes, 2004; Gomatam et al., 2003; Dobra et al., 2002, 2003). 2.2 Secure Multi-Party Computation Secure multi-party computation (Goldreich et al., 1987; Goldwasser, 1997; Yao, 1982) is concerned in general with performing computations in which multiple parties hold “pieces” of the computation. They wish to obtain the final result but at the same tim... |

403 | Privacy preserving data mining
- Lindell, Pinkas
Citation Context ...incipally at preserving the privacy of the database holders, but also can protect database subjects from identity or attribute disclosure (§2.1). General approaches include building blocks from SMPC (Lindell and Pinkas, 2000) and adding noise to data—jittering in §2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcio... |

258 | Privacy Preserving Mining of Association Rules
- Evfimievski, Srikant, et al.
- 2002
Citation Context ...from SMPC (Lindell and Pinkas, 2000) and adding noise to data—jittering in §2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sa... |

225 | Privacy Preserving Association Rule Mining in Vertically Partitioned Data
- Vaidya, Clifton
- 2002
Citation Context ...s include building blocks from SMPC (Lindell and Pinkas, 2000) and adding noise to data—jittering in §2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned... |

181 | Privacy-preserving distributed mining of association rules on horizontally partitioned data
- Kantarcioglu, Clifton
- 2002
Citation Context ... to data—jittering in Section 2.1 (Agrawal and Srikant 2000). Other problems that have been treated include association rules (Vaidya and Clifton 2002; Evfimievski, Srikant, Agrawal, and Gehrke 2004; Kantarcioglu and Clifton 2002), classification (Du, Han, and Chen 2004), clustering (Vaidya and Clifton 2003; Lin, Clifton, and Zhu 2004), and linear regression for vertically partitioned data (Du, Han, and Chen 2004; Sanil et al... |

110 | Secret sharing homomorphisms: Keeping shares of a secret secret
- Benaloh
- 1986
Citation Context ...late $v = \sum_{j=1}^{K} v_j$ in such a manner that each Agency j can learn only the minimum possible about the other agencies’ values, namely, the value of $v_{(-j)} = \sum_{\ell \neq j} v_\ell$. The secure summation protocol (Benaloh, 1987), which is shown pictorially in Figure 2, can be used to effect this computation. Choose m to be a very large number, say $2^{100}$, which is known to all the agencies. Assume that v is known to lie in ... |

93 | Hedonic prices and the demand for clean air
- Harrison, Rubinfeld
- 1978
Citation Context ...via secure data integration. In §5, we describe how to perform diagnostics in the setting of this subsection. 4.3 Example We illustrate the secure regression protocol using the “Boston housing data” (Harrison and Rubinfeld, 1978). There are 506 data cases, representing towns around Boston, which we partitioned among K = 3 agencies representing, for example, regional governmental authorities. The database sizes are n1 = 172, ... |

77 | Multi-party computations: Past and present
- Goldwasser
- 1997
Citation Context ...ion problems (Duncan et al., 2001; Duncan and Stokes, 2004; Gomatam et al., 2003; Dobra et al., 2002, 2003). 2.2 Secure Multi-Party Computation Secure multi-party computation (Goldreich et al., 1987; Goldwasser, 1997; Yao, 1982) is concerned in general with performing computations in which multiple parties hold “pieces” of the computation. They wish to obtain the final result but at the same time disclose as litt... |

66 | Privacy-preserving multivariate statistical analysis: linear regression and classification
- Du, Chen, et al.
Citation Context ...§2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of these techniques focus on computation ... |

54 | Disclosure risk vs. data utility: The R-U confidentiality map. Management Sci
- Duncan, Keller-McNulty, et al.
- 2004
Citation Context ...a quality (National Institute of Statistical Sciences, 2003, 2004). Much of this research focuses on explicit disclosure risk–data utility formulations for statistical disclosure limitation problems (Duncan et al., 2001; Duncan and Stokes, 2004; Gomatam et al., 2003; Dobra et al., 2002, 2003). 2.2 Secure Multi-Party Computation Secure multi-party computation (Goldreich et al., 1987; Goldwasser, 1997; Yao, 1982) is c... |

49 | Multiple imputation for statistical disclosure limitation
- Raghunathan, Reiter, et al.
- 2003
Citation Context ...be created, which preserve some characteristics of the original data, but whose records simply do not correspond to real individuals or establishments (Duncan and Keller–McNulty, 2001; Reiter, 2003a; Raghunathan et al., 2003). Analysis servers (Gomatam et al., 2004), which disseminate analyses of data rather than data themselves, are another alternative. With support from the Digital Government program at the National Sc... |

41 | Privacy-preserving distributed mining of association rules on horizontally partitioned data
- Kantarcioglu, Clifton
- 2002
Citation Context ...kas, 2000) and adding noise to data—jittering in §2.1 (Agrawal and Srikant, 2000). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of thes... |

41 | Elements of Statistical Disclosure Control - Willenborg, Waal - 2001 |

34 | A practical approach to solve secure multi-party computation problems
- Du, Zhan
- 2002
Citation Context ...othing about B other than what can be extracted from A and f (A, B), and symmetrically for Party 2. In practice, absolute security may not be possible, so some techniques for SMPC rely on heuristics (Du and Zhan, 2002) or randomization. Secure summation (§2.4) is an example of the latter. Various assumptions are possible about the participating parties, for example, whether they use “correct” values in the computa... |

29 | Privacy Preserving k-Means Clustering over Vertically Partitioned Data
- Vaidya, Clifton
- 2003
Citation Context ...0). Other problems that have been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of these techniques focus on computation of the “final result” to the exclusion ... |

25 | Privacy preserving clustering with distributed EM mixture modeling
- Lin, Clifton, et al.
Citation Context ...ve been treated include association rules (Vaidya and Clifton, 2002; Evfimievski et al., 2002; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of these techniques focus on computation of the “final result” to the exclusion of supporting info... |

24 | Inference for partially synthetic, public use microdata sets. Survey Methodology 181189 - Reiter - 2003 |

23 | Privacy preserving regression modelling via distributed computation
- Sanil, Karr, et al.
- 2004
Citation Context ...02; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of these techniques focus on computation of the “final result” to the exclusion of supporting information seen by statisticians as essential. For example, least squares regression estimators ... |

17 | Software systems for tabular data releases
- Dobra, Fienberg, et al.
Citation Context ...Much of this research focuses on explicit disclosure risk–data utility formulations for statistical disclosure limitation problems (Duncan et al., 2001; Duncan and Stokes, 2004; Gomatam et al., 2003; Dobra et al., 2002, 2003). 2.2 Secure Multi-Party Computation Secure multi-party computation (Goldreich et al., 1987; Goldwasser, 1997; Yao, 1982) is concerned in general with performing computations in which multiple ... |

17 | Data dissemination and disclosure limitation in a world without microdata: A risk-utility framework for remote access servers
- Gomatam, Karr, et al.
- 2005
Citation Context ...cs of the original data, but whose records simply do not correspond to real individuals or establishments (Duncan and Keller–McNulty, 2001; Reiter, 2003a; Raghunathan et al., 2003). Analysis servers (Gomatam et al., 2004), which disseminate analyses of data rather than data themselves, are another alternative. With support from the Digital Government program at the National Science Foundation (NSF) and multiple feder... |

17 | Data swapping as a decision problem
- Gomatam, Karr, et al.
- 2003
Citation Context ...ciences, 2003, 2004). Much of this research focuses on explicit disclosure risk–data utility formulations for statistical disclosure limitation problems (Duncan et al., 2001; Duncan and Stokes, 2004; Gomatam et al., 2003; Dobra et al., 2002, 2003). 2.2 Secure Multi-Party Computation Secure multi-party computation (Goldreich et al., 1987; Goldwasser, 1997; Yao, 1982) is concerned in general with performing computation... |

16 | Preserving confidentiality of high-dimensional tabular data: Statistical and computational issues - Dobra, Karr, et al. |

14 | Computational disclosure control for medical microdata: the Datafly system
- Sweeney
- 1997
Citation Context ...s. In one well-known example, date of birth, 5-digit ZIP code of residence and gender alone produced identity disclosures from a medical records database by linkage to public voter registration data (Sweeney, 1997). Identity disclosure can also occur by means of rare or extreme attribute values, such as very high incomes. Aggregation—geographical (Karr et al., 2001; Lee et al., 2001) or otherwise—is a principa... |

12 | Disseminating Information but Protecting Confidentiality
- Karr, Lee, et al.
- 2001
Citation Context ...ase by linkage to public voter registration data (Sweeney, 1997). Identity disclosure can also occur by means of rare or extreme attribute values, such as very high incomes. Aggregation—geographical (Karr et al., 2001; Lee et al., 2001) or otherwise—is a principal strategy to reduce identity disclosures. The Census Bureau does not release data at aggregations less than 100,000. Another is top-coding: to prevent ... |

12 | Privacy preserving analysis of vertically partitioned data using secure matrix products
- Sanil, Karr, et al.
Citation Context ...02; Kantarcioglu and Clifton, 2002), classification (Du et al., 2004), clustering (Vaidya and Clifton, 2003; Lin et al., 2004), and linear regression for vertically partitioned data (Du et al., 2004; Sanil et al., 2004b). Many of these techniques focus on computation of the “final result” to the exclusion of supporting information seen by statisticians as essential. For example, least squares regression estimators ... |

8 | Confidentiality, Disclosure and Data Access: Theory and Practical Application for Statistical Agencies
- Doyle, Lane, et al.
- 2001
Citation Context ...ity of North Carolina at Chapel Hill is the dominant employer in Orange County, NC, so that the rate of workplace injuries for the county is, in effect, that for UNC. There is a wealth of techniques (Doyle et al., 2001; Federal Committee on Statistical Methodology, 1994; Journal of Official Statistics, 1998; Willenborg and de Waal, 1996, 2001) for “preventing” disclosure, which preserve low-dimensional statistical ... |

7 | Analysis of aggregated data in survey sampling with application to fertilizer/pesticide usage surveys
- Lee, Holloman, et al.
Citation Context ...ublic voter registration data (Sweeney, 1997). Identity disclosure can also occur by means of rare or extreme attribute values, such as very high incomes. Aggregation—geographical (Karr et al., 2001; Lee et al., 2001) or otherwise—is a principal strategy to reduce identity disclosures. The Census Bureau does not release data at aggregations less than 100,000. Another is top-coding: to prevent disclosing identit... |

7 | Model diagnostics for remote access regression servers - Reiter |

4 | Introduction to the Special Issue: Disclosure Limitation Methods for Protecting the Confidentiality of Statistical Data - Fienberg, Willenborg - 1998 |

4 | Privacy preserving mining of association rules (invited journal version) - Evfimievski, Srikant, et al. - 2004 |

3 | Mask or impute - Duncan, Keller–McNulty - 2001 |

1 | Diagnostics for Remote Access Regression Servers - “Model |