## Privacy-preserving logistic regression

Citations: 35 (2 self)

### BibTeX

@MISC{Chaudhuri_privacy-preservinglogistic,

author = {Kamalika Chaudhuri and Claire Monteleoni},

title = {Privacy-preserving logistic regression},

year = {}

}

### Abstract

This paper addresses the important tradeoff between privacy and learnability when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We show that for certain data distributions, this algorithm generalizes poorly compared with standard regularized logistic regression. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [7], and we provide learning guarantees. We show that our algorithm performs almost as well as standard regularized logistic regression in terms of generalization error. Experiments demonstrate improved learning performance of our method over the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.
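The two approaches named in the abstract can be contrasted in a short sketch. This is an illustration under stated assumptions, not the authors' implementation: the noise scales (`2 / (n * lam * eps)` for the sensitivity method and `2 / eps` for the perturbed objective), the requirement that every row of `X` has norm at most 1, and the plain gradient-descent solver are all assumptions made here, since the abstract does not state them. All function names are hypothetical.

```python
import numpy as np

def sample_noise(d, scale, rng):
    """Draw a d-dimensional vector with density proportional to exp(-||v|| / scale)."""
    norm = rng.gamma(shape=d, scale=scale)      # radial part
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)      # uniform direction on the sphere
    return norm * direction

def fit_logreg(X, y, lam, b=None, steps=500):
    """Minimize (lam/2)||w||^2 + b.w/n + mean(log(1 + exp(-y * Xw))).

    Plain gradient descent; assumes labels in {-1, +1} and row norms <= 1,
    so the objective is (lam + 1/4)-smooth and the step size below is safe.
    """
    n, d = X.shape
    b = np.zeros(d) if b is None else b
    w = np.zeros(d)
    lr = 1.0 / (lam + 0.25)
    for _ in range(steps):
        margins = y * (X @ w)
        # 1 / (1 + exp(m)) written with tanh to avoid overflow
        coeff = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
        grad = lam * w + b / n + (X.T @ coeff) / n
        w -= lr * grad
    return w

def sensitivity_method(X, y, lam, eps, rng):
    """Train normally, then perturb the learned classifier (assumed noise scale)."""
    n, d = X.shape
    return fit_logreg(X, y, lam) + sample_noise(d, 2.0 / (n * lam * eps), rng)

def perturbed_objective_method(X, y, lam, eps, rng):
    """Add a random linear term to the objective, then train (assumed noise scale)."""
    n, d = X.shape
    return fit_logreg(X, y, lam, b=sample_noise(d, 2.0 / eps, rng))

# Small synthetic demo: rows clipped to the unit ball, labels from the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
w_plain = fit_logreg(X, y, lam=0.1)
w_sens = sensitivity_method(X, y, lam=0.1, eps=1.0, rng=rng)
w_pert = perturbed_objective_method(X, y, lam=0.1, eps=1.0, rng=rng)
```

In the first method the noise is added after training, and under the assumed scale it grows as the regularization weight shrinks. In the second method the randomness enters before optimization as a linear term divided by `n`.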

### Citations

640 | UCI machine learning repository
- Asuncion, Newman
- 2007
Citation Context ...istribution on the hypersphere with zero mass within a small margin from the generating linear separator. We also experimented on several real data sets from the UC Irvine Machine Learning Repository [2]: Pima Indians Diabetes, and Breast Cancer Wisconsin Diagnostic, chosen in part for their potentially private nature, as disease data. Our implementations use the CVX [9] convex optimization package. ...

614 | Privacy-Preserving Data Mining
- Agrawal, Srikant
- 2000
Citation Context ...cient. Additional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 8], [15, 11], however the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. In contrast, the privacy definition w...

449 | ℓ-diversity: Privacy beyond k-anonymity
- Machanavajjhala, Kifer, et al.
Citation Context ...dditional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 8], [15, 11], however the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. In contrast, the privacy definition we use avoi...

341 | CVX: Matlab software for disciplined convex programming (web page and software)
- Grant, Boyd
- 2009
Citation Context ...e Machine Learning Repository [2]: Pima Indians Diabetes, and Breast Cancer Wisconsin Diagnostic, chosen in part for their potentially private nature, as disease data. Our implementations use the CVX [9] convex optimization package. Figure 2 gives mean and standard deviation of test error over five folds of cross validation. In all four problems, the new algorithm is superior to the sensitivity metho...

313 | Calibrating noise to sensitivity in private data analysis
- Dwork, McSherry, et al.
- 2006
Citation Context ...nt tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional ...

301 | Differential privacy
- Dwork
- 2006
Citation Context ...ote its Euclidean norm. For a function $G(x)$ defined on $\mathbb{R}^d$, we use $\nabla G$ to denote its gradient and $\nabla^2 G$ to denote its Hessian. Privacy Definition. The privacy definition we use is due to Dwork et al. [7, 6]. In this model, users have access to private data about individuals through a sanitization mechanism, usually denoted by M. The interaction between the sanitization mechanism and an adversary is mode...

250 | Limiting privacy breaches in privacy preserving data mining
- Evfimievski, Gehrke, et al.
- 2003
Citation Context ...cient. Additional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 8], [15, 11], however the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. In contrast, the privacy definition w...

212 | Protecting Privacy when Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression
- Samarati, Sweeney
- 1998
Citation Context ...dditional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 8], [15, 11], however the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. In contrast, the privacy definition we use avoi...

164 | Iterative Methods for Optimization
- Kelley
- 1999
Citation Context ...gin. In the band of margin ≤ 0.1 with respect to the perfect classifier, we performed random label flipping with probability 0.2. For our experiments, we used convex optimization software provided by [9]. Figure 1 gives mean and standard deviation of test error over five-fold cross-validation, on 17,500 points. In both simulations, our new method is superior to the sensitivity method, although incurs...

138 | Robust de-anonymization of large sparse datasets
- Narayanan, Shmatikov
- 2008
Citation Context ...er, this is problematic, because an adversary may have some auxiliary information, which may even be publicly available, and which can be used to breach privacy. For more details on such attacks, see [13]. To formally address this issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. The definition we use is due to Dwork et al. [7], and has been us...

122 | A learning theory approach to non-interactive database privacy
- Blum, Ligett, et al.
- 2008
Citation Context ...l. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2. Privacy and learning. The work most related to ours is [10] and [4]. [10] shows how to find classifiers that preserve ɛ-differential privacy; however, their algorithm takes time exponential in $d$ for inputs in $\mathbb{R}^d$. [4] provides a general method for publishing data-se...

108 | Smooth Sensitivity and Sampling in Private Data Analysis
- Nissim, Raskhodnikova, et al.
- 2007
Citation Context ...ny input $x_1, \ldots, x_n$, releasing $f(x_1, \ldots, x_n) + \eta$, where $\eta$ is a random variable drawn from a Laplace distribution with mean 0 and standard deviation $S(f)/\epsilon$, preserves ɛ-differential privacy. In [14], Nissim et al. showed that given any input $x$ to a function, and a function $f$, it is sufficient to draw $\eta$ from a Laplace distribution with standard deviation $SS(f)/\epsilon$, where $SS(f)$ is the smoothed sensi...
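The noise-addition step quoted in this context can be sketched for a scalar query. This is a minimal illustration, assuming the Laplace scale parameter is set to the sensitivity divided by ɛ (the usual form of the Laplace mechanism); the function name, the example query, and its sensitivity bound of `1/n` for values in [0, 1] are hypothetical choices made here.

```python
import numpy as np

def laplace_mechanism(f, data, sensitivity, eps, rng):
    """Release f(data) + eta, with eta drawn from Laplace(0, sensitivity / eps)."""
    return f(data) + rng.laplace(loc=0.0, scale=sensitivity / eps)

# Example query: the mean of n values in [0, 1].
# Changing a single value moves the mean by at most 1/n.
rng = np.random.default_rng(0)
data = rng.uniform(size=1000)
private_mean = laplace_mechanism(
    np.mean, data, sensitivity=1.0 / len(data), eps=0.5, rng=rng
)
```

A smaller `eps` means stronger privacy and a wider noise distribution; a query with lower sensitivity needs less noise for the same `eps`.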

105 | Mechanism design via differential privacy
- McSherry, Talwar
- 2007
Citation Context ...issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. The definition we use is due to Dwork et al. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2. Privacy and learning. The work most related to ours is [10] and [4]. [10] shows how to find classifiers that preserve ɛ-diff...

59 | Privacy, accuracy, and consistency too: A holistic solution to contingency table release
- Barak, Chaudhuri, et al.
- 2007
Citation Context ...issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. The definition we use is due to Dwork et al. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2. Privacy and learning. The work most related to ours is [10] and [4]. [10] shows how to find classifiers that preserve ɛ-diff...

57 | What can we learn privately
- Kasiviswanathan, Lee, et al.
- 2008
Citation Context ...work et al. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2. Privacy and learning. The work most related to ours is [10] and [4]. [10] shows how to find classifiers that preserve ɛ-differential privacy; however, their algorithm takes time exponential in $d$ for inputs in $\mathbb{R}^d$. [4] provides a general method for publishing...

52 | SVM optimization: inverse dependence on training set size
- Shalev-Shwartz, Srebro
- 2008
Citation Context ... that minimizes $f_\lambda(w)$ over the data distribution, and let $w_1$ and $w_2$ be the classifiers that minimize $\hat{f}_\lambda(w)$ and $\hat{f}_\lambda(w) + \frac{b^T w}{n}$ over the data distribution respectively. We can use an analysis as in [16] to write that: $L(w_2) = L(w_0) + (f_\lambda(w_2) - f_\lambda(w^*)) + (f_\lambda(w^*) - f_\lambda(w_0)) + \frac{\lambda}{2}\|w_0\|^2 - \frac{\lambda}{2}\|w_2\|^2$. Notice that from Lemma 3, $\hat{f}_\lambda(w_2) - \hat{f}_\lambda(w_1) \le \frac{8 \log^2(1/\delta)}{\lambda n^2 \epsilon^2}$. Using this and [17], we can...

23 | When random sampling preserves privacy
- Chaudhuri, Mishra
- 2006
Citation Context ...issue, we need a definition of privacy which works in the presence of auxiliary knowledge by the adversary. The definition we use is due to Dwork et al. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2. Privacy and learning. The work most related to ours is [10] and [4]. [10] shows how to find classifiers that preserve ɛ-diff...

19 | Fast rates for regularized objectives
- Sridharan, Shalev-Shwartz, et al.
- 2008
Citation Context ...s as in [16] to write that: $L(w_2) = L(w_0) + (f_\lambda(w_2) - f_\lambda(w^*)) + (f_\lambda(w^*) - f_\lambda(w_0)) + \frac{\lambda}{2}\|w_0\|^2 - \frac{\lambda}{2}\|w_2\|^2$. Notice that from Lemma 3, $\hat{f}_\lambda(w_2) - \hat{f}_\lambda(w_1) \le \frac{8 \log^2(1/\delta)}{\lambda n^2 \epsilon^2}$. Using this and [17], we can bound the second quantity in equation 1 as $f_\lambda(w_2) - f_\lambda(w^*) \le \frac{16 \log^2(1/\delta)}{\lambda n^2 \epsilon^2} + O\left(\frac{1}{\lambda n}\right)$. By definition of $w^*$, the third quantity in Equation 1 is non-positive. If $\lambda$ is set to be $\epsilon_g$/||w...