Results 1  10
of
160
ℓdiversity: Privacy beyond kanonymity
 In ICDE
, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract

Cited by 445 (12 self)
 Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying ” attributes. In this paper we show using two simple attacks that a kanonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that kanonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓdiversity that can defend against such attacks. In addition to building a formal foundation for ℓdiversity, we show in an experimental evaluation that ℓdiversity is practical and can be implemented efficiently. 1.
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
"... For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has ..."
Abstract

Cited by 406 (13 self)
 Add to MetaCart
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
SOLVING SYSTEMS OF POLYNOMIAL EQUATIONS
, 2002
"... These are the lecture notes for ten lectures to be given at the CBMS ..."
Abstract

Cited by 150 (12 self)
 Add to MetaCart
These are the lecture notes for ten lectures to be given at the CBMS
Fastest Mixing Markov Chain on A Graph
 SIAM REVIEW
, 2003
"... We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e. the mixing rate of the Mar ..."
Abstract

Cited by 90 (15 self)
 Add to MetaCart
We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e. the mixing rate of the Markov chain, is determined by the second largest (in magnitude) eigenvalue of the transition matrix. In this paper we address the problem of assigning probabilities to the edges of the graph in such a way as to minimize the second largest magnitude eigenvalue, i.e., the problem of finding the fastest mixing Markov chain on the graph. We show that
Toward privacy in public databases
 In TCC
, 2005
"... Abstract. We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respon ..."
Abstract

Cited by 89 (12 self)
 Add to MetaCart
Abstract. We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents and utility of the sanitized data. Unlike in the study of secure function evaluation, in which privacy is preserved to the extent possible given a specific functionality goal, in the census problem privacy is paramount; intuitively, things that cannot be learned “safely ” should not be learned at all. An important contribution of this work is a definition of privacy (and privacy compromise) for statistical databases, together with a method for describing and comparing the privacy offered by specific sanitization techniques. We obtain several privacy results using two different sanitization techniques, and then show how to combine them via cross training. We also obtain two utility results involving clustering. 1
Honest Exploration of Intractable Probability Distributions Via Markov Chain Monte Carlo
 STATISTICAL SCIENCE
, 2001
"... Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burnin? and (Q2) How long should the sampling continue after burnin? Developing rigorous answers to these questions presently requires a detailed study of the ..."
Abstract

Cited by 74 (19 self)
 Add to MetaCart
Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burnin? and (Q2) How long should the sampling continue after burnin? Developing rigorous answers to these questions presently requires a detailed study of the convergence properties of the underlying Markov chain. Consequently, in most practical applications of MCMC, exact answers to (Q1) and (Q2) are not sought. The goal of this paper is to demystify the analysis that leads to honest answers to (Q1) and (Q2). The authors hope that this article will serve as a bridge between those developing Markov chain theory and practitioners using MCMC to solve practical problems. The ability to formally address (Q1) and (Q2) comes from establishing a drift condition and an associated minorization condition, which together imply that the underlying Markov chain is geometrically ergodic. In this paper, we explain exactly what drift and minorization are as well as how and why these conditions can be used to form rigorous answers to (Q1) and (Q2). The basic ideas are as follows. The results of Rosenthal (1995) and Roberts and Tweedie (1999) allow one to use drift and minorization conditions to construct a formula giving an analytic upper bound on the distance to stationarity. A rigorous answer to (Q1) can be calculated using this formula. The desired characteristics of the target distribution are typically estimated using ergodic averages. Geometric ergodicity of the underlying Markov chain implies that there are central limit theorems available for ergodic averages (Chan and Geyer 1994). The regenerative simulation technique (Mykland, Tierney and Yu 1995, Robert 1995) can be used to get a consistent estimate of the variance of the asymptotic nor...
Markov Chains and Polynomial time Algorithms
, 1994
"... This paper outlines the use of rapidly mixing Markov Chains in randomized polynomial time algorithms to solve approximately certain counting problems. They fall into two classes: combinatorial problems like counting the number of perfect matchings in certain graphs and geometric ones like computing ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
This paper outlines the use of rapidly mixing Markov Chains in randomized polynomial time algorithms to solve approximately certain counting problems. They fall into two classes: combinatorial problems like counting the number of perfect matchings in certain graphs and geometric ones like computing the volumes of convex sets.
Fixedwidth output analysis for Markov chain Monte Carlo
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2006
"... Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a target distribution via ergodic averages. A fundamental question is when should sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a metho ..."
Abstract

Cited by 48 (17 self)
 Add to MetaCart
Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a target distribution via ergodic averages. A fundamental question is when should sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a method that stops the simulation when the width of a confidence interval based on an ergodic average is less than a userspecified value. Hence calculating a Monte Carlo standard error is a critical step in assessing the simulation output. We consider the regenerative simulation and batch means methods of estimating the variance of the asymptotic normal distribution. We give sufficient conditions for the strong consistency of both methods and investigate their finite sample properties in a variety of examples.
Variation of Cost Functions in Integer Programming
 MATHEMATICAL PROGRAMMING
, 1994
"... We study the problem of minimizing c \Delta x subject to A \Delta x = b, x 0 and x integral, for a fixed matrix A. Two cost functions c and c 0 are considered equivalent if they give the same optimal solutions for each b. We construct a polytope St(A) whose normal cones are the equivalence classe ..."
Abstract

Cited by 42 (8 self)
 Add to MetaCart
We study the problem of minimizing c \Delta x subject to A \Delta x = b, x 0 and x integral, for a fixed matrix A. Two cost functions c and c 0 are considered equivalent if they give the same optimal solutions for each b. We construct a polytope St(A) whose normal cones are the equivalence classes. Explicit inequality presentations of these cones are given by the reduced Gröbner bases associated with A. The union of the reduced Gröbner bases as c varies (called the universal Gröbner basis) consists precisely of the edge directions of St(A). We present geometric algorithms for computing St(A), the Graver basis [Gra], and the universal Gröbner basis.
Simulatable auditing
 In PODS
, 2005
"... Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers – where each answer is either the true answer or “denied ” (in the event that reve ..."
Abstract

Cited by 39 (5 self)
 Add to MetaCart
Given a data set consisting of private information about individuals, we consider the online query auditing problem: given a sequence of queries that have already been posed about the data, their corresponding answers – where each answer is either the true answer or “denied ” (in the event that revealing the answer compromises privacy) – and given a new query, deny the answer if privacy may be breached or give the true answer otherwise. A related problem is the offline auditing problem where one is given a sequence of queries and all of their true answers and the goal is to determine if a privacy breach has already occurred. We uncover the fundamental issue that solutions to the offline auditing problem cannot be directly used to solve the online auditing problem since query denials may leak information. Consequently, we introduce a new model called simulatable auditing where query denials provably do not leak information. We demonstrate that max queries may be audited in this simulatable paradigm under the classical definition of privacy where a breach occurs if a sensitive value is fully compromised. We also introduce a probabilistic notion of (partial) compromise. Our privacy definition requires that the apriori probability that a sensitive value lies within some small interval is not that different from the posterior probability (given the query answers). We demonstrate that sum queries can be audited in a simulatable fashion under this privacy definition.