Results 1  10
of
268
ℓdiversity: Privacy beyond kanonymity
 IN ICDE
, 2006
"... Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with resp ..."
Abstract

Cited by 672 (13 self)
 Add to MetaCart
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called kanonymity has gained popularity. In a kanonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain “identifying ” attributes. In this paper we show using two simple attacks that a kanonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity in those sensitive attributes. This kind of attack is a known problem [60]. Second, attackers often have background knowledge, and we show that kanonymity does not guarantee privacy against attackers using background knowledge. We give a detailed analysis of these two attacks and we propose a novel and powerful privacy criterion called ℓdiversity that can defend against such attacks. In addition to building a formal foundation for ℓdiversity, we show in an experimental evaluation that ℓdiversity is practical and can be implemented efficiently.
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
, 1996
"... For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has ..."
Abstract

Cited by 543 (13 self)
 Add to MetaCart
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
Solving Systems of Polynomial Equations
 AMERICAN MATHEMATICAL SOCIETY, CBMS REGIONAL CONFERENCES SERIES, NO 97
, 2002
"... One of the most classical problems of mathematics is to solve systems of polynomial equations in several unknowns. Today, polynomial models are ubiquitous and widely applied across the sciences. They arise in robotics, coding theory, optimization, mathematical biology, computer vision, game theory, ..."
Abstract

Cited by 223 (13 self)
 Add to MetaCart
One of the most classical problems of mathematics is to solve systems of polynomial equations in several unknowns. Today, polynomial models are ubiquitous and widely applied across the sciences. They arise in robotics, coding theory, optimization, mathematical biology, computer vision, game theory, statistics, machine learning, control theory, and numerous other areas. The set of solutions to a system of polynomial equations is an algebraic variety, the basic object of algebraic geometry. The algorithmic study of algebraic varieties is the central theme of computational algebraic geometry. Exciting recent developments in symbolic algebra and numerical software for geometric calculations have revolutionized the field, making formerly inaccessible problems tractable, and providing fertile ground for experimentation and conjecture. The first half of this book furnishes an introduction and represents a snapshot of the state of the art regarding systems of polynomial equations. Afficionados of the wellknown text books by Cox, Little, and O’Shea will find familiar themes in the first five chapters: polynomials in one variable, Gröbner
Fastest mixing markov chain on a graph
 SIAM REVIEW
, 2003
"... We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e., the mixing rate of the ..."
Abstract

Cited by 155 (15 self)
 Add to MetaCart
(Show Context)
We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e., the mixing rate of the Markov chain, is determined by the second largest (in magnitude) eigenvalue of the transition matrix. In this paper we address the problem of assigning probabilities to the edges of the graph in such a way as to minimize the second largest magnitude eigenvalue, i.e., the problem of finding the fastest mixing Markov chain on the graph. We show that this problem can be formulated as a convex optimization problem, which can in turn be expressed as a semidefinite program (SDP). This allows us to easily compute the (globally) fastest mixing Markov chain for any graph with a modest number of edges (say, 1000) using standard numerical methods for SDPs. Larger problems can be solved by
Toward privacy in public databases
, 2005
"... We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents an ..."
Abstract

Cited by 107 (10 self)
 Add to MetaCart
We initiate a theoretical study of the census problem. Informally, in a census individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally conflicting requirements: privacy for the respondents and utility of the sanitized data. Unlike in the study of secure function evaluation, in which privacy is preserved to the extent possible given a specific functionality goal, in the census problem privacy is paramount; intuitively, things that cannot be learned “safely ” should not be learned at all. An important contribution of this work is a definition of privacy (and privacy compromise) for statistical databases, together with a method for describing and comparing the privacy offered by specific sanitization techniques. We obtain several privacy results using two different sanitization techniques, and then show how to combine them via cross training. We also obtain two utility results involving clustering.
Honest Exploration of Intractable Probability Distributions Via Markov Chain Monte Carlo
 STATISTICAL SCIENCE
, 2001
"... Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burnin? and (Q2) How long should the sampling continue after burnin? Developing rigorous answers to these questions presently requires a detailed study of the ..."
Abstract

Cited by 105 (32 self)
 Add to MetaCart
Two important questions that must be answered whenever a Markov chain Monte Carlo (MCMC) algorithm is used are (Q1) What is an appropriate burnin? and (Q2) How long should the sampling continue after burnin? Developing rigorous answers to these questions presently requires a detailed study of the convergence properties of the underlying Markov chain. Consequently, in most practical applications of MCMC, exact answers to (Q1) and (Q2) are not sought. The goal of this paper is to demystify the analysis that leads to honest answers to (Q1) and (Q2). The authors hope that this article will serve as a bridge between those developing Markov chain theory and practitioners using MCMC to solve practical problems. The ability to formally address (Q1) and (Q2) comes from establishing a drift condition and an associated minorization condition, which together imply that the underlying Markov chain is geometrically ergodic. In this paper, we explain exactly what drift and minorization are as well as how and why these conditions can be used to form rigorous answers to (Q1) and (Q2). The basic ideas are as follows. The results of Rosenthal (1995) and Roberts and Tweedie (1999) allow one to use drift and minorization conditions to construct a formula giving an analytic upper bound on the distance to stationarity. A rigorous answer to (Q1) can be calculated using this formula. The desired characteristics of the target distribution are typically estimated using ergodic averages. Geometric ergodicity of the underlying Markov chain implies that there are central limit theorems available for ergodic averages (Chan and Geyer 1994). The regenerative simulation technique (Mykland, Tierney and Yu 1995, Robert 1995) can be used to get a consistent estimate of the variance of the asymptotic nor...
FixedWidth Output Analysis for Markov Chain Monte Carlo
, 2005
"... Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a target distribution via ergodic averages. A fundamental question is when should sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a metho ..."
Abstract

Cited by 93 (30 self)
 Add to MetaCart
(Show Context)
Markov chain Monte Carlo is a method of producing a correlated sample in order to estimate features of a target distribution via ergodic averages. A fundamental question is when should sampling stop? That is, when are the ergodic averages good estimates of the desired quantities? We consider a method that stops the simulation when the width of a confidence interval based on an ergodic average is less than a userspecified value. Hence calculating a Monte Carlo standard error is a critical step in assessing the simulation output. We consider the regenerative simulation and batch means methods of estimating the variance of the asymptotic normal distribution. We give sufficient conditions for the strong consistency of both methods and investigate their finite sample properties in a variety of examples.
Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
"... ..."
(Show Context)
On the toric algebra of graphical models
, 2006
"... We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a loglinear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of con ..."
Abstract

Cited by 57 (7 self)
 Add to MetaCart
(Show Context)
We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a loglinear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley–Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.