Private Information Retrieval
1997
Cited by 347 (10 self)
Publicly accessible databases are an indispensable resource for retrieving up-to-date information. But they also pose a significant risk to the privacy of the user, since a curious database operator can follow the user's queries and infer what the user is after. Indeed, in cases where the users' intentions are to be kept secret, users are often cautious about accessing the database. It can be shown that when accessing a single database, to completely guarantee the privacy of the user, the whole database should be downloaded, namely n bits should be communicated (where n is the number of bits in the database). In this work, we investigate whether by replicating the database, more efficient solutions to the private retrieval problem can be obtained. We describe schemes that enable a user to access k replicated copies of a database (k ≥ 2) and privately retrieve information stored in the database. This means that each individual database gets no information on the identity of the item retrieved by the user. Our schemes use the replication to gain substantial saving. In particular, we have:
• A two-database scheme with communication complexity of O(n^{1/3}).
• A scheme for a constant number, k, of databases with communication complexity O(n^{1/k}).
• A scheme for (1/3) log_2 n databases with polylogarithmic (in n) communication complexity.
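To make the replication idea concrete, here is a minimal sketch of the simplest two-replica construction: the user sends a random subset of indices to one replica and the same subset with the wanted index toggled to the other, and each replica returns the XOR of the selected bits. This is not the O(n^{1/3}) scheme from the abstract; it still sends about n bits per query and only illustrates why neither replica alone learns the index. The function names are hypothetical, and the two replicas are assumed not to collude.

```python
import secrets

def make_queries(n, i):
    """Return two 0/1 masks over n positions that differ only at index i."""
    s1 = [secrets.randbits(1) for _ in range(n)]  # uniformly random subset
    s2 = list(s1)
    s2[i] ^= 1  # toggle membership of the wanted index
    return s1, s2

def answer(database, mask):
    """Replica side: XOR of the database bits selected by the mask."""
    bit = 0
    for d, m in zip(database, mask):
        bit ^= d & m
    return bit

def retrieve(database, i):
    """User side: query both replicas and XOR their one-bit answers."""
    q1, q2 = make_queries(len(database), i)
    # Every position except i is selected in both masks or in neither,
    # so its contribution cancels and only database[i] survives.
    return answer(database, q1) ^ answer(database, q2)
```

Each mask on its own is a uniformly random bit string, so a single replica sees nothing that depends on i.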
The State of Record Linkage and Current Research Problems
- Statistical Research Division, U.S. Census Bureau, 1999
Cited by 172 (7 self)
This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and put in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty in automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin.
Keywords: computer matching, modeling, iterative fitting, string comparison, optimization
Résumé: This article gives an overview of the ...
Measures of Disclosure Risks and Harm
- Journal of Official Statistics, 1993
Cited by 65 (0 self)
Disclosure is a difficult topic. Even the definition of disclosure depends on the context. Sometimes it is enough to violate anonymity. Sometimes sensitive information has to be revealed. Sometimes a disclosure is said to occur even though the information revealed is incorrect. This paper tries to untangle disclosure issues by differentiating between linking a respondent to a record and learning sensitive information from the linking. The extent to which a released record can be linked to a respondent determines disclosure risk; the information revealed when a respondent is linked to a released record determines disclosure harm. There can be harm even if the wrong record is identified or an incorrect sensitive value inferred. In this paper, measures of disclosure risk and harm that reflect what is learned about a respondent are studied, and some implications for data release policies are given. This paper was written at the request of the Panel on Confidentiality and Data Access of the...
A Random Server Model for Private Information Retrieval or Information Theoretic PIR Avoiding Database Replication
1997
Cited by 42 (3 self)
Private information retrieval (PIR) schemes provide a user with information from a database while keeping his query secret from the database manager. We propose a new model for PIR, utilizing auxiliary random servers providing privacy services for database access. The principal database initially engages in a preprocessing setup computation with the random servers, followed by the on-line stage with the users.
Security of Random Data Perturbation Methods
- ACM Transactions on Database Systems, 2000
Cited by 23 (0 self)
Introduction. Organizations increasingly face the problem of protecting confidential information contained in their databases. Government agencies such as the Census Bureau, which are responsible for gathering and disseminating information, adopt many techniques, including the masking of microdata, to limit the disclosure of confidential information [Fuller 1993]. Such agencies generally release static masked datasets so that undesirable inferences ...
Authors' addresses: K. Muralidhar, Carol Martin Gatton College of Business & Economics, University of Kentucky, Lexington, KY 40506-0034; email: kmura0@pop.uky.edu; R. Sarathy, Department of Accounting, Illinois State University, Normal, IL 61790-5520; email: rsarathy@ilstu.edu.
Protecting data privacy through hard-to-reverse negative databases
- In Proceedings of the 9th Information Security Conference (ISC'06), Springer LNCS, 2006
Cited by 11 (4 self)
The paper extends the idea of negative representations of information for enhancing privacy. Simply put, a set DB of data elements can be represented in terms of its complement set. That is, all the elements not in DB are depicted and DB itself is not explicitly stored. We review the negative database (NDB) representation scheme for storing a negative image compactly and propose a design for depicting a multiple-record DB using a collection of NDBs, in contrast to the single NDB approach of previous work. Finally, we present a method for creating negative databases that are hard to reverse in practice, i.e., from which it is hard to obtain DB, by adapting a technique for generating 3-SAT formulas.
Enhancing privacy through negative representations of data
2004
Cited by 10 (7 self)
The paper introduces the concept of a negative database, in which a set of records DB is represented by its complement set. That is, all the records not in DB are represented, and DB itself is not explicitly stored. After introducing the concept, several results are given regarding the feasibility of such a scheme and its potential for enhancing privacy. It is shown that a database consisting of n l-bit records can be represented negatively using only O(l·n) records. It is also shown that membership queries for DB can be processed against the negative representation in time no worse than linear in its size, and that reconstructing the database DB represented by a negative database NDB given as input is an NP-hard problem when time complexity is measured as a function of the size of NDB.
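The complement idea can be illustrated with a toy sketch. It assumes, for illustration only, that negative records are bit patterns in which a `*` symbol matches either bit; the hand-built `NDB` below and all function names are hypothetical and are not the paper's construction. A membership query scans the negative records once, while the naive reconstruction enumerates all 2^l candidate strings.

```python
from itertools import product

def matches(pattern, record):
    """True if the record agrees with the pattern at every non-'*' position."""
    return all(p in ("*", r) for p, r in zip(pattern, record))

def in_db(ndb, record):
    """A record belongs to DB exactly when no negative pattern covers it."""
    return not any(matches(p, record) for p in ndb)

def reconstruct(ndb, length):
    """Brute-force recovery of DB; exponential in the record length."""
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return [c for c in candidates if in_db(ndb, c)]

# Hand-built negative database for DB = {"000", "111"} with 3-bit records:
# the four patterns cover the six strings that are not in DB.
NDB = ["01*", "10*", "001", "110"]
```

Here `in_db(NDB, "000")` is true and `in_db(NDB, "010")` is false, and each query touches every pattern at most once, which is linear in the size of `NDB`.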
A framework for evaluating the utility of data altered to protect confidentiality
2006
Cited by 10 (6 self)
When releasing data to the public, statistical agencies and survey organizations typically alter data values in order to protect the confidentiality of survey respondents' identities and attribute values. To select among the wide variety of data alteration methods, agencies require tools for evaluating the utility of proposed data releases. Such utility measures can be combined with disclosure risk measures to gauge risk-utility tradeoffs of competing methods. In this paper, we present utility measures focused on differences in inferences obtained from the altered data and corresponding inferences obtained from the original data. Using both genuine and simulated data, we show how the measures can be used in a decision-theoretic formulation for evaluating disclosure limitation procedures.
Microdata Protection
Cited by 9 (7 self)
Governmental, public, and private organizations are more and more frequently required to make data available for external release in a selective and secure fashion. Most data are today released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will continue to be so. This has created an increasing demand on organizations to devote resources to adequate protection of microdata. In this chapter, we first characterize the microdata protection problem (in contrast to macrodata protection), discussing the disclosure risks to which microdata are exposed. We survey the main techniques that have been proposed to protect microdata from improper disclosure, dividing them into masking techniques (which protect data by masking or perturbing their values) and synthetic data generation techniques (which protect data by replacing them with plausible, but made up, values). We conclude the chapter with observations on measures for assessing disclosure risk and the information loss brought by the application of protection techniques.
A Theoretical Basis For Perturbation Methods
2003
Cited by 6 (2 self)
In this paper we discuss a new theoretical basis for perturbation methods. In developing this new theoretical basis, we define the ideal measures of data utility and disclosure risk. Maximum data utility is achieved when the statistical characteristics of the perturbed data are the same as those of the original data. Disclosure risk is minimized if providing users with microdata access does not result in any additional information. We show that when the perturbed values of the confidential variables are generated as independent realizations from the distribution of the confidential variables conditioned on the non-confidential variables, they satisfy the data utility and disclosure risk requirements. We also discuss the relationship between the theoretical basis and some commonly used methods for generating perturbed values of confidential numerical variables.
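As a rough illustration of drawing perturbed values from a conditional distribution, the sketch below handles one confidential variable X and one non-confidential variable S. It assumes, beyond anything stated in the abstract, that the pair is jointly normal, so that X given S is normal with a linear mean and constant variance, and it estimates those parameters from the data. The function name `perturb` is hypothetical, and S is assumed to have nonzero variance.

```python
import random
from statistics import fmean

def perturb(confidential, nonconfidential, rng):
    """Replace each confidential value with an independent draw from a
    fitted normal model of X given S (linear mean, constant variance)."""
    n = len(confidential)
    mx, ms = fmean(confidential), fmean(nonconfidential)
    sxx = sum((x - mx) ** 2 for x in confidential) / n
    sss = sum((s - ms) ** 2 for s in nonconfidential) / n
    sxs = sum((x - mx) * (s - ms)
              for x, s in zip(confidential, nonconfidential)) / n
    slope = sxs / sss  # regression coefficient of X on S
    # Conditional variance under joint normality: var(X) - cov(X,S)^2 / var(S).
    resid_sd = max(sxx - slope * sxs, 0.0) ** 0.5
    # Each draw depends only on S, never on the original confidential value.
    return [mx + slope * (s - ms) + rng.gauss(0.0, resid_sd)
            for s in nonconfidential]
```

Because each released value is generated from S alone, the original X values enter only through the fitted parameters; the mean of X and its covariance with S are preserved approximately, up to sampling noise.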

