Secure Outsourcing of Sequence Comparisons
Cited by 9 (4 self)
Large-scale problems in the physical and life sciences are being revolutionized by Internet computing technologies, like grid computing, that make possible the massive cooperative sharing of computational power, bandwidth, storage, and data. A weak computational device, once connected to such a grid, is no longer limited by its slow speed, small amounts of local storage, and limited bandwidth: It can avail itself of the abundance of these resources that is available elsewhere on the network. An impediment to the use of “computational outsourcing” is that the data in question is often sensitive, e.g., of national security importance, or proprietary and containing commercial secrets, or to be kept private for legal requirements such as the HIPAA legislation, Gramm-Leach-Bliley, or similar laws. This motivates the design of techniques for computational outsourcing in a privacy-preserving manner, i.e., without revealing to the remote agents whose computational power is being used, either one’s data or the outcome of the computation on the data. This paper investigates such secure outsourcing for widely applicable sequence comparison problems, and gives an efficient protocol for a
Towards practical privacy for genomic computation
In 2008 IEEE Symposium on Security and Privacy, 2008
Cited by 9 (0 self)
Many basic tasks in computational biology involve operations on individual DNA and protein sequences. These sequences, even when anonymized, are vulnerable to re-identification attacks and may reveal highly sensitive information about individuals. We present a relatively efficient, privacy-preserving implementation of fundamental genomic computations such as calculating the edit distance and Smith-Waterman similarity scores between two sequences. Our techniques are cryptographically secure and significantly more practical than previous solutions. We evaluate our prototype implementation on sequences from the Pfam database of protein families, and demonstrate that its performance is adequate for solving real-world sequence-alignment and related problems in a privacy-preserving manner. Furthermore, our techniques have applications beyond computational biology. They can be used to obtain efficient, privacy-preserving implementations for many dynamic programming algorithms over distributed datasets.
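For orientation, the two computations named in this abstract can be sketched in their ordinary, non-private form. This is only a plaintext reference for what a privacy-preserving protocol would have to compute, not the authors' method. The function names and the scoring parameters (`match`, `mismatch`, `gap`, with a linear gap penalty) are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score; the scoring values are assumed, not from the paper."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Both routines fill a table in which each cell depends only on its left, upper, and upper-left neighbours, which is the dynamic programming structure the abstract refers to.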
Privacy-preserving data linkage and geocoding: Current approaches and research directions
In ‘Workshop on Privacy Aspects of Data Mining’ (PADM’06), held at IEEE ICDM’06, Hong Kong, 2006
Cited by 4 (3 self)
Data linkage is the task of matching and aggregating records that relate to the same entity from one or more data sets. A related technique is geocoding, the matching of addresses to their geographic locations (latitude and longitude). As data linkage is often based on personal information (like names, dates of birth, and addresses), privacy and confidentiality issues are of paramount importance, especially when linking data across organisations. In this paper we present an overview of current approaches to privacy-preserving data linkage and geocoding and discuss their limitations. Using several real-world scenarios, we illustrate the significance of developing improved techniques for large-scale and distributed privacy-preserving linking and geocoding. We discuss four core areas of research that need to be addressed in order to make linking and geocoding of large confidential data collections possible: secure matching techniques, automated record pair classification, scalability, and techniques that prevent re-identification of records over collections of linked data.
Privacy Preserving Clustering On Horizontally Partitioned Data
Cited by 3 (0 self)
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data, then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner, which can be used for privacy preserving clustering as well as database joins, record linkage, and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We evaluate the communication and computation complexity of our protocol through experiments on synthetically generated and real datasets. Each experiment is also run with a baseline protocol that has no privacy protection, so that comparing the two shows the overhead incurred by security and privacy.
Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study
Cited by 3 (1 self)
Genome-wide association studies (GWAS) aim at discovering the association between genetic variations, particularly single-nucleotide polymorphisms (SNPs), and common diseases, and have been well recognized as one of the most important and active areas in biomedical research. Also renowned is the privacy implication of such studies, which has been brought into the limelight by the recent attack proposed by Homer et al. Homer’s attack demonstrates that it is possible to identify a participant of a GWAS by analyzing the allele frequencies of a large number of SNPs. Such a threat, unfortunately, was found in our research to be significantly understated. In this paper, we demonstrate that individuals can actually be identified from even a relatively small set of statistics, such as those routinely published in GWAS papers. We present two attacks. The first one extends Homer’s attack with a much more powerful test statistic, based on the correlations among different SNPs described by the coefficient of determination (r^2). This attack can determine the presence of an individual in a GWAS from the statistics related to a couple of hundred SNPs. The second attack can lead to complete disclosure of hundreds of the participants’ SNPs, by analyzing the information derived from the published statistics. We also found that those attacks can succeed even when the precision of the statistics is low and part of the data is missing, which limits the effect of such simple defenses. We evaluated our attacks on real human genomes from the International HapMap project, and concluded that such threats are completely realistic.
Efficient Privacy-Preserving k-Nearest Neighbor Search
Cited by 3 (0 self)
We give efficient protocols for secure and private k-nearest neighbor (k-NN) search, when the data is distributed between two parties who want to cooperatively compute the answers without revealing to each other their private data. Our protocol for the single-step k-NN search is provably secure and has linear computation and communication complexity. Previous work on this problem had a quadratic complexity, and also leaked information about the parties’ inputs. We adapt our techniques to also solve the general multi-step k-NN search, and describe a specific embodiment of it for the case of sequence data. The protocols and correctness proofs can be extended to suit other privacy-preserving data mining tasks, such as classification and outlier detection.
Privacy Preserving Error Resilient DNA Searching through Oblivious Automata
Cited by 3 (2 self)
Human Deoxyribonucleic Acid (DNA) sequences offer a wealth of information that reveals, among other things, predisposition to various diseases and paternity relations. The breadth and personalized nature of this information highlight the need for privacy-preserving protocols. In this paper, we present a new error-resilient privacy-preserving string searching protocol that is suitable for running private DNA queries. This protocol checks if a short template (e.g., a string that describes a mutation leading to a disease), known to one party, is present inside a DNA sequence owned by another party, accounting for possible errors and without disclosing to each party the other party’s input. Each query is formulated as a regular expression over a finite alphabet and implemented as an automaton. As the main technical contribution, we provide a protocol that allows any finite state machine to be executed in an oblivious manner, requiring a communication complexity which is linear both in the number of states and the length of the input string.
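To make the automaton formulation concrete, here is a minimal plaintext sketch of running a finite state machine over a DNA string. It is not oblivious and does not model errors, so it shows only the computation that such a protocol would hide, not the protocol itself. The helper names `substring_dfa` and `run_dfa`, and the exact-match template, are hypothetical choices for illustration.

```python
from typing import Dict, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def substring_dfa(pattern: str, alphabet: str) -> Tuple[Transitions, int, Set[int]]:
    """Build a DFA that accepts any text containing `pattern` (exact match only).

    State k means "the last k symbols read equal the first k symbols of pattern".
    """
    m = len(pattern)
    transitions: Transitions = {}
    for k in range(m):
        for symbol in alphabet:
            candidate = pattern[:k] + symbol
            # Longest prefix of the pattern that is a suffix of what was read.
            n = min(m, len(candidate))
            while n > 0 and not candidate.endswith(pattern[:n]):
                n -= 1
            transitions[(k, symbol)] = n
    for symbol in alphabet:
        transitions[(m, symbol)] = m  # accepting state is absorbing
    return transitions, 0, {m}


def run_dfa(transitions: Transitions, start: int,
            accepting: Set[int], text: str) -> bool:
    """One table lookup per input symbol, so the run is linear in len(text)."""
    state = start
    for symbol in text:
        state = transitions[(state, symbol)]
    return state in accepting
```

The transition table has one entry per (state, symbol) pair and the run performs one lookup per input symbol, which is where the two parameters in the stated complexity, number of states and input length, come from.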
Filtering for private collaborative benchmarking
International Conference on Emerging Trends in Information and Communication Security, LNCS 3995, 2006
Cited by 2 (1 self)
Collaborative Benchmarking is an important issue for modern enterprises, but the business performance quantities used as input are often highly confidential. Secure Multi-Party Computation (SMC) can offer protocols that compute benchmarks without leaking the input variables. Benchmarking is a process of comparing to the “best”, so it is often necessary to include only the k best enterprises when computing a benchmark, so as not to distort the result with some outlying performances. We present a protocol that can be used as a filter, before running any collaborative benchmarking protocol, to restrict the participants to the k best values. To optimize performance, our protocol does not use the general circuit construction technique for SMC. As building blocks we present the fastest implementation of Yao’s millionaires’ protocol and a protocol that achieves a fair shuffle in O(log n) rounds.
Secure sound classification: Gaussian mixture models
In Proc. of ICASSP, 2006. [Online]. Available: http://www.merl.com
Cited by 2 (2 self)
We propose secure protocols for Gaussian mixture-based sound recognition. The protocols we describe allow varying levels of security between two collaborating parties. The case we examine consists of one party (Alice) providing data and the other party (Bob) providing a recognition algorithm. We show that it is possible to have Bob apply his algorithm to Alice’s data in such a way that the data and the recognition results will not be revealed to Bob, thereby guaranteeing Alice’s data privacy. Likewise, we show that it is possible to organize the collaboration so that Alice cannot reverse engineer Bob’s recognition algorithm. We show how Gaussian mixtures can be implemented in a secure manner using secure computation primitives implementing simple numerical operations, and we demonstrate the process by showing how it can yield identical results to a non-secure computation while maintaining privacy.
Privacy-Preserving Genomic Computation Through Program Specialization
Cited by 2 (1 self)
In this paper, we present a new approach to performing important classes of genomic computations (e.g., search for homologous genes) that makes a significant step towards privacy protection in this domain. Our approach leverages a key property of the human genome, namely that the vast majority of it is shared across humans (and hence public), and consequently relatively little of it is sensitive. Based on this observation, we propose a privacy-protection framework that partitions a genomic computation, distributing the part on sensitive data to the data provider and the part on the public data to the user of the data. Such a partition is achieved through program specialization that enables a biocomputing program to perform a concrete execution on public data and a symbolic execution on sensitive data. As a result, the program is simplified into an efficient query program that takes only sensitive genetic data as inputs. We prove the effectiveness of our techniques on a set of dynamic programming algorithms common in genomic computing. We develop a program transformation tool that automatically instruments a legacy program for specialization operations. We also demonstrate that our techniques can greatly facilitate secure multi-party computations on large biocomputing problems.

