Results 1 – 7 of 7
Towards statistical queries over distributed private user data
In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation. USENIX, 2012
Abstract

Cited by 10 (1 self)
To maintain the privacy of individual users' personal data, a growing number of researchers propose storing user data in client computers or in personal data stores in the cloud, and allowing users to tightly control the release of that data. While this allows specific applications to use certain approved user data, it precludes broad statistical analysis of user data. Distributed differential privacy is one approach to enabling such analysis, but previous proposals are impractical: they scale poorly or require trusted clients. This paper proposes a design that overcomes these limitations. It places tight bounds on the extent to which malicious clients can distort answers, scales well, and tolerates churn among clients. The paper presents a detailed design and analysis, and gives performance results from a complete implementation deployed to over 600 clients.
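The tension the abstract describes — useful aggregate statistics without trusting an aggregator with raw data — can be illustrated with randomized response, a much simpler local-perturbation mechanism than the paper's protocol. This is a sketch of the general idea only, not the authors' design; all names and parameters are ours.

```python
import random

def randomized_response(bit, p=0.75):
    """Report the true bit with probability p, otherwise flip it.
    Gives each client eps-differential privacy with eps = ln(p / (1 - p))."""
    return bit if random.random() < p else 1 - bit

def estimate_count(reports, p=0.75):
    """Unbias the noisy sum: E[sum] = p * t + (1 - p) * (n - t)
    for true count t among n reports."""
    n = len(reports)
    return (sum(reports) - (1 - p) * n) / (2 * p - 1)
```

Each client's individual report reveals almost nothing, yet the aggregator's estimate concentrates around the true count as the number of clients grows — the same statistical-query goal the paper pursues with cryptographic machinery and explicit bounds on malicious clients.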
Efficient Sketches for the Set Query Problem
Abstract

Cited by 5 (2 self)
We develop an algorithm for estimating the values of a vector x ∈ ℝⁿ over a support S of size k from a randomized sparse binary linear sketch Ax of size O(k). Given Ax and S, we can recover x′ with ‖x′ − x_S‖₂ ≤ ε‖x − x_S‖₂.
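As a rough illustration of the set-query setting — estimate x only on a given support S from a small linear sketch — here is a simplified Count-Sketch with median recovery. This is an analogy, not the paper's O(k)-size binary construction; the function names and parameters are ours.

```python
import random
import statistics

def make_sketch(x, d=5, w=512, seed=1):
    """Sketch x into d rows of w counters with random buckets and signs."""
    rng = random.Random(seed)
    buckets = [[rng.randrange(w) for _ in x] for _ in range(d)]
    signs = [[rng.choice((-1, 1)) for _ in x] for _ in range(d)]
    rows = [[0.0] * w for _ in range(d)]
    for r in range(d):
        for i, v in enumerate(x):
            rows[r][buckets[r][i]] += signs[r][i] * v
    return rows, buckets, signs

def set_query(sketch, S):
    """Estimate x_i for each i in the support S by a median over rows."""
    rows, buckets, signs = sketch
    return {i: statistics.median(signs[r][i] * rows[r][buckets[r][i]]
                                 for r in range(len(rows)))
            for i in S}
```

When x is exactly supported on S and collisions among support elements are rare, the median estimate recovers the values on S; the paper's contribution is achieving this guarantee with a sketch of optimal O(k) size.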
Privacy-Enhancing Technologies for Medical Tests Using Genomic Data
Abstract

Cited by 3 (3 self)
We propose privacy-enhancing technologies for medical tests and personalized medicine methods that utilize patients' genomic data. Focusing specifically on a typical disease-susceptibility test, we develop a new architecture (between the patient and the medical unit) and propose a privacy-preserving algorithm that utilizes homomorphic encryption and proxy re-encryption. Assuming that whole genome sequencing is done by a certified institution, we propose to store patients' genomic data encrypted by their public keys at a Storage and Processing Unit (SPU). The proposed algorithm lets the SPU process the encrypted genomic data for medical tests and personalized medicine methods while preserving the privacy of patients' genomic data. Furthermore, we implement the proposed scheme and demonstrate its practicality via a complexity analysis.
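The additive property that lets a server like the SPU compute on ciphertexts can be seen in a toy Paillier cryptosystem. The demo primes below are tiny and completely insecure, and the paper's actual scheme additionally involves proxy re-encryption; this only shows the homomorphic-addition mechanism.

```python
import math
import random

def paillier_keygen(p=47, q=59):
    """Toy key: n is public; lam = lcm(p-1, q-1) is the private key."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, lam

def encrypt(n, m):
    """c = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, lam, c):
    """m = L(c^lam mod n^2) * lam^-1 mod n, where L(u) = (u - 1) // n."""
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * pow(lam, -1, n) % n
```

Multiplying two ciphertexts adds the underlying plaintexts, so a server can aggregate encrypted values without ever decrypting them — the core capability the architecture relies on.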
What’s the Difference? Efficient Set Reconciliation without Prior Context
Abstract

Cited by 2 (1 self)
We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements belonging to the set difference in a single round, with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using logs, logs require overhead for every update and scale poorly when multiple users are to be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed systems applications such as trading blocks in a peer-to-peer network and synchronizing link-state databases after a partition. Our basic set-reconciliation method has a similarity with the peeling algorithm used in Tornado codes [6], which is not surprising, as there is an intimate connection between set difference and coding. Beyond set reconciliation, an essential component in our Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] for small set differences. Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.
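The flavor of the single-round reconciliation can be sketched with a minimal invertible Bloom lookup table over integer keys. The layout and parameters below are our own simplification, not the paper's exact structure.

```python
import hashlib

M, NHASH = 32, 3  # cells per partition, number of partitions

def h(key, salt):
    """64-bit hash of an integer key under a salt."""
    d = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(d[:8], "big")

def make_digest(keys):
    """Each key goes into one cell per partition: (count, keyXor, checkXor)."""
    cells = [[0, 0, 0] for _ in range(M * NHASH)]
    for k in keys:
        for i in range(NHASH):
            c = cells[i * M + h(k, i) % M]
            c[0] += 1; c[1] ^= k; c[2] ^= h(k, "chk")
    return cells

def set_difference(da, db):
    """Subtract digests cell-wise, then repeatedly peel 'pure' cells
    (count +-1 with a matching checksum) to recover A \\ B and B \\ A."""
    cells = [[a[0] - b[0], a[1] ^ b[1], a[2] ^ b[2]] for a, b in zip(da, db)]
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for c in cells:
            if abs(c[0]) == 1 and h(c[1], "chk") == c[2]:
                k, sign = c[1], c[0]
                (only_a if sign == 1 else only_b).add(k)
                for i in range(NHASH):  # remove k from every cell it touched
                    cc = cells[i * M + h(k, i) % M]
                    cc[0] -= sign; cc[1] ^= k; cc[2] ^= h(k, "chk")
                progress = True
    return only_a, only_b
```

Note that the digest size is fixed by a budget on the difference size, not by |A| or |B|: keys common to both sets cancel under subtraction, which is what makes single-round, context-free reconciliation possible.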
Data stream algorithms via expander graphs
 In 19th International Symposium on Algorithms and Computation (ISAAC
, 2008
Abstract

Cited by 1 (0 self)
We present a simple way of designing deterministic algorithms for problems in the data stream model via lossless expander graphs. We illustrate this by considering two problems, namely k-sparsity testing and estimating the frequency of items.
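For a concrete taste of deterministic frequency estimation in a stream, here is a CR-Precis-style table of residue counters. This is offered only as an analogy — the paper's construction is based on lossless expanders, not prime moduli — and all names are ours.

```python
PRIMES = [101, 103, 107, 109]  # pairwise-distinct prime moduli

def make_tables():
    return [[0] * p for p in PRIMES]

def update(tables, item, count=1):
    """Charge the item's residue class in every table."""
    for p, t in zip(PRIMES, tables):
        t[item % p] += count

def query(tables, item):
    """Deterministic over-estimate: collisions only ever add mass,
    and two distinct items can collide in only a few of the tables,
    since each collision forces a distinct prime to divide their difference."""
    return min(t[item % p] for p, t in zip(PRIMES, tables))
```

Unlike randomized sketches, there is no failure probability here: the estimate is always an upper bound on the true frequency, with error controlled by how often other items share residues.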
Biff (Bloom Filter) Codes: Fast Error Correction for Large Data Sets
Abstract

Cited by 1 (0 self)
Large data sets are increasingly common in cloud and virtualized environments. For example, transfers of multiple gigabytes are commonplace, as are replicated blocks of such sizes. There is a need for fast error correction or data reconciliation in such settings even when the expected number of errors is small. Motivated by such cloud reconciliation problems, we consider error-correction schemes designed for large data, after explaining why previous approaches appear unsuitable. We introduce Biff codes, which are based on Bloom filters and are designed for large data. For Biff codes with a message of length L and E errors, the encoding time is O(L), the decoding time is O(L + E), and the space overhead is O(E). Biff codes are low-density parity-check codes; they are similar to Tornado codes, but are designed for errors instead of erasures. Further, Biff codes are designed to be very simple, removing any explicit graph structures and based entirely on hash tables. We derive Biff codes by a simple reduction from a set reconciliation algorithm for a recently developed data structure, invertible Bloom lookup tables. While the underlying theory is extremely simple, what makes this code especially attractive is the ease with which it can be implemented and the speed of decoding. We present results from a prototype implementation that decodes messages of 1 million words with thousands of errors in well under a second.
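The reduction the abstract describes — error correction as set reconciliation over (position, word) pairs — can be sketched as follows. This is a simplified toy with our own layout and parameters, not the authors' code: the sender sketches the set of pairs into an XOR/count table, and the receiver subtracts its own sketch and peels out the differing pairs.

```python
import hashlib

M, NHASH = 48, 3  # cells per partition, number of partitions

def hsh(x, salt):
    d = hashlib.sha256(f"{salt}|{x}".encode()).digest()
    return int.from_bytes(d[:8], "big")

def biff_encode(words):
    """Sketch the message as its set of (position, word) pairs."""
    table = [[0, 0, 0] for _ in range(M * NHASH)]
    for pos, w in enumerate(words):
        key = (pos << 32) | w
        for j in range(NHASH):
            c = table[j * M + hsh(key, j) % M]
            c[0] += 1; c[1] ^= key; c[2] ^= hsh(key, "c")
    return table

def biff_correct(words, sender_table):
    """Subtract our sketch from the sender's, then peel: recovered keys
    with sign +1 come from the sender and carry the corrected word."""
    mine = biff_encode(words)
    d = [[a[0] - b[0], a[1] ^ b[1], a[2] ^ b[2]]
         for a, b in zip(sender_table, mine)]
    words = list(words)
    progress = True
    while progress:
        progress = False
        for c in d:
            if abs(c[0]) == 1 and hsh(c[1], "c") == c[2]:
                key, sign = c[1], c[0]
                if sign == 1:
                    words[key >> 32] = key & 0xFFFFFFFF
                for j in range(NHASH):
                    cc = d[j * M + hsh(key, j) % M]
                    cc[0] -= sign; cc[1] ^= key; cc[2] ^= hsh(key, "c")
                progress = True
    return words
```

Pairs at correct positions cancel under subtraction, so the table only needs to be sized for the expected number of errors, not the message length — matching the O(E) space overhead the abstract claims.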
Sparse Recovery and Fourier Sampling
, 2013
Abstract
Over the last decade a broad literature has arisen studying sparse recovery, the estimation of sparse vectors from low-dimensional linear projections. Sparse recovery has a wide variety of applications such as streaming algorithms, image acquisition, and disease testing. A particularly important subclass of sparse recovery is the sparse Fourier transform, which considers the computation of a discrete Fourier transform when the output is sparse. Applications of the sparse Fourier transform include medical imaging, spectrum sensing, and purely computational tasks involving convolution. This thesis describes a coherent set of techniques that achieve optimal or near-optimal upper and lower bounds for a variety of sparse recovery problems. We give the following state-of-the-art algorithms for recovery of an approximately k-sparse vector in n dimensions:
• Two sparse Fourier transform algorithms, respectively taking O(k log n log(n/k)) time and O(k log n · log^c log n) samples. The latter is within a log^c log n factor of the optimal sample complexity when k < n^(1−ε).
• An algorithm for adaptive sparse recovery using O(k log log(n/k)) measurements, showing …