Towards statistical queries over distributed private user data
- In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation. USENIX, 2012
Cited by 3 (0 self)
To maintain the privacy of individual users' personal data, a growing number of researchers propose storing user data in client computers or personal data stores in the cloud, and allowing users to tightly control the release of that data. While this allows specific applications to use certain approved user data, it precludes broad statistical analysis of user data. Distributed differential privacy is one approach to enabling this analysis, but previous proposals are not practical in that they scale poorly, or that they require trusted clients. This paper proposes a design that overcomes these limitations. It places tight bounds on the extent to which malicious clients can distort answers, scales well, and tolerates churn among clients. This paper presents a detailed design and analysis, and gives performance results of a complete implementation based on the deployment of over 600 clients.
What’s the Difference? Efficient Set Reconciliation without Prior Context
Cited by 2 (1 self)
We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements belonging to the set difference in a single round with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using logs, logs require overhead for every update and scale poorly when multiple users are to be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed systems applications such as trading blocks in a peer-to-peer network, and synchronizing link-state databases after a partition. Our basic set-reconciliation method has a similarity with the peeling algorithm used in Tornado codes [6], which is not surprising, as there is an intimate connection between set difference and coding. Beyond set reconciliation, an essential component in our Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] for small set differences. Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.
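The reconciliation step described in this abstract can be illustrated with a minimal sketch of an invertible-Bloom-filter style digest decoded by peeling. This is not the authors' implementation: the cell layout (a count, an XOR of keys, an XOR of key checksums), the choice of K = 3 hash functions, the SHA-256 based hashing, the 64-bit integer keys, and the table size are all assumptions made for illustration, and the set-difference size estimator mentioned in the abstract is omitted.

```python
import hashlib

K = 3  # number of hash functions per key (assumed value)


def _h(key: int, salt: int) -> int:
    """Hash a non-negative 64-bit integer key with a small salt."""
    data = bytes([salt]) + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


class IBF:
    """Hypothetical invertible Bloom filter over 64-bit integer keys."""

    def __init__(self, m: int):
        assert m % K == 0, "table is split into K equal partitions"
        self.m = m
        self.count = [0] * m
        self.id_sum = [0] * m    # XOR of all keys mapped to the cell
        self.hash_sum = [0] * m  # XOR of the keys' checksums

    def _cells(self, key: int):
        # One cell per partition, so a key always lands in K distinct cells.
        part = self.m // K
        return [i * part + _h(key, i) % part for i in range(K)]

    def _update(self, key: int, delta: int):
        chk = _h(key, 255)
        for c in self._cells(key):
            self.count[c] += delta
            self.id_sum[c] ^= key
            self.hash_sum[c] ^= chk

    def insert(self, key: int):
        self._update(key, 1)

    def subtract(self, other: "IBF") -> "IBF":
        # Keys present in both sets cancel, leaving only the difference.
        d = IBF(self.m)
        for c in range(self.m):
            d.count[c] = self.count[c] - other.count[c]
            d.id_sum[c] = self.id_sum[c] ^ other.id_sum[c]
            d.hash_sum[c] = self.hash_sum[c] ^ other.hash_sum[c]
        return d

    def _is_pure(self, c: int) -> bool:
        # A pure cell holds exactly one key; the checksum guards against
        # cells whose count is +-1 by coincidence.
        return (self.count[c] in (1, -1)
                and self.hash_sum[c] == _h(self.id_sum[c], 255))

    def decode(self):
        """Peel pure cells; returns (success, only_in_self, only_in_other)."""
        only_self, only_other = set(), set()
        pure = [c for c in range(self.m) if self._is_pure(c)]
        while pure:
            c = pure.pop()
            if not self._is_pure(c):
                continue  # cell changed since it was queued
            key, sign = self.id_sum[c], self.count[c]
            (only_self if sign == 1 else only_other).add(key)
            self._update(key, -sign)  # remove the key from all its cells
            pure.extend(x for x in self._cells(key) if self._is_pure(x))
        done = (not any(self.count) and not any(self.id_sum)
                and not any(self.hash_sum))
        return done, only_self, only_other


if __name__ == "__main__":
    a, b = IBF(300), IBF(300)
    for k in range(1000):
        a.insert(k)
    for k in range(5, 1005):
        b.insert(k)
    print(a.subtract(b).decode())
```

In this sketch each node would send only its fixed-size table, so the cost depends on the table size chosen for the expected difference rather than on the size of the sets. Decoding can fail when the table is too small for the actual difference, which is why the returned success flag should be checked.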
Efficient Sketches for the Set Query Problem
Cited by 1 (1 self)
We develop an algorithm for estimating the values of a vector x ∈ R^n over a support S of size k from a randomized sparse binary linear sketch Ax of size O(k). Given Ax and S, we can recover x′ with ‖x′ − x_S‖_2 ≤ ε‖x − x_S‖
Data stream algorithms via expander graphs
- In 19th International Symposium on Algorithms and Computation (ISAAC), 2008
Cited by 1 (0 self)
We present a simple way of designing deterministic algorithms for problems in the data stream model via lossless expander graphs. We illustrate this by considering two problems, namely, k-sparsity testing and estimating frequency of items.
Biff (Bloom Filter) Codes: Fast Error Correction for Large Data Sets
Large data sets are increasingly common in cloud and virtualized environments. For example, transfers of multiple gigabytes are commonplace, as are replicated blocks of such sizes. There is a need for fast error-correction or data reconciliation in such settings even when the expected number of errors is small. Motivated by such cloud reconciliation problems, we consider error-correction schemes designed for large data, after explaining why previous approaches appear unsuitable. We introduce Biff codes, which are based on Bloom filters and are designed for large data. For Biff codes with a message of length L and E errors, the encoding time is O(L), decoding time is O(L + E) and the space overhead is O(E). Biff codes are low-density parity-check codes; they are similar to Tornado codes, but are designed for errors instead of erasures. Further, Biff codes are designed to be very simple, removing any explicit graph structures and based entirely on hash tables. We derive Biff codes by a simple reduction from a set reconciliation algorithm for a recently developed data structure, invertible Bloom lookup tables. While the underlying theory is extremely simple, what makes this code especially attractive is the ease with which it can be implemented and the speed of decoding. We present results from a prototype implementation that decodes messages of 1 million words with thousands of errors in well under a second.

