Biff (Bloom Filter) Codes: Fast Error Correction for Large Data Sets

by Michael Mitzenmacher, George Varghese

Citing documents: Results 1 - 7 of 7

SHO-FA: Robust compressive sensing with order-optimal complexity, measurements, and bits

by Mayank Bakshi, Sidharth Jaggi, Sheng Cai, Minghua Chen
"... Suppose x is any exactly k-sparse vector in �n. We present a class of “sparse” matrices A, and a corresponding algorithm that we call SHO-FA (for Short and Fast1) that, with high probability over A, can reconstruct x from Ax. The SHO-FA algorithm is related to the Invertible Bloom Lookup Tables (IBL ..."
Abstract - Cited by 4 (2 self) - Add to MetaCart
Suppose x is any exactly k-sparse vector in R^n. We present a class of “sparse” matrices A, and a corresponding algorithm that we call SHO-FA (for Short and Fast) that, with high probability over A, can reconstruct x from Ax. The SHO-FA algorithm is related to the Invertible Bloom Lookup Tables (IBLTs) recently introduced by Goodrich et al., with two important distinctions – SHO-FA relies on linear measurements, and is robust to noise and approximate sparsity. The SHO-FA algorithm is the first to simultaneously have the following properties: (a) it requires only O(k) measurements, (b) the bit-precision of each measurement and each arithmetic operation is O(log(n) + P) (here 2^(-P) corresponds to the desired relative error in the reconstruction of x), (c) the computational complexity of decoding is O(k) arithmetic operations, and (d) if the reconstruction goal is simply to recover a single component of x instead of all of x, with high probability over A this can be done in constant time. All constants above are independent of all problem parameters other than the desired probability of success. For a wide range of parameters these properties are information-theoretically order-optimal. In addition, our SHO-FA algorithm is robust to random noise, and (random) approximate sparsity for a large range of k. In particular, suppose the measured vector equals A(x+z)+e, where z and e correspond respectively to the source tail and measurement noise. Under reasonable statistical assumptions on z and e our decoding algorithm reconstructs x with an estimation error of O(||z||_1 + (log k)^2 ||e||_1). The SHO-FA algorithm works with high probability over A, z, and e, and still requires only O(k) steps and O(k) measurements over O(log(n))-bit numbers. This is in contrast to most existing algorithms which focus on the “worst-case” z model, where it is known Ω(k log(n/k)) measurements over O(log(n))-bit numbers are necessary.
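
For intuition about the IBLT-style decoding described above, here is a minimal sketch (illustrative Python, not the authors' code) of peeling-based recovery of an exactly k-sparse real vector from O(k) linear measurements: each index contributes to d buckets, each bucket keeps a plain sum and an index-weighted sum, and buckets that look like singletons are resolved and subtracted out. The bucket count, the choice d = 3, and the ratio-based singleton test are assumptions for this toy example, not parameters from the paper.

```python
import random

def make_buckets(n, m, d=3, seed=0):
    """Hypothetical sparse measurement pattern: each index touches d of m buckets."""
    rng = random.Random(seed)
    return [rng.sample(range(m), d) for _ in range(n)]

def measure(x, buckets, m):
    """Linear measurements: per bucket, a plain sum and an index-weighted sum."""
    s = [0.0] * m   # sum of x_i over indices i mapped to the bucket
    w = [0.0] * m   # sum of i * x_i over the same indices
    for i, xi in enumerate(x):
        if xi:
            for b in buckets[i]:
                s[b] += xi
                w[b] += i * xi
    return s, w

def decode(s, w, buckets, m, n, tol=1e-6):
    """Peel buckets that look like singletons, SHO-FA/IBLT style (sketch only)."""
    s, w = list(s), list(w)
    recovered = {}
    queue = list(range(m))
    while queue:
        b = queue.pop()
        if abs(s[b]) < tol:
            continue
        i = int(round(w[b] / s[b]))
        # Treat the bucket as a singleton if the implied index is valid, hashes to
        # this bucket, and reproduces the weighted sum (false positives are unlikely
        # with real-valued data, but possible; the real algorithm is more careful).
        if 0 <= i < n and b in buckets[i] and abs(w[b] - i * s[b]) < tol and i not in recovered:
            xi = s[b]
            recovered[i] = xi
            for b2 in buckets[i]:     # subtract the resolved entry everywhere
                s[b2] -= xi
                w[b2] -= i * xi
                queue.append(b2)
    return recovered

# Toy usage: a 5-sparse vector of length 1000, about 4k buckets (constants are illustrative).
n, k = 1000, 5
m = 4 * k
x = [0.0] * n
for i, v in [(3, 1.37), (250, -2.04), (411, 0.25), (700, 7.13), (999, 2.71)]:
    x[i] = v
buckets = make_buckets(n, m)
s, w = measure(x, buckets, m)
print(decode(s, w, buckets, m, n))    # with high probability: the 5 planted entries
```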

Peeling Arguments and Double Hashing

by Michael Mitzenmacher, Justin Thaler
"... The analysis of several algorithms and data structures can be reduced to the analysis of the following greedy “peeling” process: start with a random hypergraph; find a vertex of degree at most k, and remove it and all of its adjacent hyperedges from the graph; repeat until there is no suitable vert ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
The analysis of several algorithms and data structures can be reduced to the analysis of the following greedy “peeling” process: start with a random hypergraph; find a vertex of degree at most k, and remove it and all of its adjacent hyperedges from the graph; repeat until there is no suitable vertex. This specific process finds the k-core of a hypergraph, and variations on this theme have proven useful in analyzing, for example, decoding of low-density parity-check codes, several hash-based data structures such as cuckoo hashing, and algorithms for the satisfiability of random formulae. This approach can be analyzed in several ways, with two common approaches being via a corresponding branching process or a fluid limit family of differential equations. In this paper, we make note of an interesting aspect of these types of processes: the results are generally the same when the randomness is structured in the manner of double hashing. This phenomenon allows us to use less randomness and simplify the implementation for several hash-based data structures and algorithms. We explore this approach from both an empirical and a theoretical perspective, examining theoretical justifications as well as simulation results for specific problems.
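
As a concrete illustration of the process (a rough Python sketch, not code from the paper; the SHA-256-derived hash pair and the toy parameters are assumptions), the snippet below builds each key's hyperedge by double hashing, h1(x) + j*h2(x) mod m, and then repeatedly removes vertices of degree at most k together with their incident hyperedges:

```python
import hashlib
from collections import defaultdict

def double_hash_edge(key, d, m):
    """Place a key on d vertices using double hashing: h1 + j*h2 (mod m)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    h1 = int.from_bytes(digest[:8], "big") % m
    h2 = int.from_bytes(digest[8:16], "big") % m or 1   # keep the stride nonzero
    return tuple((h1 + j * h2) % m for j in range(d))

def peel(edges, m, k=1):
    """Repeatedly remove a vertex of degree at most k and its incident hyperedges;
    return the hyperedges that survive once no such vertex remains
    (for k = 1 these are the edges of the 2-core)."""
    incident = defaultdict(set)            # vertex -> ids of surviving edges
    for e_id, verts in enumerate(edges):
        for v in set(verts):
            incident[v].add(e_id)
    alive = set(range(len(edges)))
    done = [False] * m
    stack = [v for v in range(m) if len(incident[v]) <= k]
    while stack:
        v = stack.pop()
        if done[v]:
            continue
        done[v] = True
        for e_id in list(incident[v]):
            if e_id not in alive:
                continue
            alive.discard(e_id)
            for u in set(edges[e_id]):
                incident[u].discard(e_id)
                if not done[u] and len(incident[u]) <= k:
                    stack.append(u)
    return [edges[e] for e in sorted(alive)]

# Toy check: with 3 vertex choices per key and m about 1.3x the number of keys,
# peeling usually terminates with nothing left (an empty 2-core).
keys = range(1000)
m = 1300
edges = [double_hash_edge(x, d=3, m=m) for x in keys]
print(len(peel(edges, m, k=1)))   # typically 0
```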

Citation Context

..., we find it slightly easier to analyze the graph C^d_{m,cm} in which each of the (m choose 2) hyperedges appears independently with probability p = cm / (m choose 2). Asymptotically, the graphs C^d_{m,cm} and D^d_{m,cm} are equivalent up... (Footnote 1: As an example, Biff codes [21] require an empty 2-core; however, a logarithmic-sized 2-core could be handled by using a small amount of additional error-correction on the original data.)

Finite Length Analysis on Listing Failure Probability of Invertible Bloom Lookup Tables

by Daichi Yugawa, Tadashi Wadayama
"... ar ..."
Abstract - Add to MetaCart
Abstract not found

Subspace Synchronization: A Network-Coding Approach to Object Reconciliation

by Vitaly Skachek, Michael G. Rabbat
"... Abstract-Assume that two users possess two different subspaces of an ambient linear space. We show that the problem of synchronization of such vector spaces can be easily solved by an efficient algorithm. By building on this observation, we propose an algorithm for synchronization of two collection ..."
Abstract - Add to MetaCart
Assume that two users possess two different subspaces of an ambient linear space. We show that the problem of synchronization of such vector spaces can be easily solved by an efficient algorithm. By building on this observation, we propose an algorithm for synchronization of two collections of binary files of length n each, stored in the cloud in a distributed manner. By further employing techniques akin to network coding, we propose a more efficient file synchronization algorithm that has communication complexity O(d · n) bits and computational complexity O(k^2 · n) operations, where k is the total number of files and d is the number of files that differ. The algorithm successfully reconciles two sets of files in 3 communication rounds with high probability.
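
The basic observation in the abstract can be sketched as follows (illustrative Python over GF(2), not the paper's network-coding construction; the bitmask representation and the join-of-subspaces goal are assumptions made for this toy example): each user keeps a basis in row-echelon form, ships only its own basis vectors, and independent incoming vectors are folded in by XOR elimination, so both sides end up with a basis of the sum of the two subspaces.

```python
def insert_vector(basis, v):
    """Add v (a GF(2) vector stored as an int bitmask) to a basis kept as
    {pivot_bit: vector}; returns True iff v was independent of the basis."""
    while v:
        b = v.bit_length() - 1
        if b not in basis:
            basis[b] = v
            return True
        v ^= basis[b]          # eliminate the leading bit and keep reducing
    return False

def synchronize(basis_a, basis_b):
    """Both users end up with a basis of the sum of their subspaces; each side
    only ever transmits its own basis vectors (roughly dim * n bits)."""
    joined = {}
    for v in list(basis_a.values()) + list(basis_b.values()):
        insert_vector(joined, v)
    return joined

# Toy example in GF(2)^8, vectors written as 8-bit integers.
A, B = {}, {}
for v in (0b10110000, 0b00001101):
    insert_vector(A, v)
for v in (0b10110000, 0b01100001):
    insert_vector(B, v)
joined = synchronize(A, B)
print([bin(v) for v in sorted(joined.values())])   # a basis of span(A) + span(B)
```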

Citation Context

...g u), COMPUTATION(A) = O(d^3) and TIME(A) = O(log k) with high probability. Here k is the total number of objects in possession of A and B, d is the number of objects possessed by only one user, and u is the size of the space where the objects are taken from. Another reconciliation algorithm, based on invertible Bloom filters, was recently proposed in [2], [3]. That algorithm has parameters COMMUNICATION(A) = O(d log u), COMPUTATION(A) = O(d) and TIME(A) = 3 (with high probability). Another recent algorithm performing set reconciliation with high probability by using Biff codes was proposed in [7]. That algorithm has parameters COMMUNICATION(A) = O(d log u), COMPUTATION(A) = O(k log u) and TIME(A) = 3. In this work, we build on the ideas from the area of network coding [1], [5]. In particular, we use the idea that information can be represented by vector spaces instead of vectors [4], [9]. For the case of two users, we first show in Section III that the problem of subspace synchronization can be easily solved by an efficient algorithm. Then, in Section IV, we extend that approach and propose a new, yet not practical, algorithm for synchronization of two collections of files by using a m...

Finite Length Analysis on Listing Failure Probability of Invertible Bloom Lookup Tables

by Daichi Yugawa, Tadashi Wadayama, 2013
"... ..."
Abstract - Add to MetaCart
Abstract not found

Hardness of Peeling with Stashes

by Michael Mitzenmacher, Vikram Nathan, 2014
"... The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k and their adjacent edges are removed until no vertices of degree less than k are left. Often the question is whether the remaining hypergraph, the k-core ..."
Abstract - Add to MetaCart
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k and their adjacent edges are removed until no vertices of degree less than k are left. Often the question is whether the remaining hypergraph, the k-core, is empty or not. In some settings, it may be possible to remove either vertices or edges from the hypergraph before peeling, at some cost. For example, in hashing applications where keys correspond to edges and buckets to vertices, one might use an additional side data structure, commonly referred to as a stash, to separately handle some keys in order to avoid collisions. The natural question in such cases is to find the minimum number of edges (or vertices) that need to be stashed in order to realize an empty k-core. We show that both these problems are NP-complete for all k ≥ 2 on graphs and regular hypergraphs, with the sole exception being that the edge variant of stashing is solvable in polynomial time for k = 2 on standard (2-uniform) graphs.
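
Since the exact problem is NP-complete, a natural fallback is a greedy heuristic. The sketch below (illustrative Python, not an algorithm from the paper; the lowest-degree-vertex rule is an arbitrary assumption) peels as far as possible and, whenever a nonempty k-core remains, stashes one edge incident to a lowest-degree core vertex before peeling again:

```python
from collections import defaultdict

def k_core_edges(edges, edge_ids, k):
    """Peel vertices of degree < k; return the ids of the edges left in the k-core."""
    incident = defaultdict(set)
    for e in edge_ids:
        for v in set(edges[e]):
            incident[v].add(e)
    alive, dead = set(edge_ids), set()
    stack = [v for v in incident if len(incident[v]) < k]
    while stack:
        v = stack.pop()
        if v in dead:
            continue
        dead.add(v)
        for e in list(incident[v]):
            if e not in alive:
                continue
            alive.discard(e)
            for u in set(edges[e]):
                incident[u].discard(e)
                if u not in dead and len(incident[u]) < k:
                    stack.append(u)
    return sorted(alive)

def greedy_edge_stash(edges, k=2):
    """Heuristic stash selection (the minimum stash is NP-complete to find for k >= 2)."""
    stashed, remaining = [], list(range(len(edges)))
    while True:
        core = k_core_edges(edges, remaining, k)
        if not core:
            return stashed                     # the k-core is now empty
        deg, by_vertex = defaultdict(int), defaultdict(list)
        for e in core:
            for v in set(edges[e]):
                deg[v] += 1
                by_vertex[v].append(e)
        v = min(deg, key=deg.get)              # lowest-degree vertex in the core
        stashed.append(by_vertex[v][0])        # stash one of its edges
        remaining = [e for e in core if e != by_vertex[v][0]]

# Tiny 2-uniform example: a triangle plus a pendant edge.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(greedy_edge_stash(edges, k=2))   # one stashed edge, e.g. [0], empties the 2-core
```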

Citation Context

...efore uniquely defined and does not depend on the order vertices are removed in the peeling process. Peeling processes, and variations on it, have found applications in low-density parity-check codes [7, 9], hash-based sketches [3, 5, 6], satisfiability of random boolean formulae [2, 8, 10], and cuckoo hashing [4, 11]. Usually in the design of these algorithms the primary question is whether or not the ...

Cache-Oblivious Peeling of Random Hypergraphs

by Djamal Belazzougui, Paolo Boldi, Rossano Venturini, Sebastiano Vigna
"... The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random r-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm ..."
Abstract - Add to MetaCart
The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random r-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires O(sort(n)) I/Os and O(n log n) time to peel a random hypergraph with n edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.
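
To make the scan-and-sort idea concrete, here is a deliberately naive sketch (illustrative Python; it is not the paper's cache-oblivious algorithm and does not achieve its O(sort(n)) I/O bound): the hypergraph lives in a flat list of (vertex, edge_id) pairs, each round sorts by vertex and scans for degree-1 vertices, then sorts by edge id and merge-scans against the peeled ids to drop their pairs, so no step needs random access into the graph.

```python
def scan_and_sort_peel(edges):
    """Peel degree-1 vertices using only sorts and sequential scans over a flat
    list of (vertex, edge_id) incidence pairs (a toy, in-memory illustration)."""
    pairs = [(v, e) for e, verts in enumerate(edges) for v in set(verts)]
    order = []                                    # edge ids in a valid peeling order
    while True:
        # Pass 1: sort by vertex, scan runs of equal vertices to find degree-1 ones.
        pairs.sort(key=lambda p: p[0])
        peeled, i = set(), 0
        while i < len(pairs):
            j = i
            while j < len(pairs) and pairs[j][0] == pairs[i][0]:
                j += 1
            if j - i == 1:                        # vertex of degree 1: peel its edge
                peeled.add(pairs[i][1])
            i = j
        if not peeled:
            break                                 # whatever is left is the 2-core
        order.extend(sorted(peeled))
        # Pass 2: sort by edge id and merge-scan against the sorted peeled ids,
        # dropping every pair that belongs to a peeled edge.
        pairs.sort(key=lambda p: p[1])
        peeled_sorted, kept, pi = sorted(peeled), [], 0
        for v, e in pairs:
            while pi < len(peeled_sorted) and peeled_sorted[pi] < e:
                pi += 1
            if pi == len(peeled_sorted) or peeled_sorted[pi] != e:
                kept.append((v, e))
        pairs = kept
    return order, pairs                           # peeling order and leftover core pairs

# A small 3-uniform hypergraph that peels completely.
edges = [(0, 1, 2), (2, 3, 4), (4, 5, 6), (1, 3, 5)]
order, core = scan_and_sort_peel(edges)
print(order, core)   # [0, 2, 1, 3] [] here: everything peels, so the 2-core is empty
```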

Citation Context

...d application in a number of fundamental problems, such as hash constructions [3, 6, 9, 10, 11, 12, 21], solving random instances of r-SAT [12, 24, 25], and the construction of error-correcting codes [15, 20, 23]. These applications exploit the guarantee that if the edge sparsity γ of a random r-hypergraph is larger than a certain sparsity threshold c_r (e.g., c_3 ≈ 1.221), then with high probability the hyperg...
