Results 1 – 3 of 3
Homomorphic Fingerprints under Misalignments: Sketching Edit and Shift Distances, 2013
Abstract

Cited by 3 (1 self)
Fingerprinting is a widely used technique for efficiently verifying that two files are identical. More generally, linear sketching is a form of lossy compression (based on random projections) that also enables the “dissimilarity” of non-identical files to be estimated. Many sketches have been proposed for dissimilarity measures that decompose coordinate-wise, such as the Hamming distance between alphanumeric strings or the Euclidean distance between vectors. However, virtually nothing is known about sketches that accommodate alignment errors. With such errors, Hamming or Euclidean distances are rendered useless: a small misalignment may result in a file that looks very dissimilar to the original file according to such measures. In this paper, we present the first linear sketch that is robust to a small number of alignment errors. Specifically, the sketch can be used to determine whether two files are within a small Hamming distance of being a cyclic shift of each other. Furthermore, the sketch is homomorphic with respect to rotations: it is possible to construct the sketch of a cyclic shift of a file given only the sketch of the original file. The relevant dissimilarity measure, known as the shift distance, arises in the context of embedding edit distance, and our result addresses an open problem [26, Question 13] with a rather surprising outcome. Our sketch projects a length-n file into D(n) · polylog(n) dimensions, where D(n) ≪ n is the number of divisors of n. The striking fact is that this is near-optimal, i.e., the D(n) dependence is inherent to a problem that is ostensibly about lossy compression. In contrast, we then show that any sketch for estimating the edit distance between two files, even when that distance is small, requires size nearly linear in n. This lower bound addresses a longstanding open problem on the low-distortion embedding of edit distance.
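The coordinate-wise baseline the abstract contrasts with (not the paper's shift-robust construction) can be illustrated in a few lines: a random ±1 projection is a linear sketch, so distances survive compression, but a one-position cyclic shift already looks maximally dissimilar. All names and parameters below are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

def sketch(x, k=256, seed=42):
    """Project x into k dimensions with a shared random ±1 matrix.

    The map is linear, so sketch(x) - sketch(y) == sketch(x - y):
    distances between files survive the compression.
    """
    R = np.random.default_rng(seed).choice([-1.0, 1.0], size=(k, len(x)))
    return (R @ x) / np.sqrt(k)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1024).astype(float)

y = x.copy()
y[:10] = 1 - y[:10]                       # flip 10 bits: Hamming distance 10
est = np.sum((sketch(x) - sketch(y)) ** 2)
print(round(est, 1))                      # concentrates around 10

z = np.roll(x, 1)                         # cyclic shift by a single position
est_shift = np.sum((sketch(x) - sketch(z)) ** 2)
print(round(est_shift, 1))                # large (~n/2): the baseline is blind to shifts
```

For 0/1 files the squared Euclidean distance equals the Hamming distance, so the sketch estimates it well; the large value for the shifted file is exactly the failure mode that motivates the paper's shift-distance sketch.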
GROTESQUE: Noisy Group Testing (Quick and Efficient), 2013
Abstract

Cited by 1 (0 self)
Group testing refers to the problem of identifying (with high probability) a (small) subset of D defectives from a (large) set of N items via a “small” number of “pooled” tests (i.e., tests that have a positive outcome if at least one item in the pool is defective, and a negative outcome otherwise). For ease of presentation, in this work we focus on the regime where D = O(N^(1−δ)) for some δ > 0. The tests may be noiseless or noisy, and the testing procedure may be adaptive (the pool defining a test may depend on the outcomes of previous tests) or non-adaptive (each test is performed independently of the outcomes of other tests). A rich body of literature demonstrates that Θ(D log N) tests are information-theoretically necessary and sufficient for the group-testing problem, and provides algorithms that achieve this performance. However, it is only recently that reconstruction algorithms with computational complexity sublinear in N have started being investigated (recent work by [1], [2], [3] gave some of the first such algorithms). In the scenario of adaptive tests with noisy outcomes, we present the first scheme that is simultaneously order-optimal (up to small constant factors) in both the number of tests and the decoding complexity (O(D log N) in both performance metrics). The total number of stages of our adaptive algorithm is “small” (O(log D)). Similarly, in the scenario of non-adaptive tests with noisy outcomes, we present the first scheme that is simultaneously near-optimal in both the number of tests and the decoding complexity (via an algorithm that requires O(D log(D) log(N)) tests and has a decoding complexity of O(D(log N + log² D))). Finally, we present an adaptive algorithm that requires only 2 stages, for which both the number of tests and the decoding complexity scale as O(D(log N + log² D)). For all three settings, the probability of error of our algorithms scales as O(1/poly(D)).
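The pooled-test setup itself can be illustrated with a toy noiseless, non-adaptive simulation using the simple COMP decoding rule (any item that appears in a negative test is declared non-defective). This is not GROTESQUE's noisy, sublinear-time decoder; all names and parameters here are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, T = 1000, 5, 200                  # items, defectives, tests (T = O(D log N))

defective = set(rng.choice(N, size=D, replace=False))

# Bernoulli pooling: each test includes each item independently w.p. 1/D,
# so a typical pool contains few defectives and negative tests remain common.
pools = rng.random((T, N)) < 1.0 / D
outcomes = np.array([bool(pools[t, sorted(defective)].any()) for t in range(T)])

# COMP decoding: an item seen in any negative test cannot be defective,
# so strike it from the candidate set.
candidates = set(range(N))
for t in range(T):
    if not outcomes[t]:
        candidates -= set(np.nonzero(pools[t])[0])

print(sorted(candidates))               # w.h.p. exactly the defective set
```

COMP never discards a true defective (defectives appear only in positive tests), and with T = O(D log N) tests almost every non-defective lands in some negative test. GROTESQUE's contribution is achieving comparable test counts with noisy outcomes while keeping decoding time sublinear in N.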