The computational hardness of estimating edit distance
 In Proceedings of the Symposium on Foundations of Computer Science
, 2007
Cited by 17 (7 self)
We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a tradeoff between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like L1 and squared-L2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over alphabet Σ = {0, 1}, but also for strings that are permutations (aka the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely that of embedding permutations into L1. The proof uses a new technique that relies on Fourier analysis in a rather elementary way.
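For orientation, the two distances contrasted in this abstract can be sketched as follows. This is the textbook dynamic program for Levenshtein distance and a direct count for Hamming distance; it is not taken from the paper, and the function names are illustrative assumptions.

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning x into y."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

A shifted binary string shows how far apart the two measures can be: `"0101"` and `"1010"` differ in every position, yet one deletion and one insertion turn the first into the second.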
Property testing of regular tree languages
 In Proceedings of the 31st International Colloquium on Automata, Languages and Programming
, 2004
Cited by 9 (3 self)
We consider the edit distance with moves on the class of words and the class of ordered trees. We first exhibit a simple tester for the class of regular languages on words and generalize it to the class of ranked regular trees. In the complete version of the paper, we show that the distance problem is NP-complete on ordered trees.
Improved single-round protocols for remote file synchronization
 In Proc. of Infocom
, 2005
Cited by 7 (1 self)
Given two versions of a file, a current version located on one machine and an outdated version known only to another machine, the remote file synchronization problem is how to update the outdated version over a network with a minimal amount of communication. In particular, when the versions are very similar, the total data transmitted should be significantly smaller than the file size. File synchronization problems arise in many application scenarios such as web site mirroring, file system backup and replication, and web access over slow links. An open source tool for this problem, called rsync and included in many Linux distributions, is widely used in such scenarios. rsync uses a single round of messages between the two machines. While recent research has shown that significant additional savings in bandwidth consumption are possible through the use of optimized multi-round protocols, there are many scenarios where multiple rounds are undesirable. In this paper, we study single-round protocols for file synchronization that offer significant improvements over rsync. Our main contribution is a new approach to file synchronization based on the use of erasure codes. Using this approach, we design a single-round protocol that is provably efficient with respect to common measures of file distance, and another optimized practical protocol that shows promising improvements over rsync on our data sets. In addition, we show how to obtain moderate improvements by engineering the rsync approach.
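The single-round pattern described in this abstract can be illustrated with a toy block-hash exchange. The holder of the outdated file sends one message of block hashes, and the holder of the current file replies with one message of copy instructions and literal bytes. This is a simplified sketch under stated assumptions: it is neither rsync's actual algorithm or wire format (there is no rolling checksum) nor the erasure-code protocol of the paper, and `BLOCK`, `block_signatures`, `make_delta` and `apply_delta` are hypothetical names.

```python
import hashlib

BLOCK = 4  # hypothetical block size, kept tiny for illustration


def block_signatures(old: bytes, block: int = BLOCK) -> dict:
    """Message 1 (outdated side): hash of each full block -> block index."""
    return {
        hashlib.sha256(old[i:i + block]).digest(): i // block
        for i in range(0, len(old) - block + 1, block)
    }


def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Message 2 (current side): ("copy", index) and ("data", bytes) items."""
    delta, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        idx = None
        if len(chunk) == block:
            idx = sigs.get(hashlib.sha256(chunk).digest())
        if idx is not None:
            if literal:  # flush pending unmatched bytes first
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", idx))
            i += block
        else:
            literal.append(new[i])  # no match here: ship the byte itself
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Outdated side rebuilds the current file from its own blocks."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

For example, with `old = b"abcdefghijkl"` and `new = b"abcdXefghijkl"`, the delta carries a single literal byte plus three block references, so only the inserted byte crosses the network as file data.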
Stable distributions for stream computations: It’s as easy as 0,1,2
 In Workshop on Management and Processing of Massive Data Streams, at FCRC
, 2003
Cited by 7 (0 self)
1. Introduction A surprising number of data stream problems are solved by methods involving computations with stable distributions. This
Geometric Crossover for Sets, Multisets and Partitions
 In Proceedings of the Parallel Problem Solving from Nature Conference
, 2006
Cited by 7 (7 self)
This paper extends a geometric framework for interpreting crossover and mutation [5] to the case of sets and related representations. We show that a deep geometric duality exists between the set representation and the vector representation. This duality reveals the equivalence of geometric crossovers for these representations.
Approximating edit distance in near-linear time
, 2009
Cited by 5 (3 self)
We show how to compute the edit distance between two strings of length n up to a factor of $2^{\tilde{O}(\sqrt{\log n})}$ in $n^{1+o(1)}$ time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art $n^{1/3+o(1)}$ approximation. Previously, approximation of $2^{\tilde{O}(\sqrt{\log n})}$ was known only for embedding edit distance into ℓ1, and it is not known if that embedding can be computed in less than quadratic time.
Deltacast: efficient file reconciliation in wireless broadcast systems
 in MobiSys ’05: Proceedings of the 3rd international conference on Mobile systems, applications, and services
, 2005
Cited by 4 (0 self)
Recently, there has been an increasing interest in wireless broadcast systems as a means to enable scalable content delivery to large numbers of mobile users. However, gracefully providing efficient reconciliation of different versions of a file over such broadcast channels still remains a challenge. Such systems often lack a feedback channel and consequently updates cannot be easily tailored to a specific user. Moreover, given the potentially large number of possible versions of a file, it is impractical to send a tailored update for each particular user. In this paper we consider the problem of efficiently updating files in such wireless broadcast channels. To this end, we present DeltaCast, a system that combines hierarchical hashes and erasure codes to minimise the amount of battery power and the amount of time needed to synchronise each mobile device. Based on our experimental results, we show that DeltaCast is able to efficiently identify the missing portions of a file and quickly update each client.
Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
, 2010
Cited by 3 (2 self)
We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed $\varepsilon > 0$, it can compute a $(\log n)^{O(1/\varepsilon)}$ approximation in $n^{1+\varepsilon}$ time. This is an exponential improvement over the previously known factor, $2^{\tilde{O}(\sqrt{\log n})}$, with a comparable running time [OR07, AO09]. Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner, while being charged for querying every symbol of x. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being “repetitive”, which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance, which is edit distance on non-repetitive strings, such as permutations.
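To make the closing remark concrete, here is a minimal sketch of Ulam distance between two permutations of the same elements. It assumes the common convention that the distance counts element moves, which equals n minus the length of the longest common subsequence and is within a factor of 2 of insert/delete/substitute edit distance on permutations. Because no symbol repeats, the common subsequence reduces to a longest increasing subsequence, computable in O(n log n) time. The function name `ulam_distance` is illustrative and not from the paper.

```python
from bisect import bisect_left


def ulam_distance(p: list, q: list) -> int:
    """Minimum number of element moves turning permutation p into q
    (assumed convention: n minus the longest common subsequence)."""
    if len(set(p)) != len(p) or sorted(p) != sorted(q):
        raise ValueError("inputs must be permutations of the same elements")
    # Relabel p by each element's position in q; a common subsequence of
    # p and q is then exactly an increasing subsequence of `seq`.
    position_in_q = {value: i for i, value in enumerate(q)}
    seq = [position_in_q[value] for value in p]
    # Patience-sorting LIS: tails[k] is the smallest possible last value
    # of an increasing subsequence of length k + 1 seen so far.
    tails = []
    for s in seq:
        k = bisect_left(tails, s)
        if k == len(tails):
            tails.append(s)
        else:
            tails[k] = s
    return len(p) - len(tails)
```

For instance, `[1, 2, 3, 4]` and `[2, 3, 4, 1]` are one move apart (relocate the element 1), whereas a fully reversed three-element permutation needs two moves.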
New Sublinear Methods in the Struggle against Classical Problems
, 2010
Cited by 2 (0 self)
We study the time and query complexity of approximation algorithms that access only a minuscule fraction of the input, focusing on two classical sources of problems: combinatorial graph optimization and manipulation of strings. The tools we develop find applications outside of the area of sublinear algorithms. For instance, we obtain a more efficient approximation algorithm for edit distance and distributed algorithms for combinatorial problems on graphs that run in a constant number of communication rounds.