The computational hardness of estimating edit distance
In Proceedings of the Symposium on Foundations of Computer Science, 2007
Cited by 17 (7 self)
We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a tradeoff between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like L1 and squared-L2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over alphabet Σ = {0, 1}, but also for strings that are permutations (aka the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely of embedding permutations into L1. The proof uses a new technique that relies on Fourier analysis in a rather elementary way.
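To make the two distances in this abstract concrete, here is a minimal Python sketch. It is not taken from the paper: it uses the textbook quadratic dynamic program for edit distance and a direct position-by-position count for Hamming distance. The function names and the example strings are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the textbook dynamic program, one row at a time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# Illustrative input (an assumption, not from the paper): a binary string
# and the same string cyclically shifted by one position.
s = "01" * 8
t = "10" * 8
print(hamming(s, t), levenshtein(s, t))
```

On this pair every position differs, so the Hamming distance equals the string length, while one deletion plus one insertion suffices for the edit distance. Shifts like this are one reason position-wise comparison does not carry over to edit distance.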
Property testing of regular tree languages
In Proceedings of the 31st International Colloquium on Automata, Languages and Programming, 2004
Cited by 8 (3 self)
We consider the edit distance with moves on the class of words and the class of ordered trees. We first exhibit a simple tester for the class of regular languages on words and generalize it to the class of ranked regular trees. In the complete version of the paper, we show that the distance problem is NP-complete on ordered trees.
Geometric Crossover for Sets, Multisets and Partitions
In Proceedings of the Parallel Problem Solving from Nature Conference, 2006
Cited by 7 (7 self)
This paper extends a geometric framework for interpreting crossover and mutation [5] to the case of sets and related representations. We show that a deep geometric duality exists between the set representation and the vector representation. This duality reveals the equivalence of geometric crossovers for these representations.
Approximating edit distance in near-linear time
2009
Cited by 6 (3 self)
We show how to compute the edit distance between two strings of length n up to a factor of 2^{Õ(√log n)} in n^{1+o(1)} time. This is the first subpolynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^{1/3+o(1)} approximation. Previously, approximation of 2^{Õ(√log n)} was known only for embedding edit distance into ℓ1, and it is not known if that embedding can be computed in less than quadratic time.
Improved single-round protocols for remote file synchronization
In Proc. of Infocom, 2005
Cited by 6 (1 self)
Given two versions of a file, a current version located on one machine and an outdated version known only to another machine, the remote file synchronization problem is how to update the outdated version over a network with a minimal amount of communication. In particular, when the versions are very similar, the total data transmitted should be significantly smaller than the file size. File synchronization problems arise in many application scenarios such as web site mirroring, file system backup and replication, and web access over slow links. An open source tool for this problem, called rsync and included in many Linux distributions, is widely used in such scenarios. rsync uses a single round of messages between the two machines. While recent research has shown that significant additional savings in bandwidth consumption are possible through the use of optimized multi-round protocols, there are many scenarios where multiple rounds are undesirable. In this paper, we study single-round protocols for file synchronization that offer significant improvements over rsync. Our main contribution is a new approach to file synchronization based on the use of erasure codes. Using this approach, we design a single-round protocol that is provably efficient with respect to common measures of file distance, and another optimized practical protocol that shows promising improvements over rsync on our data sets. In addition, we show how to obtain moderate improvements by engineering the rsync approach.
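The single round of messages attributed to rsync above can be illustrated with a much-simplified Python sketch. This is neither rsync's actual implementation nor the erasure-code protocol proposed in the paper. It assumes a hypothetical fixed block size and recomputes a SHA-256 hash at every offset, whereas rsync pairs a weak rolling checksum with a strong hash. All function names are illustrative.

```python
import hashlib

BLOCK = 4  # hypothetical block size, kept tiny so the example is easy to trace


def block_signatures(old: bytes, block: int = BLOCK) -> dict:
    """Holder of the outdated file: map each aligned block's hash to its index."""
    sigs = {}
    for idx in range(len(old) // block):
        chunk = old[idx * block:(idx + 1) * block]
        sigs.setdefault(hashlib.sha256(chunk).digest(), idx)
    return sigs


def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Holder of the current file: encode it as block references plus literals."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        idx = sigs.get(hashlib.sha256(chunk).digest()) if len(chunk) == block else None
        if idx is not None:
            if literal:  # flush pending literal bytes before the reference
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", idx))
            pos += block
        else:
            literal.append(new[pos])  # no match here: emit one byte, slide by one
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Holder of the outdated file: rebuild the current version from the delta."""
    out = bytearray()
    for kind, value in delta:
        out += old[value * block:(value + 1) * block] if kind == "copy" else value
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaaXbbbbccccdddd"  # one byte inserted after the first block
delta = make_delta(new, block_signatures(old))
print(delta)
print(apply_delta(old, delta) == new)
```

In this toy run only the inserted byte travels as literal data, and the rest of the file is described by block references. This matches the goal stated in the abstract of transmitting far less than the file size when the versions are similar.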
Stable distributions for stream computations: It’s as easy as 0,1,2
In Workshop on Management and Processing of Massive Data Streams, at FCRC, 2003
Cited by 6 (0 self)
A surprising number of data stream problems are solved by methods involving computations with stable distributions. This ...
DeltaCast: efficient file reconciliation in wireless broadcast systems
In MobiSys ’05: Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services, 2005
Cited by 4 (0 self)
Recently, there has been an increasing interest in wireless broadcast systems as a means to enable scalable content delivery to large numbers of mobile users. However, gracefully providing efficient reconciliation of different versions of a file over such broadcast channels still remains a challenge. Such systems often lack a feedback channel and consequently updates cannot be easily tailored to a specific user. Moreover, given the potentially large number of possible versions of a file, it is impractical to send a tailored update for each particular user. In this paper we consider the problem of efficiently updating files in such wireless broadcast channels. To this end, we present DeltaCast, a system that combines hierarchical hashes and erasure codes to minimise the amount of battery power and the amount of time needed to synchronise each mobile device. Based on our experimental results, we show that DeltaCast is able to efficiently identify the missing portions of a file and quickly update each client.
New Sublinear Methods in the Struggle against Classical Problems
2010
Cited by 2 (0 self)
We study the time and query complexity of approximation algorithms that access only a minuscule fraction of the input, focusing on two classical sources of problems: combinatorial graph optimization and manipulation of strings. The tools we develop find applications outside of the area of sublinear algorithms. For instance, we obtain a more efficient approximation algorithm for edit distance and distributed algorithms for combinatorial problems on graphs that run in a constant number of communication rounds.