Results 1 - 10
of
25
Nonembeddability theorems via Fourier analysis
"... Various new nonembeddability results (mainly into L1) are proved via Fourier analysis. In particular, it is shown that the Edit Distance on {0, 1}d has L1 distortion (log d) 12-o(1). We also give new lower bounds on the L1 distortion of flat tori, quotients of the discrete hypercube under group ac ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
Various new nonembeddability results (mainly into L1) are proved via Fourier analysis. In particular, it is shown that the Edit Distance on {0, 1}d has L1 distortion (log d) 12-o(1). We also give new lower bounds on the L1 distortion of flat tori, quotients of the discrete hypercube under group actions, and the transportation cost (Earthmover) metric.
Approximating edit distance efficiently
- In Proc. FOCS 2004
, 2004
"... Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion e ..."
Abstract
-
Cited by 26 (5 self)
- Add to MetaCart
Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than ℓ, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn) 2/3 gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. ℓ gap problem, where ℓ can be as small as O(k 2)—still with a constant sketch—but works only for strings that are mildly “non-repetitive”. Finally, we develop an n 3/7-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n 3/4 [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n 1/3. 1.
IMPROVED LOWER BOUNDS FOR EMBEDDINGS INTO L1
- SIAM J. COMPUT.
, 2009
"... We improve upon recent lower bounds on the minimum distortion of embedding certain finite metric spaces into L1. In particular, we show that for every n ≥ 1, there is an n-point metric space of negative type that requires a distortion of Ω(log log n) for such an embedding, implying the same lower bo ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
We improve upon recent lower bounds on the minimum distortion of embedding certain finite metric spaces into L1. In particular, we show that for every n ≥ 1, there is an n-point metric space of negative type that requires a distortion of Ω(log log n) for such an embedding, implying the same lower bound on the integrality gap of a well-known semidefinite programming relaxation for sparsest cut. This result builds upon and improves the recent lower bound of (log log n) 1/6−o(1) due to Khot and Vishnoi [The unique games conjecture, integrality gap for cut problems and the embeddability of negative type metrics into l1, in Proceedings of the 46th Annual IEEE Symposium
Low distortion embeddings for edit distance
- In Proceedings of the Symposium on Theory of Computing
, 2005
"... We show that {0, 1} d endowed with edit distance embeds into ℓ1 with distortion 2 O( √ log d log log d). We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance. These include sketching, communication complexity, neare ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We show that {0, 1} d endowed with edit distance embeds into ℓ1 with distortion 2 O( √ log d log log d). We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance. These include sketching, communication complexity, nearest neighbor search. For all these problems, we improve upon previous bounds. 1
Estimating the sortedness of a data stream
- In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
, 2007
"... The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one- ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one-pass data stream model and present the first sub-linear space algorithms for both problems. We first present O ( √ n)-space deterministic algorithms that approximate the distance to monotonicity and the LIS to within a factor that is arbitrarily close to 1. We also show a lower bound of Ω(n) on the space required by any randomized algorithm to compute the LIS (or alternatively the distance from monotonicity) exactly, demonstrating that approximation is necessary for sub-linear space computation; this bound improves upon the existing lower bound of Ω ( √ n) [LNVZ06]. Our main result is a randomized algorithm that uses only O(log 2 n) space and approximates the distance to monotonicity to within a factor that is arbitrarily close to 4. In contrast, we believe that any significant reduction in the space complexity for approximating the length of the LIS is considerably hard. We conjecture that any deterministic (1 + ɛ) approximation algorithm for LIS requires Ω ( √ n) space, and as a step towards this conjecture, prove a space lower bound of Ω ( √ n) for a restricted yet natural class of deterministic algorithms. 1
The computational hardness of estimating edit distance
- In Proceedings of the Symposium on Foundations of Computer Science
, 2007
"... We prove the first non-trivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably ..."
Abstract
-
Cited by 14 (7 self)
- Add to MetaCart
We prove the first non-trivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like L1 and squared-L2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over alphabet Σ = {0, 1}, but also for strings that are permutations (aka the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely of embedding permutations into L1. The proof uses a new technique, that relies on Fourier analysis in a rather elementary way. 1
Estimating the weight of metric minimum spanning trees in sublinear-time
- in Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC
"... In this paper we present a sublinear time (1+ ɛ)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is Õ(n/ɛO(1)). Since the full description of an n-point metric space is of size Θ(n 2),the complexity ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
In this paper we present a sublinear time (1+ ɛ)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is Õ(n/ɛO(1)). Since the full description of an n-point metric space is of size Θ(n 2),the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal as it is not possible to approximate in o(n) time the weight of the minimum spanning tree to within any factor. Furthermore,it has been previously shown that no o(n 2) algorithm exists that returns a spanning tree whose weight is within a constant times the optimum.
Overcoming the ℓ1 non-embeddability barrier: Algorithms for product metrics
, 2008
"... A common approach for solving computational problems over a difficult metric space is to embed the “hard ” metric into L1, which admits efficient algorithms and is thus considered an “easy ” metric. This approach has proved successful or partially successful for important spaces such as the edit dis ..."
Abstract
-
Cited by 14 (8 self)
- Add to MetaCart
A common approach for solving computational problems over a difficult metric space is to embed the “hard ” metric into L1, which admits efficient algorithms and is thus considered an “easy ” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below certain approximation for some metrics. We propose a new approach, of embedding the difficult space into richer host spaces, namely iterated products of standard spaces like ℓ1 and ℓ∞. We show that this class is rich since it contains useful metric spaces with only a constant distortion, and, at the same time, it is tractable and admits efficient algorithms. Using this approach, we obtain for example the first nearest neighbor data structure with O(log log d) approximation for edit distance in nonrepetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into L1. Furthermore, we give constant factor approximation for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0-1 strings [Andoni and Krauthgamer, ICALP 2008]. 1
Sublinear time algorithms
- SIGACT News
, 2003
"... Abstract Sublinear time algorithms represent a new paradigm in computing, where an algorithmmust give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting. 1 Introduction The goal of algo ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Abstract Sublinear time algorithms represent a new paradigm in computing, where an algorithmmust give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting. 1 Introduction The goal of algorithmic research is to design efficient algorithms, where efficiency is typicallymeasured as a function of the length of the input. For instance, the elementary school algorithm for multiplying two n digit integers takes roughly n2 steps, while more sophisticated algorithmshave been devised which run in less than n log2 n steps. It is still not known whether a linear time algorithm is achievable for integer multiplication. Obviously any algorithm for this task, as for anyother nontrivial task, would need to take at least linear time in n, since this is what it would take to read the entire input and write the output. Thus, showing the existence of a linear time algorithmfor a problem was traditionally considered to be the gold standard of achievement. Nevertheless, due to the recent tremendous increase in computational power that is inundatingus with a multitude of data, we are now encountering a paradigm shift from traditional computational models. The scale of these data sets, coupled with the typical situation in which there is verylittle time to perform our computations, raises the issue of whether there is time to consider any more than a miniscule fraction of the data in our computations? Analogous to the reasoning thatwe used for multiplication, for most natural problems, an algorithm which runs in sublinear time must necessarily use randomization and must give an answer which is in some sense imprecise.Nevertheless, there are many situations in which a fast approximate solution is more useful than a slower exact solution.
Algorithmic and Analysis Techniques in Property Testing
"... Property testing algorithms are “ultra”-efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end property testing algorithms are given the ability to perform ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
Property testing algorithms are “ultra”-efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end property testing algorithms are given the ability to perform (local) queries to the input, though the decision they need to make usually concern properties with a global nature. In the last two decades, property testing algorithms have been designed for many types of objects and properties, amongst them, graph properties, algebraic properties, geometric properties, and more. In this article we survey results in property testing, where our emphasis is on common analysis and algorithmic techniques. Among the techniques surveyed are the following: • The self-correcting approach, which was mainly applied in the study of property testing of algebraic properties; • The enforce and test approach, which was applied quite extensively in the analysis of algorithms for testing graph properties (in the dense-graphs model), as well as in other contexts;

