Results 1–10 of 32
Nonembeddability theorems via Fourier analysis
"... Various new nonembeddability results (mainly into L1) are proved via Fourier analysis. In particular, it is shown that the Edit Distance on {0, 1}d has L1 distortion (log d) 12o(1). We also give new lower bounds on the L1 distortion of flat tori, quotients of the discrete hypercube under group ac ..."
Abstract

Cited by 43 (9 self)
 Add to MetaCart
Various new nonembeddability results (mainly into L1) are proved via Fourier analysis. In particular, it is shown that the Edit Distance on {0, 1}^d has L1 distortion (log d)^{1/2−o(1)}. We also give new lower bounds on the L1 distortion of flat tori, quotients of the discrete hypercube under group actions, and the transportation cost (Earthmover) metric.
IMPROVED LOWER BOUNDS FOR EMBEDDINGS INTO L1
 SIAM J. COMPUT.
, 2009
"... We improve upon recent lower bounds on the minimum distortion of embedding certain finite metric spaces into L1. In particular, we show that for every n ≥ 1, there is an npoint metric space of negative type that requires a distortion of Ω(log log n) for such an embedding, implying the same lower bo ..."
Abstract

Cited by 32 (5 self)
 Add to MetaCart
We improve upon recent lower bounds on the minimum distortion of embedding certain finite metric spaces into L1. In particular, we show that for every n ≥ 1, there is an n-point metric space of negative type that requires a distortion of Ω(log log n) for such an embedding, implying the same lower bound on the integrality gap of a well-known semidefinite programming relaxation for sparsest cut. This result builds upon and improves the recent lower bound of (log log n)^{1/6−o(1)} due to Khot and Vishnoi [The unique games conjecture, integrality gap for cut problems and the embeddability of negative type metrics into l1, in Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, 2005].
Approximating edit distance efficiently
 In Proc. FOCS 2004
, 2004
"... Edit distance has been extensively studied for the past several years. Nevertheless, no lineartime algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as lowdistortion e ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than ℓ, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)^{2/3} gap problem, using a constant-size sketch. A more involved algorithm solves the stronger k vs. ℓ gap problem, where ℓ can be as small as O(k^2)—still with a constant-size sketch—but works only for strings that are mildly “non-repetitive”. Finally, we develop an n^{3/7}-approximation quasi-linear-time algorithm for edit distance, improving the previous best factor of n^{3/4} [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n^{1/3}.
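For reference, the exact edit distance discussed in this abstract is computed by the classic Wagner–Fischer dynamic program in O(nm) time; the gap and approximation algorithms above aim to beat this quadratic baseline. A minimal sketch (not the paper's algorithm):

```python
# Baseline quadratic-time edit distance (Wagner-Fischer dynamic program),
# using two rows of the DP table so memory stays O(min side).
def edit_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    prev = list(range(m + 1))              # distances from the empty prefix of a
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[m]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion).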
Algorithmic and Analysis Techniques in Property Testing
"... Property testing algorithms are “ultra”efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end property testing algorithms are given the ability to perform ..."
Abstract

Cited by 24 (3 self)
 Add to MetaCart
Property testing algorithms are “ultra”-efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end, property testing algorithms are given the ability to perform (local) queries to the input, though the decisions they need to make usually concern properties of a global nature. In the last two decades, property testing algorithms have been designed for many types of objects and properties, among them graph properties, algebraic properties, geometric properties, and more. In this article we survey results in property testing, with an emphasis on common analysis and algorithmic techniques. Among the techniques surveyed are the following:
• The self-correcting approach, which was mainly applied in the study of property testing of algebraic properties;
• The enforce-and-test approach, which was applied quite extensively in the analysis of algorithms for testing graph properties (in the dense-graphs model), as well as in other contexts;
Low distortion embeddings for edit distance
 In Proceedings of the Symposium on Theory of Computing
, 2005
"... We show that {0, 1} d endowed with edit distance embeds into ℓ1 with distortion 2 O( √ log d log log d). We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance. These include sketching, communication complexity, neare ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
We show that {0, 1}^d endowed with edit distance embeds into ℓ1 with distortion 2^{O(√(log d · log log d))}. We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance, including sketching, communication complexity, and nearest neighbor search. For all these problems, we improve upon previous bounds.
Sublinear time algorithms
 SIGACT News
, 2003
"... Abstract Sublinear time algorithms represent a new paradigm in computing, where an algorithmmust give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting. 1 Introduction The goal of algo ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
Abstract Sublinear time algorithms represent a new paradigm in computing, where an algorithm must give some sort of an answer after inspecting only a very small portion of the input. We discuss the sorts of answers that one might be able to achieve in this new setting.
1 Introduction The goal of algorithmic research is to design efficient algorithms, where efficiency is typically measured as a function of the length of the input. For instance, the elementary-school algorithm for multiplying two n-digit integers takes roughly n^2 steps, while more sophisticated algorithms have been devised which run in less than n log^2 n steps. It is still not known whether a linear time algorithm is achievable for integer multiplication. Obviously any algorithm for this task, as for any other nontrivial task, would need to take at least linear time in n, since this is what it would take to read the entire input and write the output. Thus, showing the existence of a linear time algorithm for a problem was traditionally considered to be the gold standard of achievement. Nevertheless, due to the recent tremendous increase in computational power that is inundating us with a multitude of data, we are now encountering a paradigm shift away from traditional computational models. The scale of these data sets, coupled with the typical situation in which there is very little time to perform our computations, raises the issue of whether there is time to consider any more than a minuscule fraction of the data in our computations. Analogous to the reasoning that we used for multiplication, for most natural problems an algorithm which runs in sublinear time must necessarily use randomization and must give an answer which is in some sense imprecise. Nevertheless, there are many situations in which a fast approximate solution is more useful than a slower exact solution.
Estimating the sortedness of a data stream
 In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
, 2007
"... The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one-pass data stream model and present the first sublinear-space algorithms for both problems. We first present O(√n)-space deterministic algorithms that approximate the distance to monotonicity and the LIS to within a factor that is arbitrarily close to 1. We also show a lower bound of Ω(n) on the space required by any randomized algorithm to compute the LIS (or alternatively the distance from monotonicity) exactly, demonstrating that approximation is necessary for sublinear-space computation; this bound improves upon the existing lower bound of Ω(√n) [LNVZ06]. Our main result is a randomized algorithm that uses only O(log^2 n) space and approximates the distance to monotonicity to within a factor that is arbitrarily close to 4. In contrast, we believe that any significant reduction in the space complexity of approximating the length of the LIS is considerably harder to achieve. We conjecture that any deterministic (1 + ε)-approximation algorithm for LIS requires Ω(√n) space, and as a step towards this conjecture, prove a space lower bound of Ω(√n) for a restricted yet natural class of deterministic algorithms.
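As an offline (non-streaming) reference point for the two quantities in this abstract: LIS length can be computed exactly in O(n log n) by patience sorting, and when the edit operations are deletions, the distance to monotonicity equals n minus the LIS length. A sketch, with helper names of my own choosing:

```python
from bisect import bisect_left

# Patience sorting: tails[k] holds the smallest possible tail of a
# strictly increasing subsequence of length k + 1.
def lis_length(seq):
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)            # extend the longest subsequence
        else:
            tails[i] = x               # improve the tail for length i + 1
    return len(tails)

# Minimum deletions to leave a strictly increasing sequence.
def distance_to_monotonicity(seq):
    return len(seq) - lis_length(seq)
```

For `[3, 1, 2, 5, 4]` the LIS has length 3 (e.g., 1, 2, 4), so the distance to monotonicity is 2. The streaming algorithms in the abstract approximate these values in far less space.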
Overcoming the ℓ1 nonembeddability barrier: Algorithms for product metrics
, 2008
"... A common approach for solving computational problems over a difficult metric space is to embed the “hard ” metric into L1, which admits efficient algorithms and is thus considered an “easy ” metric. This approach has proved successful or partially successful for important spaces such as the edit dis ..."
Abstract

Cited by 17 (8 self)
 Add to MetaCart
A common approach for solving computational problems over a difficult metric space is to embed the “hard” metric into L1, which admits efficient algorithms and is thus considered an “easy” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below a certain approximation factor for some metrics. We propose a new approach of embedding the difficult space into richer host spaces, namely iterated products of standard spaces like ℓ1 and ℓ∞. We show that this class is rich, since it contains useful metric spaces with only a constant distortion, and, at the same time, it is tractable and admits efficient algorithms. Using this approach, we obtain, for example, the first nearest neighbor data structure with O(log log d) approximation for edit distance on non-repetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into L1. Furthermore, we give constant-factor approximations for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0-1 strings [Andoni and Krauthgamer, ICALP 2008].
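The Ulam metric mentioned above can be evaluated exactly in near-linear time: under the character-move definition (which agrees with insertion/deletion edit distance up to a factor of 2), the distance between two permutations is n − LCS(p, q), and for permutations the LCS reduces to an LIS computation after relabeling q by each symbol's position in p. A sketch with illustrative names:

```python
from bisect import bisect_left

# Exact Ulam distance (character-move definition) between two
# permutations of the same symbols: n - LCS(p, q), computed as an LIS
# after relabeling q by positions in p.
def ulam_distance(p, q):
    pos = {v: i for i, v in enumerate(p)}
    seq = [pos[v] for v in q]          # q rewritten in p's coordinate system
    tails = []                         # patience-sorting LIS on seq
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(p) - len(tails)
```

Identical permutations are at distance 0, and swapping two adjacent symbols gives distance 1.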
The computational hardness of estimating edit distance
 In Proceedings of the Symposium on Foundations of Computer Science
, 2007
"... We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably ..."
Abstract

Cited by 17 (7 self)
 Add to MetaCart
We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a tradeoff between approximation and communication, asserting, for example, that protocols with O(1) bits of communication can only obtain approximation α ≥ Ω(log d / log log d), where d is the length of the input strings. This case of O(1) communication is of particular importance, since it captures constant-size sketches as well as embeddings into spaces like L1 and squared-L2, two prevailing algorithmic approaches for dealing with edit distance. Furthermore, the bound holds not only for strings over the alphabet Σ = {0, 1}, but also for strings that are permutations (aka the Ulam metric). Besides being applicable to a much richer class of algorithms than all previous results, our bounds are near-tight in at least one case, namely that of embedding permutations into L1. The proof uses a new technique that relies on Fourier analysis in a rather elementary way.
Estimating the weight of metric minimum spanning trees in sublinear time
 in Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC)
"... In this paper we present a sublinear time (1+ ɛ)approximation randomized algorithm to estimate the weight of the minimum spanning tree of an npoint metric space. The running time of the algorithm is Õ(n/ɛO(1)). Since the full description of an npoint metric space is of size Θ(n 2),the complexity ..."
Abstract

Cited by 16 (5 self)
 Add to MetaCart
In this paper we present a sublinear-time (1 + ε)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is Õ(n/ε^{O(1)}). Since the full description of an n-point metric space is of size Θ(n^2), the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal, as it is not possible to approximate in o(n) time the weight of the minimum spanning tree to within any factor. Furthermore, it has been previously shown that no o(n^2) algorithm exists that returns a spanning tree whose weight is within a constant times the optimum.
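For contrast with the sublinear estimator, computing the MST weight exactly with Prim's algorithm on a full distance matrix already takes Θ(n^2) time just to read the input. An illustrative baseline, not the paper's algorithm:

```python
# Exact MST weight by Prim's algorithm over a full n x n distance matrix.
# Touching every matrix entry costs Theta(n^2), which is exactly the cost
# a sublinear-time estimator must avoid.
def mst_weight(dist):
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n          # cheapest known edge into each vertex
    best[0] = 0.0                      # grow the tree from vertex 0
    total = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total
```

For three points at coordinates 0, 1, 3 on a line, the MST picks the edges of weight 1 and 2, so the weight is 3.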