Shift Finding in Sublinear Time
2004
Cited by 2 (0 self)
We study the following basic pattern matching problem. Consider a “code” sequence c consisting of n bits chosen uniformly at random, and a “signal” sequence x obtained by shifting c (modulo n) and adding noise. The goal is to efficiently recover the shift with high probability. The problem models tasks of interest in several applications, including GPS synchronization and motion estimation. We present an algorithm that solves the problem in time $\tilde{O}(n^{f/(1+f)})$, where $\tilde{O}(N^f)$ is the running time of the best algorithm for finding the closest pair among N “random” sequences of length O(log N). A trivial bound of f = 2 leads to a simple algorithm with a running time of $\tilde{O}(n^{2/3})$. The asymptotic running time can be further improved by plugging in recent, more efficient algorithms for the closest pair problem. Our results also yield a sublinear-time algorithm for approximate pattern matching on a random signal (text), even when the error between the signal and the code (pattern) is asymptotically as large as the code size. This is the first sublinear-time algorithm for such error rates.
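To make the problem statement concrete, here is a minimal Python sketch of the problem setup together with a brute-force baseline. It is not the sublinear algorithm from the abstract: it tries every shift and counts mismatches on a random subset of positions, which costs O(n · m) time for m sampled positions. The function names, the noise model (independent bit flips), and the parameter values are assumptions made for illustration.

```python
import random

def make_instance(n, shift, noise, rng):
    """Return (c, x): a uniformly random code c and its noisy cyclic shift x.

    Assumed noise model: each bit of the shifted code is flipped
    independently with probability `noise`.
    """
    c = [rng.randrange(2) for _ in range(n)]
    x = [c[(i + shift) % n] ^ int(rng.random() < noise) for i in range(n)]
    return c, x

def recover_shift(c, x, positions):
    """Brute-force baseline: return the shift with the fewest mismatches
    between x and the shifted code, counted only on `positions`."""
    n = len(c)

    def mismatches(s):
        return sum(c[(i + s) % n] != x[i] for i in positions)

    return min(range(n), key=mismatches)

rng = random.Random(0)
n, true_shift = 256, 37
c, x = make_instance(n, true_shift, noise=0.05, rng=rng)
positions = rng.sample(range(n), 128)  # compare on a random subset only
print(recover_shift(c, x, positions))
```

Sampling positions keeps each comparison cheap, but the loop over all n candidate shifts is what the algorithm in the abstract avoids.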
Earth Mover’s Distance based Similarity Search at Scale
Cited by 1 (0 self)
Earth Mover’s Distance (EMD), as a similarity measure, has received a lot of attention in the fields of multimedia and probabilistic databases, computer vision, image retrieval, machine learning, etc. EMD on multidimensional histograms provides better distinguishability between the objects approximated by the histograms (e.g., images) than classic measures like Euclidean distance. Despite its usefulness, EMD has a high computational cost; therefore, a number of effective filtering methods have been proposed to reduce the number of histogram pairs for which the exact EMD has to be computed during similarity search. Still, EMD calculations in the refinement step remain the bottleneck of the whole similarity search process. In this paper, we focus on optimizing the refinement phase of EMD-based similarity search by (i) adapting an efficient min-cost flow algorithm (SIA) for EMD computation, (ii) proposing a dynamic distance bound, which can be used to terminate an EMD refinement early, and (iii) proposing a dynamic refinement order for the candidates which, paired with a concurrent EMD refinement strategy, reduces the amount of needless computation. Our proposed techniques are orthogonal to, and can be easily integrated with, the state-of-the-art filtering techniques, reducing the cost of EMD-based similarity queries by orders of magnitude.
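The filter-and-refine pattern described above can be sketched in Python as follows. This is a simplified illustration under strong assumptions, not the paper's method: histograms are one-dimensional with equal total mass and ground distance |i - j| (so the exact EMD reduces to a prefix-sum formula rather than a min-cost flow), and the filter is the classical centroid lower bound. All function names are hypothetical.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Exact EMD between equal-mass 1-D histograms, ground distance |i - j|:
    the sum of absolute differences of the running (prefix) sums."""
    return sum(abs(d) for d in accumulate(a - b for a, b in zip(p, q)))

def centroid_lower_bound(p, q):
    """Cheap lower bound on emd_1d: absolute difference of first moments."""
    return abs(sum(i * (a - b) for i, (a, b) in enumerate(zip(p, q))))

def nearest_histogram(query, candidates):
    """Return (index, distance, number of exact EMD refinements)."""
    bounds = [centroid_lower_bound(query, c) for c in candidates]
    best_idx, best_dist, refinements = None, float("inf"), 0
    # Visit candidates in increasing order of their lower bound.
    for i in sorted(range(len(candidates)), key=bounds.__getitem__):
        if bounds[i] >= best_dist:
            break  # no remaining candidate can beat the current best
        refinements += 1
        d = emd_1d(query, candidates[i])
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, refinements

query = [0.5, 0.5, 0.0, 0.0]
candidates = [
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(nearest_histogram(query, candidates))  # (1, 0.25, 1)
```

In this toy run only one of the three candidates needs an exact EMD computation; the other two are pruned by the lower bound.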
An Explicit Formulation of the Earth Mover’s Distance with Continuous Road Map Distances
Cited by 1 (1 self)
The Earth mover’s distance (EMD) is a measure of distance between probability distributions which is at the heart of mass transportation theory. Recent research has shown that the EMD plays a crucial role in studying the potential impact of Demand-Responsive Transportation (DRT) and Mobility-on-Demand (MoD) systems, which are growing paradigms for one-way vehicle sharing where people drive (or are driven by) shared vehicles from a point of origin to a point of destination. While the ubiquitous physical transportation setting is the “road network”, characterized by systems of roads connected together by interchanges, most analytical works about vehicle sharing represent distances between points in a plane using the simple Euclidean metric. Instead, we consider the EMD when the ground metric is taken from a class of one-dimensional, continuous metric spaces, reminiscent of road networks. We produce an “explicit” formulation of the Earth mover’s distance given any finite road network R. The result generalizes the EMD with a Euclidean $\mathbb{R}^1$ ground metric, which had remained one of the only known non-discrete cases with an explicit formula. Our formulation casts the EMD as the optimal value of a finite-dimensional, real-valued optimization problem, with a convex objective function and linear constraints. In the special case that the input distributions have piecewise uniform (constant) density, the problem reduces to one whose objective function is convex quadratic. Both forms are amenable to modern mathematical programming techniques.
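For reference, the Euclidean $\mathbb{R}^1$ case mentioned above has a well-known closed form, stated here as standard background rather than quoted from the paper. The symbols $\mu$, $\nu$ (the two probability distributions) and $F_\mu$, $F_\nu$ (their cumulative distribution functions) are notation assumed for this note.

```latex
\[
  \mathrm{EMD}(\mu, \nu)
  \;=\; \int_{-\infty}^{\infty} \bigl| F_\mu(x) - F_\nu(x) \bigr| \, dx ,
  \qquad
  F_\mu(x) = \mu\bigl((-\infty, x]\bigr), \quad
  F_\nu(x) = \nu\bigl((-\infty, x]\bigr).
\]
```

The integrand is the net mass that must cross the point x, which is why no optimization over transport plans is needed on the line.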
Sublinear Algorithms for Earth Mover's Distance
2009
We study the problem of estimating the Earth Mover's Distance (EMD) between probability distributions when given access only to samples. We give closeness testers and additive-error estimators over domains in [0, A], with sample complexities independent of domain size, permitting the testability even of continuous distributions over infinite domains. Instead, our algorithms depend on other parameters, such as the diameter of the domain space, which may be significantly smaller. We also prove lower bounds showing our testers to be optimal in their dependence on these parameters. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual $L_1$ metric, and give optimal algorithms.
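As a rough illustration of estimating EMD from samples alone, the sketch below computes the EMD between the empirical distributions of two equal-size samples on the real line. This is a naive plug-in estimate, not one of the testers or estimators from the abstract, and it carries none of their guarantees. The one-dimensional setting, the function name, and the example distributions are assumptions.

```python
import random

def empirical_emd_1d(xs, ys):
    """EMD between the empirical distributions of two equal-size 1-D samples.

    On the line, an optimal matching pairs the i-th smallest point of one
    sample with the i-th smallest point of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)

rng = random.Random(0)
m = 2000
xs = [rng.uniform(0.0, 1.0) for _ in range(m)]  # Uniform[0, 1]
ys = [rng.uniform(0.5, 1.5) for _ in range(m)]  # Uniform[0.5, 1.5]
estimate = empirical_emd_1d(xs, ys)
print(estimate)  # the true EMD between these two distributions is 0.5
```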
NNS lower bounds via metric expansion for $\ell_\infty$ and EMD
We give new lower bounds for randomized NNS data structures in the cell probe model based on robust metric expansion for two metric spaces: $\ell_\infty$ and Earth Mover Distance (EMD) in high dimensions. In particular, our results imply stronger non-embeddability for these metric spaces into $\ell_1$. The main components of our approach are a strengthening of the isoperimetric inequality for the distribution on $\ell_\infty$ introduced by Andoni et al. [FOCS’08] and a robust isoperimetric inequality for EMD on quotients of the boolean hypercube.
Top-k Queries on Temporal Data
The database community has devoted extensive effort to indexing and querying temporal data over the past decades. However, insufficient attention has been paid to temporal ranking queries. More precisely, given any time instance t, the query asks for the top-k objects at time t with respect to some score attribute. Some generic indexing structures based on R-trees do support ranking queries on temporal data, but as they are not tailored for such queries, their performance is far from satisfactory. We present the SEB-tree, a simple indexing scheme that supports temporal ranking queries much more efficiently. The SEB-tree answers a top-k query for any time instance t in the optimal number of I/Os in expectation, namely $O(\log_B \frac{N}{B} + \frac{k}{B})$ I/Os, where N is the size of the data set and B is the disk block size. The index has near-linear size (for constant and reasonable $k_{\max}$ values, where $k_{\max}$ is the maximum possible value of the query parameter k), can be constructed in near-linear time, and also supports insertions and deletions without affecting its query performance guarantee. Most of all, the SEB-tree is especially appealing in practice due to its simplicity, as it uses the B-tree as its only building block. Extensive experiments on a number of large data sets show that the SEB-tree is more than an order of magnitude faster than the R-tree-based indexes for temporal ranking queries.
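The query semantics can be illustrated with a linear-scan baseline in Python. This is not the SEB-tree: it evaluates every object's score at time t and keeps the k largest, which is the cost an index is meant to avoid. The data model is an assumption made for brevity (each object's score is piecewise constant, stored as time-sorted (start_time, score) breakpoints), and all names are hypothetical.

```python
import heapq
from bisect import bisect_right

def score_at(breakpoints, t):
    """Score of one object at time t, or None if t precedes its first breakpoint.

    `breakpoints` is a list of (start_time, score) sorted by start_time;
    each score is assumed to hold until the next start_time.
    """
    times = [start for start, _ in breakpoints]
    idx = bisect_right(times, t) - 1
    return None if idx < 0 else breakpoints[idx][1]

def top_k_at(objects, t, k):
    """Names of the k highest-scoring objects at time t (linear scan)."""
    scored = []
    for name, breakpoints in objects.items():
        s = score_at(breakpoints, t)
        if s is not None:
            scored.append((s, name))
    return [name for _, name in heapq.nlargest(k, scored)]

objects = {
    "a": [(0, 5), (10, 1)],
    "b": [(0, 3)],
    "c": [(5, 9)],
}
print(top_k_at(objects, 2, 2))   # ['a', 'b']
print(top_k_at(objects, 7, 2))   # ['c', 'a']
print(top_k_at(objects, 12, 2))  # ['c', 'b']
```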
BERTINORO WORKSHOP PARTICIPANTS:
2011
This document contains a list of open problems and research directions that have been suggested
Improved Approximation Algorithms for Earth-Mover Distance in Data Streams
2014
For two multisets S and T of points in $[\Delta]^2$, such that $|S| = |T| = n$, the earth-mover distance (EMD) between S and T is the minimum cost of a perfect bipartite matching with edges between points in S and T, i.e., $\mathrm{EMD}(S, T) = \min_{\pi: S \to T} \sum_{a \in S} \|a - \pi(a)\|_1$, where $\pi$ ranges over all one-to-one mappings. The sketching complexity of approximating earth-mover distance in the two-dimensional grid is mentioned as one of the open problems in [16, 11]. We give two algorithms for computing EMD between two multisets when the number of distinct points in one set is a small value $k = \log^{O(1)}(\Delta n)$. Our first algorithm gives a $(1 + \epsilon)$-approximation using $O(k \epsilon^{-2} \log^4 n)$ space and works only in the insertion-only model. The second algorithm gives an $O(\min(k^3, \log \Delta))$-approximation using $O(\log^3 \Delta \cdot \log\log \Delta \cdot \log n)$ space in the turnstile model.
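The matching-based definition above can be checked on tiny inputs with an exhaustive Python sketch. It enumerates all n! one-to-one mappings, so it is only a reference implementation of the definition, unrelated to the streaming algorithms in the abstract. Function names are hypothetical.

```python
from itertools import permutations

def l1(a, b):
    """L1 (Manhattan) distance between two points in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def emd_multisets(S, T):
    """EMD between equal-size multisets of 2-D points, by exhaustive search
    over all one-to-one mappings from S to T. Exponential time; use only
    on very small inputs."""
    if len(S) != len(T):
        raise ValueError("S and T must have the same size")
    return min(
        sum(l1(a, T[j]) for a, j in zip(S, perm))
        for perm in permutations(range(len(T)))
    )

S = [(0, 0), (2, 2)]
T = [(2, 3), (0, 1)]
print(emd_multisets(S, T))  # 2
```

Here the identity mapping costs 5 + 3 = 8, while the swapped mapping costs 1 + 1 = 2, so the minimum is 2.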