Results 1–10 of 23
Geometric approximation via coresets
Combinatorial and Computational Geometry, MSRI, 2005
Cited by 60 (7 self)
Abstract. The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P, and then solves the problem on Q using a relatively inefficient algorithm. The solution for Q is then translated to an approximate solution to the original point set P. This paper describes the ways in which this paradigm has been successfully applied to various optimization and extent measure problems.
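As a small illustration of the paradigm (our own example, not one taken from the survey), the sketch below approximates the diameter of a planar point set. It keeps, as Q, the two extreme points of P along k evenly spaced directions, then runs a brute-force O(|Q|²) diameter on Q. If pq is a diametral pair of P, some sampled direction is within π/(2k) of pq, so the width of P along it, which Q realizes, is at least cos(π/(2k))·diam(P). The function names are hypothetical.

```python
import math
import random
from itertools import combinations

Point = tuple[float, float]


def extreme_point_coreset(points: list[Point], k: int) -> list[Point]:
    """Keep the lowest and highest point along k evenly spaced directions in [0, pi)."""
    chosen = set()
    for i in range(k):
        theta = math.pi * i / k
        ux, uy = math.cos(theta), math.sin(theta)

        def proj(p: Point) -> float:
            # Projection of p onto the current direction (ux, uy).
            return p[0] * ux + p[1] * uy

        chosen.add(min(points, key=proj))
        chosen.add(max(points, key=proj))
    return list(chosen)


def diameter(points: list[Point]) -> float:
    """Brute-force O(n^2) diameter: the 'relatively inefficient' solver run on Q."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


if __name__ == "__main__":
    random.seed(0)
    P = [(random.random(), random.random()) for _ in range(500)]
    Q = extreme_point_coreset(P, k=16)
    # Q has at most 32 points; its diameter is within a cos(pi/32) factor of P's.
    print(len(Q), diameter(Q) / diameter(P))
```

Because Q has at most 2k points regardless of |P|, the quadratic solver costs O(k²) once the O(nk) extraction pass is done.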
A Space-Optimal Data-Stream Algorithm for Coresets in the Plane
Cited by 15 (5 self)
Given a point set P ⊆ R², a subset Q ⊆ P is an ε-kernel of P if for every slab W containing Q, the (1+ε)-expansion of W also contains P. We present a data-stream algorithm for maintaining an ε-kernel of a stream of points in R² that uses O(1/√ε) space and takes O(log(1/ε)) amortized time to process each point. This is the first space-optimal data-stream algorithm for this problem. As a consequence, we obtain improved data-stream approximation algorithms for other extent measures, such as width, robust kernels, as well as ε-kernels in higher dimensions.
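The slab definition can be checked directly, at least along a finite set of directions. A minimal sketch, assuming P and Q are lists of 2D tuples: for each sampled direction it takes the narrowest slab containing Q, expands it by (1 + ε) about its center, and tests whether every point of P lies inside. Passing on sampled directions is necessary but not sufficient for Q to be an ε-kernel, so this is a test harness, not the paper's streaming algorithm. The name `is_eps_kernel` and the `num_dirs` parameter are hypothetical.

```python
import math


def is_eps_kernel(P, Q, eps, num_dirs=360):
    """Test the slab definition of an eps-kernel on num_dirs sampled directions.

    For a direction u, the narrowest slab normal to u containing Q spans the
    range [lo, hi] of Q's projections. Any wider slab containing Q in the same
    direction has a (1 + eps)-expansion that contains this one's, so the
    narrowest slab is the binding case. Directions in [0, pi) suffice because
    u and -u define the same slabs.
    """
    for i in range(num_dirs):
        theta = math.pi * i / num_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        q_proj = [x * ux + y * uy for x, y in Q]
        lo, hi = min(q_proj), max(q_proj)
        center, half = (lo + hi) / 2, (hi - lo) / 2
        reach = (1 + eps) * half  # half-width of the expanded slab
        for x, y in P:
            if abs(x * ux + y * uy - center) > reach + 1e-12:
                return False
    return True


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
    print(is_eps_kernel(square, square[:4], eps=0.0))            # corners suffice
    print(is_eps_kernel(square, [(0.0, 0.0), (1.0, 1.0)], 0.1))  # diagonal does not
```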
Robust shape fitting via peeling and grating coresets
In Proc. 17th ACM-SIAM Sympos. Discrete Algorithms, 2006
Cited by 10 (3 self)
Let P be a set of n points in R^d. A subset S of P is called a (k, ε)-kernel if for every direction, the directional width of S ε-approximates that of P when k "outliers" can be ignored in that direction. We show that a (k, ε)-kernel of P of size O(k/ε^{(d−1)/2}) can be computed in time O(n + k²/ε^{d−1}). The new algorithm works by repeatedly "peeling" away (0, ε)-kernels from the point set. We also present a simple ε-approximation algorithm for fitting various shapes through a set of points with at most k outliers. The algorithm is incremental and works by repeatedly "grating" critical points into a working set until the working set provides the required approximation. We prove that the size of the working set is independent of n, which yields a simple and practical, near-linear ε-approximation algorithm for shape fitting with outliers in low dimensions. We demonstrate the practicality of our algorithms by showing their empirical performance on various inputs and problems.
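The k-outlier width in this definition can be made concrete. The sketch below, assuming planar points and a unit direction u, computes the width after ignoring k outliers: once the projections are sorted, the best choice keeps a contiguous window of n − k of them. The `peel` helper illustrates the peeling idea with a caller-supplied kernel routine; `axis_extremes` is a hypothetical stand-in, not an ε-kernel construction, and the number of layers is left as a parameter rather than taken from the paper's analysis.

```python
Point = tuple[float, float]


def width_with_outliers(points: list[Point], u: Point, k: int) -> float:
    """Smallest extent along unit direction u after ignoring k points.

    Ignoring i points at the low end and k - i at the high end covers every
    way of discarding k outliers in this direction, so the answer is the
    narrowest window of n - k consecutive sorted projections.
    """
    proj = sorted(x * u[0] + y * u[1] for x, y in points)
    n = len(proj)
    if k >= n - 1:
        return 0.0  # at most one point survives
    return min(proj[n - 1 - (k - i)] - proj[i] for i in range(k + 1))


def peel(points: list[Point], num_layers: int, kernel) -> list[Point]:
    """Repeatedly remove kernel(remaining); return the union of removed layers.

    `kernel` is any caller-supplied routine returning a subset of its input
    (for example a (0, eps)-kernel). Points are assumed to be distinct.
    """
    remaining = list(points)
    result: list[Point] = []
    for _ in range(num_layers):
        if not remaining:
            break
        layer = kernel(remaining)
        result.extend(layer)
        taken = set(layer)
        remaining = [p for p in remaining if p not in taken]
    return result


def axis_extremes(S: list[Point]) -> list[Point]:
    """Hypothetical stand-in kernel: extremes along the x and y axes only."""
    by_y = lambda p: p[1]
    return list({min(S), max(S), min(S, key=by_y), max(S, key=by_y)})


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.0), (50.0, 0.0)]
    print(width_with_outliers(pts, (1.0, 0.0), 0))  # 50.0
    print(width_with_outliers(pts, (1.0, 0.0), 1))  # 3.0: the point at x = 50 is ignored
```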
Metric functional dependencies
In ICDE, 2009
Cited by 10 (1 self)
Abstract. When merging data from various sources, it is often the case that small variations in data format and interpretation cause traditional functional dependencies (FDs) to be violated, without there being an intrinsic violation of semantics. Examples include differing address formats, or different reported latitudes/longitudes for a given address. In this paper, we define metric functional dependencies, which strictly generalize traditional FDs by allowing small differences (controlled by a metric) in values of the consequent attribute of an FD. We present efficient algorithms for the verification problem: determining whether a given metric FD holds for a given relation. We experimentally demonstrate the validity and efficiency of our approach on various data sets that lie in multidimensional spaces.
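The verification problem can be stated as a short brute-force check. This sketch assumes a pairwise formulation: X → Y holds with tolerance δ if any two tuples that agree on X have Y values within distance δ under the metric d. Because a metric is zero only on identical values, δ = 0 recovers an ordinary FD. Rows are assumed to be dicts, and `metric_fd_holds` is a hypothetical name. The check is quadratic within each group, whereas the paper reports more efficient algorithms.

```python
import math
from collections import defaultdict
from itertools import combinations


def metric_fd_holds(rows, lhs, rhs, dist, delta):
    """True if every pair of rows agreeing on the lhs attributes has rhs
    values within distance delta of each other under the metric dist."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])
    return all(dist(a, b) <= delta
               for values in groups.values()
               for a, b in combinations(values, 2))


if __name__ == "__main__":
    rows = [
        {"addr": "1 Main St", "loc": (40.0, -75.0)},
        {"addr": "1 Main St", "loc": (40.0001, -75.0001)},  # slightly different geocode
        {"addr": "9 Elm Rd", "loc": (41.0, -76.0)},
    ]
    print(metric_fd_holds(rows, ["addr"], "loc", math.dist, 0.001))  # tolerant: holds
    print(metric_fd_holds(rows, ["addr"], "loc", math.dist, 0.0))    # exact FD: violated
```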
Reverse Furthest Neighbors in Spatial Databases
Cited by 6 (1 self)
Given a set of points P and a query point q, the reverse furthest neighbor (RFN) query fetches the set of points p ∈ P such that q is their furthest neighbor among all points in P ∪ {q}. This is the monochromatic RFN (MRFN) query. Another interesting version of the RFN query is the bichromatic reverse furthest neighbor (BRFN) query. Given a set of points P, a query set Q and a query point q ∈ Q, a BRFN query fetches the set of points p ∈ P such that q is the furthest neighbor of p among all points in Q. The RFN query has many interesting applications in spatial databases and beyond. For instance, given a large residential database (as P) and a set of potential sites (as Q) for building a chemical plant complex, the construction site should be selected as the one that has the maximum number of reverse furthest neighbors. This is an instance of the BRFN query. This paper presents the challenges associated with such queries and proposes efficient, R-tree-based algorithms for both the monochromatic and bichromatic versions of the RFN query. We analyze properties of the RFN query that differentiate it from the widely studied reverse nearest neighbor queries and enable the design of novel algorithms. Our approach takes advantage of furthest Voronoi diagrams as well as the convex hull of either the data set P (in the MRFN case) or the query set Q (in the BRFN case). For BRFN queries, we also extend the analysis to the situation where Q is large and becomes disk-resident. Experiments on both synthetic and real data sets confirm the efficiency and scalability of the proposed algorithms over the brute-force search approach.
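The brute-force baseline mentioned above follows directly from the definitions; a sketch is below, assuming planar points as tuples and resolving ties against q (a choice made here, not stated in the abstract). It also illustrates why convex hulls matter: the furthest point of a set from any p is a vertex of the set's convex hull, so if q lies strictly inside the hull of P, some point of P is always further from p than q and the MRFN answer is empty. The function names are hypothetical.

```python
import math

Point = tuple[float, float]


def mrfn(P: list[Point], q: Point) -> list[Point]:
    """Monochromatic RFN by brute force: p in P whose furthest neighbor in P ∪ {q} is q.

    q must be strictly further than every other point of P (ties go against q).
    """
    result = []
    for i, p in enumerate(P):
        d_q = math.dist(p, q)
        if all(d_q > math.dist(p, x) for j, x in enumerate(P) if j != i):
            result.append(p)
    return result


def brfn(P: list[Point], Q: list[Point], qi: int) -> list[Point]:
    """Bichromatic RFN: p in P whose furthest neighbor among the sites Q is Q[qi]."""
    q = Q[qi]
    return [p for p in P
            if all(math.dist(p, q) > math.dist(p, x) for j, x in enumerate(Q) if j != qi)]


def best_site(P: list[Point], Q: list[Point]) -> int:
    """Index of the site with the most reverse furthest neighbors (first one on ties)."""
    return max(range(len(Q)), key=lambda qi: len(brfn(P, Q, qi)))


if __name__ == "__main__":
    homes = [(-1.0, 0.0), (-2.0, 0.0), (1.0, 0.0)]
    sites = [(-10.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    print(best_site(homes, sites))  # site index whose RFN set is largest
```

Each query here costs O(|P|·|Q|) distance evaluations, which is the cost the R-tree and hull-based algorithms in the paper aim to avoid.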
Processing a Large Number of Continuous Preference Top-k Queries
Cited by 6 (1 self)
Given a set of objects, each with multiple numeric attributes, a (preference) top-k query retrieves the k objects with the highest scores according to a user preference, defined as a linear combination of attribute values. We consider the problem of processing a large number of continuous top-k queries, each with its own preference. When objects or user preferences change, the query results must be updated. We present a dynamic index that supports the reverse top-k query, which is of independent interest. Combining this index with another one for top-k queries, we develop a scalable solution for processing many continuous top-k queries that exploits the clusteredness in user preferences. We also define an approximate version of the problem and present a solution significantly more efficient than the exact one with little loss in accuracy.
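A brute-force sketch of the two query types, assuming objects and preferences are equal-length tuples of floats and that a higher score is better. `top_k` scores every object under one preference; `reverse_top_k` asks, for a fixed object, which preferences place it in their top k. The names are hypothetical, and the paper's dynamic indexes exist precisely to avoid this full rescan whenever objects or preferences change.

```python
import heapq


def score(obj, w):
    """Linear preference score: weighted sum of the object's attribute values."""
    return sum(a * b for a, b in zip(obj, w))


def top_k(objects, w, k):
    """Indices of the k highest-scoring objects under preference w (earlier index wins ties)."""
    return heapq.nlargest(k, range(len(objects)), key=lambda i: score(objects[i], w))


def reverse_top_k(objects, prefs, target, k):
    """Indices of the preferences whose top-k result contains object `target`."""
    return [j for j, w in enumerate(prefs) if target in top_k(objects, w, k)]


if __name__ == "__main__":
    objects = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    prefs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
    print(top_k(objects, (0.5, 0.5), 1))         # [2]
    print(reverse_top_k(objects, prefs, 2, 1))   # [2]
```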
Shape Fitting on Point Sets with Probability Distributions
Cited by 6 (4 self)
Abstract. We consider problems on data sets where each data point has uncertainty described by an individual probability distribution. We develop several frameworks and algorithms for calculating statistics on these uncertain data sets. Our examples focus on geometric shape fitting problems. We prove approximation guarantees for the algorithms with respect to the full probability distributions. We then empirically demonstrate that our algorithms are simple and practical, solving for a constant hidden by asymptotic analysis so that a user can reliably trade speed and size for accuracy.
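The abstract does not fix a distribution family or an algorithm, so the following is only a generic Monte Carlo sketch under our own assumptions: each point is an independent isotropic Gaussian in the plane, and the statistic is the diameter. Drawing many instantiations and evaluating the statistic on each yields an empirical distribution whose quantiles can be read off, with `trials` acting as a speed-versus-accuracy knob. Sampling is a swapped-in illustration, not necessarily the paper's framework, and all names are hypothetical.

```python
import math
import random
from itertools import combinations

Point = tuple[float, float]


def diameter(points: list[Point]) -> float:
    """Brute-force diameter of one instantiated point set."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def sample_statistic(means, sigmas, stat, trials: int = 1000, seed: int = 0) -> list[float]:
    """Sorted samples of stat(instance), where point i is drawn independently
    from an isotropic Gaussian with mean means[i] and std. deviation sigmas[i]."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        instance = [(rng.gauss(mx, s), rng.gauss(my, s))
                    for (mx, my), s in zip(means, sigmas)]
        values.append(stat(instance))
    values.sort()
    return values


def quantile(sorted_values: list[float], q: float) -> float:
    """Empirical q-quantile (0 <= q <= 1) of an already sorted sample."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


if __name__ == "__main__":
    means = [(0.0, 0.0), (3.0, 4.0), (1.0, 5.0)]
    samples = sample_statistic(means, [0.2, 0.5, 0.1], diameter, trials=2000)
    print(quantile(samples, 0.5), quantile(samples, 0.95))
```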
Group Enclosing Queries
IEEE Transactions on Knowledge and Data Engineering
Cited by 4 (1 self)
Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p* ∈ P such that the maximum distance of p* to all points in Q is minimized. This problem is equivalent to the Min-Max case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases [27]. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull and the furthest Voronoi diagram of the query group. To further reduce the query cost, especially when the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst-case √2-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets in up to six dimensions. We also discuss how to extend it to higher dimensions (up to 74 in our experiment) and show that it still maintains a very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the √2-approximation algorithm to get a (1 + ε)-approximate solution for the GEQ problem. Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and the query group Q, respectively. Extensive experiments on both synthetic and real data sets, up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method.
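A brute-force GEQ answer and one simple way to reach the √2 bound are sketched below. The approximation returns the nearest neighbor p′ in P of the center c of the minimum enclosing ball (MEB) of Q, with radius r. We present it as an illustration consistent with the stated bound, not necessarily the paper's exact algorithm. The bound holds because c lies in the convex hull of the points of Q on the ball's boundary, so for any p some q ∈ Q satisfies |p − q|² ≥ |p − c|² + r². Hence opt² ≥ |p* − c|² + r², and maxdist(p′, Q) ≤ |p′ − c| + r ≤ |p* − c| + r ≤ √2 · opt. The sketch computes c only approximately, using the Bădoiu–Clarkson iteration, so the guarantee holds up to a small slack. All names are hypothetical.

```python
import math
import random


def max_dist(p, Q) -> float:
    """Largest distance from p to any point of the query group Q."""
    return max(math.dist(p, q) for q in Q)


def geq_exact(P, Q):
    """Brute-force GEQ: the p* in P minimizing max_dist(p*, Q); O(|P| |Q|)."""
    return min(P, key=lambda p: max_dist(p, Q))


def meb_center(Q, iters: int = 1000) -> list[float]:
    """Approximate MEB center (Badoiu-Clarkson): repeatedly step toward the
    current farthest point of Q with step size 1/(i + 1)."""
    c = list(Q[0])
    for i in range(1, iters + 1):
        far = max(Q, key=lambda q: math.dist(c, q))
        c = [ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far)]
    return c


def geq_approx(P, Q, iters: int = 1000):
    """Nearest neighbor in P (by linear scan) of the approximate MEB center of Q."""
    c = meb_center(Q, iters)
    return min(P, key=lambda p: math.dist(p, c))


if __name__ == "__main__":
    rng = random.Random(3)
    P = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    Q = [(rng.uniform(4, 6), rng.uniform(4, 6)) for _ in range(10)]
    opt = max_dist(geq_exact(P, Q), Q)
    print(max_dist(geq_approx(P, Q), Q) / opt)  # ratio, at most about sqrt(2)
```

A real implementation would replace the linear scan in `geq_approx` with an indexed nearest-neighbor search, which is where the logarithmic query cost comes from.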
Mergeable Summaries
Cited by 4 (0 self)
We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a single summary on the union of the two data sets, while preserving the error and size guarantees. This property means that the summaries can be merged like other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.
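To make the heavy-hitters result concrete, here is a sketch of a Misra–Gries (MG) summary with k counters plus a merge step. The merge shown adds the two counter sets, subtracts the (k+1)-th largest count, and drops counters that are no longer positive. It is our reading of how MG summaries are merged and should be treated as an illustration. The tests check the standard MG guarantee f(x) − n/(k+1) ≤ estimate(x) ≤ f(x) on a merged summary, with n the combined stream length. Function names are hypothetical.

```python
import random
from collections import Counter


def mg_update(counters: dict, x, k: int) -> dict:
    """Misra-Gries update with at most k counters."""
    if x in counters:
        counters[x] += 1
    elif len(counters) < k:
        counters[x] = 1
    else:
        # Table full and x untracked: decrement every counter (x is dropped too).
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]
    return counters


def mg_summary(stream, k: int) -> dict:
    counters: dict = {}
    for x in stream:
        mg_update(counters, x, k)
    return counters


def mg_merge(a: dict, b: dict, k: int) -> dict:
    """Add the counters, then subtract the (k+1)-th largest count; keep positives."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds counts
    if len(merged) <= k:
        return dict(merged)
    kth = sorted(merged.values(), reverse=True)[k]
    return {x: c - kth for x, c in merged.items() if c > kth}


if __name__ == "__main__":
    rng = random.Random(7)
    s1 = [rng.choice("aaabbbcdefghij") for _ in range(500)]
    s2 = [rng.choice("aabbbbbklmn") for _ in range(300)]
    print(mg_merge(mg_summary(s1, 5), mg_summary(s2, 5), 5))
```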
Efficient tradeoff schemes in data structures for querying moving objects
In Proc. European Symposium on Algorithms, LNCS 3221, 2004
Cited by 2 (1 self)
The ability to represent and query continuously moving objects is important in many applications of spatiotemporal database systems. In this paper we develop data structures for answering various queries on moving objects, including range and proximity queries, and study tradeoffs between various performance measures: query time, data structure size, and accuracy of results.
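For illustration only, the sketch below assumes one-dimensional points moving linearly, x(t) = x0 + v·t (the abstract does not state a motion model), and answers a time-slice range query by brute force. The condition lo ≤ x0 + v·t ≤ hi is linear in (x0, v), so such a query can be recast as range searching over the (v, x0) parameter plane; structures of that kind are where tradeoffs among query time, size, and accuracy arise. This baseline makes no such tradeoff because it scans every point. `MovingPoint` and `time_slice_range` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    x0: float  # position at time 0
    v: float   # constant velocity (linear motion is an assumption)

    def at(self, t: float) -> float:
        """Position at time t under linear motion."""
        return self.x0 + self.v * t


def time_slice_range(points, lo: float, hi: float, t: float):
    """Brute-force baseline: points whose position at time t lies in [lo, hi]."""
    return [p for p in points if lo <= p.at(t) <= hi]


if __name__ == "__main__":
    pts = [MovingPoint(0.0, 1.0), MovingPoint(10.0, -1.0), MovingPoint(5.0, 0.0)]
    print(time_slice_range(pts, 8.0, 9.0, 2.0))  # only the point starting at 10.0
```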