## Shape Fitting on Point Sets with Probability Distributions

Citations: 7 (5 self)

### BibTeX

@MISC{Löffler_shapefitting,
    author = {Maarten Löffler and Jeff M. Phillips},
    title = {Shape Fitting on Point Sets with Probability Distributions},
    year = {}
}

### Abstract

We consider problems on data sets where each data point has uncertainty described by an individual probability distribution. We develop several frameworks and algorithms for calculating statistics on these uncertain data sets. Our examples focus on geometric shape fitting problems. We prove approximation guarantees for the algorithms with respect to the full probability distributions. We then empirically demonstrate that our algorithms are simple and practical, solving for a constant hidden by asymptotic analysis so that a user can reliably trade speed and size for accuracy.
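The sampling structure described in the paper excerpts on this page (draw one point from each distribution, compute a statistic on the resulting point set, repeat m times) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the isotropic Gaussian error model, the brute-force diameter, and the names `sample_point_set`, `empirical_quantization`, and `estimated_cdf` are all assumptions made for this example.

```python
import itertools
import math
import random


def diameter(points):
    """Largest pairwise Euclidean distance in a point set (brute force)."""
    return max(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def sample_point_set(means, sigma, rng):
    """Draw one point per uncertain data point.

    Assumption: every data point is an isotropic Gaussian centered at its
    recorded value with standard deviation sigma. Other error models would
    only change this function.
    """
    return [tuple(rng.gauss(c, sigma) for c in mean) for mean in means]


def empirical_quantization(means, sigma, m, statistic=diameter, seed=0):
    """Sorted values of `statistic` over m independently sampled point sets."""
    rng = random.Random(seed)
    return sorted(statistic(sample_point_set(means, sigma, rng)) for _ in range(m))


def estimated_cdf(values, threshold):
    """Fraction of sampled statistic values that are at most `threshold`."""
    return sum(v <= threshold for v in values) / len(values)
```

For instance, `values = empirical_quantization([(0, 0), (1, 0), (0, 1)], sigma=0.1, m=256)` followed by `estimated_cdf(values, 1.5)` estimates the probability that the diameter is at most 1.5. Increasing `m` trades running time for a finer estimate of the distribution.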

### Citations

1000 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications
- Vapnik, Chervonenkis
- 1971
Citation Context ...er is the VC-dimension of $(P, \mathcal{A})$. When $(P, \mathcal{A})$ has constant VC-dimension ν, we can create an ε-sample Q of $(P, \mathcal{A})$, with probability 1 − δ, by uniformly sampling $O((1/\varepsilon^2)(\nu + \log(1/\delta)))$ points from P [25, 16]. There exist deterministic techniques to create ε-samples [19, 9] of size $O(\nu(1/\varepsilon^2)\log(1/\varepsilon))$ in time $O(\nu^{3\nu} n ((1/\varepsilon^2)\log(\nu/\varepsilon))^{\nu})$. When P is a 2 For technical reasons, if there are (degeneratel...

378 | Model-driven data acquisition in sensor networks
- Deshpande, Guestrin, et al.
- 2004
Citation Context ... often perturbed by a known model to preserve anonymity. In spatial databases [20, 37, 13] large data sets may be summarized as probability distributions to store them more compactly. Sensor networks [15] stream in large data sets collected by cheap and thus inaccurate sensors. In protein structure determination [35] every atom’s position is imprecise due to inaccuracies in reconstruction techniques a...

270 | Epsilon-nets and simplex range queries
- Haussler, Welzl
- 1987
Citation Context ...le, be used to construct (with probability 1 − δ) an (ε, α)-quantization for the fraction of points expected to fall in a query disk. Similar constructions can be done for other coresets, such as ε-nets [15], k-center [5], or smallest enclosing ball [7]. 2.3 Shape Inclusion Probabilities. For a point set $Q \subset \mathbb{R}^d$, let the summarizing shape $S_Q = S(Q)$ be from some geometric family S so $(\mathbb{R}^d, S)$ has bounded VC...

107 | Moving Objects Databases
- Güting, Schneider
- 2005
Citation Context ...athered by laser range finders and other sources. In data mining [1, 5] original data (such as published medical data) are often perturbed by a known model to preserve anonymity. In spatial databases [20, 37, 13] large data sets may be summarized as probability distributions to store them more compactly. Sensor networks [15] stream in large data sets collected by cheap and thus inaccurate sensors. In protein ...

105 | Approximating extent measures of points
- Agarwal, Har-Peled, et al.
Citation Context ...Let $\omega(P, u) = \langle P[u] - P[-u], u \rangle$ describe the width of P in direction u. We say that $K \subseteq P$ is an α-kernel of P if for all $u \in \mathbb{S}^{d-1}$, $\omega(P, u) - \omega(K, u) \le \alpha \cdot \omega(P, u)$. α-kernels of size $O(1/\alpha^{(d-1)/2})$ [4] can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [8, 26]. Computing many extent related problems such as diameter and smallest enclosing ball on K approximates the problem on P [4, 3, 8]. 2 Randomized Alg...

103 | Epsilon geometry: Building robust algorithms for imprecise computations
- Guibas, Salesin, et al.
- 1989
Citation Context ...spect to the raw input data and the error models associated with them. Geometric error models. An early model for imprecise geometric data, motivated by finite precision of coordinates, is ε-geometry [14], where each data point is known to lie within a ball of radius ε. This model has been used to study the robustness of problems such as the Delaunay triangulation [6, 18]. This model has been extende...

98 | On linear-time deterministic algorithms for optimization problems in fixed dimension
- Chazelle, Matoušek
- 1993
Citation Context ...sion ν, we can create an ε-sample Q of $(P, \mathcal{A})$, with probability 1 − δ, by uniformly sampling $O((1/\varepsilon^2)(\nu + \log(1/\delta)))$ points from P [25, 16]. There exist deterministic techniques to create ε-samples [19, 9] of size $O(\nu(1/\varepsilon^2)\log(1/\varepsilon))$ in time $O(\nu^{3\nu} n ((1/\varepsilon^2)\log(\nu/\varepsilon))^{\nu})$. When P is a 2 For technical reasons, if there are (degenerately) multiple optimal summarizing shapes, we say each is equally lik...

75 | Robotic Mapping: A Survey
- Thrun
- 2002
Citation Context ...uld represent each data point as a 3-variate normal distribution centered at its recorded value. Similarly, large data sets are gathered and maintained for many other applications. In robotic mapping [39, 16] error models are provided for data points gathered by laser range finders and other sources. In data mining [1, 5] original data (such as published medical data) are often perturbed by a known model ...

72 | Faster core-set constructions and data stream algorithms in fixed dimensions
- Chan
Citation Context ...idth of P in direction u. We say that $K \subseteq P$ is an α-kernel of P if for all $u \in \mathbb{S}^{d-1}$, $\omega(P, u) - \omega(K, u) \le \alpha \cdot \omega(P, u)$. α-kernels of size $O(1/\alpha^{(d-1)/2})$ [4] can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [8, 26]. Computing many extent related problems such as diameter and smallest enclosing ball on K approximates the problem on P [4, 3, 8]. 2 Randomized Algorithm for ε-Quantizations. We develop several algori...

55 | Geometric Discrepancy: An Illustrated Guide
- Matoušek
- 1999
Citation Context ... that $A_p = \{A \in \mathcal{A} \mid p \in A\}$. If $(P, \mathcal{A})$ has VC-dimension ν, then $(\mathcal{A}, P^*)$ has VC-dimension $\le 2^{\nu+1}$. Thus, if the VC-dimension of $(\mathcal{A}, P^*)$ is constant, then the VC-dimension of $(P, \mathcal{A})$ is also constant [20]. When we have a distribution $\mu : \mathbb{R}^d \to \mathbb{R}^+$, such that $\int_{x \in \mathbb{R}^d} \mu(x)\,dx = 1$, we can think of this as the set P of all points in $\mathbb{R}^d$, where the weight w of a point $p \in \mathbb{R}^d$ is $\mu(p)$. To simplify notation, we ...

54 | Approximating center points with iterative Radon points
- Clarkson, Eppstein, et al.
- 1996

50 | Approximations and optimal geometric divide-and-conquer
- Matoušek
- 1991
Citation Context ...sion ν, we can create an ε-sample Q of $(P, \mathcal{A})$, with probability 1 − δ, by uniformly sampling $O((1/\varepsilon^2)(\nu + \log(1/\delta)))$ points from P [25, 16]. There exist deterministic techniques to create ε-samples [19, 9] of size $O(\nu(1/\varepsilon^2)\log(1/\varepsilon))$ in time $O(\nu^{3\nu} n ((1/\varepsilon^2)\log(\nu/\varepsilon))^{\nu})$. When P is a 2 For technical reasons, if there are (degenerately) multiple optimal summarizing shapes, we say each is equally lik...

49 | Semantics of ranking queries for probabilistic data and expected ranks
- Cormode, Li, et al.
- 2009
Citation Context ...o only discretize, as desired, the probability distribution of the final solution. The database community has focused on similar problems for usually one-dimensional data such as indexing [2], ranking [11], and creating histograms [10]. 1.1 Problem Statement. Let $\mu_p : \mathbb{R}^d \to \mathbb{R}^+$ describe the probability distribution of a point p where the integral $\int_{q \in \mathbb{R}^d} \mu_p(q)\,dq = 1$. Let $\mu_P : \mathbb{R}^d \times \mathbb{R}^d \times \cdots \times \mathbb{R}^d \to \mathbb{R}^+$ d...

48 | Smaller core-sets for balls
- Bădoiu, Clarkson
- 2003
Citation Context ... an (ε, α)-quantization for the fraction of points expected to fall in a query disk. Similar constructions can be done for other coresets, such as ε-nets [15], k-center [5], or smallest enclosing ball [7]. 2.3 Shape Inclusion Probabilities. For a point set $Q \subset \mathbb{R}^d$, let the summarizing shape $S_Q = S(Q)$ be from some geometric family S so $(\mathbb{R}^d, S)$ has bounded VC-dimension ν. We randomly sample m point sets ...

47 | Improved bounds on the sample complexity of learning
- Li, Long, et al.
- 2000
Citation Context ...er is the VC-dimension of $(P, \mathcal{A})$. When $(P, \mathcal{A})$ has constant VC-dimension ν, we can create an ε-sample Q of $(P, \mathcal{A})$, with probability 1 − δ, by uniformly sampling $O((1/\varepsilon^2)(\nu + \log(1/\delta)))$ points from P [25, 16]. There exist deterministic techniques to create ε-samples [19, 9] of size $O(\nu(1/\varepsilon^2)\log(1/\varepsilon))$ in time $O(\nu^{3\nu} n ((1/\varepsilon^2)\log(\nu/\varepsilon))^{\nu})$. When P is a 2 For technical reasons, if there are (degeneratel...

41 | Fast and robust smallest enclosing balls
- Gärtner
- 1999
Citation Context ...he points set in a direction that makes an angle of 75° with the cylinder axis; diam, the diameter of the point set; and seb2, the radius of the smallest enclosing ball (using code from Bernd Gärtner [13]). We can create ε-quantizations with m samples from $\mu_P$, where the value of m is from the set {16, 64, 256, 1024, 4096}. We would like to evaluate the ε-quantizations versus the ground truth function...

38 | DP-SLAM 2.0
- Eliazar, Parr
- 2004
Citation Context ...ht). Greatly simplifying, we could model the uncertainty as a 3-variate normal distribution centered at its recorded value. Similarly, large data sets are gathered with uncertainty in robotic mapping [12], anonymized medical data [1], spatial databases [23], sensor networks [17], and many other areas. However, much raw data is not immediately given as a set of probability distributions, rather as a se...

33 | Almost-Delaunay simplices: Nearest neighbor relations for imprecise points
- Bandyopadhyay, Snoeyink
Citation Context ...n of coordinates, is ε-geometry [14], where each data point is known to lie within a ball of radius ε. This model has been used to study the robustness of problems such as the Delaunay triangulation [6, 18]. This model has been extended to allow different uncertainty regions around each point for object intersection [21] and shape-fitting problems [24]. These approaches give worst case bounds on error, ...

32 | Approximation algorithms for k-line center
- Agarwal, Procopiuc, et al.
- 2002
Citation Context ...construct (with probability 1 − δ) an (ε, α)-quantization for the fraction of points expected to fall in a query disk. Similar constructions can be done for other coresets, such as ε-nets [15], k-center [5], or smallest enclosing ball [7]. 2.3 Shape Inclusion Probabilities. For a point set $Q \subset \mathbb{R}^d$, let the summarizing shape $S_Q = S(Q)$ be from some geometric family S so $(\mathbb{R}^d, S)$ has bounded VC-dimension ν. ...

28 | Practical Methods for Shape Fitting and Kinetic Data Structures using Core Sets
- Agarwal, Poreddy, et al.
Citation Context ...idth of P in direction u. We say that $K \subseteq P$ is an α-kernel of P if for all $u \in \mathbb{S}^{d-1}$, $\omega(P, u) - \omega(K, u) \le \alpha \cdot \omega(P, u)$. α-kernels of size $O(1/\alpha^{(d-1)/2})$ [4] can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [8, 26]. Computing many extent related problems such as diameter and smallest enclosing ball on K approximates the problem on P [4, 3, 8]. 2 Randomized Algorithm for ε-Quantizations. We develop several algori...

23 | A theorem on general measure
- Rado
- 1946

22 | Constructing strongly convex approximate hulls with inaccurate primitives
- Guibas, Salesin, et al.
- 1990
Citation Context ...ertain that for each point in P there is a point in the disk of radius ε around it. This model has proven fruitful and is still often used due to its simplicity. To name a few examples, Guibas et al. [19] define strongly convex polygons: polygons that are guaranteed to stay convex, even when the vertices are perturbed by ε. Bandyopadhyay and Snoeyink [7] compute the set of all potential simplices in R...

22 | Discrepancy and approximations for bounded VC-dimension
- Matoušek, Welzl, et al.
- 1993
Citation Context ...points from P [41, 26]. There exist deterministic techniques to create ε-samples [29, 11] of size $O(\nu(1/\varepsilon^2)\log(1/\varepsilon))$ in time $O(\nu^{3\nu} n ((1/\varepsilon^2)\log(\nu/\varepsilon))^{\nu})$. There exist ε-samples of smaller sizes [31], but direct, efficient constructions are not known. When P is a point set in $\mathbb{R}^d$ and the family of ranges $\mathcal{Q}_k$ is determined by inclusion of convex shapes whose sides have one of k predefined normal di...

20 | Delaunay triangulation of imprecise points in linear time after preprocessing
- Löffler, Snoeyink
Citation Context ...n of coordinates, is ε-geometry [14], where each data point is known to lie within a ball of radius ε. This model has been used to study the robustness of problems such as the Delaunay triangulation [6, 18]. This model has been extended to allow different uncertainty regions around each point for object intersection [21] and shape-fitting problems [24]. These approaches give worst case bounds on error, ...

14 | No coreset, no cry
- Har-Peled
- 2004
Citation Context ...struct (with probability 1 − δ) an (ε, α)-quantization for the fraction of points expected to fall in a query disk. Similar constructions can be done for other coresets, such as ε-nets [22], k-center [4, 21], or smallest enclosing ball [8]. 2.3 Shape Inclusion Probabilities. We can also use a variation of Algorithm 2.1 to construct ε-shape inclusion probability functions. For a point set $Q \subset \mathbb{R}^d$, let the ...

14 | Triangulating input-constrained planar point sets
- Held, Mitchell
- 2008
Citation Context ... even when the vertices are perturbed by ε. Bandyopadhyay and Snoeyink [7] compute the set of all potential simplices in $\mathbb{R}^2$ and $\mathbb{R}^3$ that could belong to the Delaunay triangulation. Held and Mitchell [23] and Löffler and Snoeyink [28] study the problem of preprocessing a set of imprecise points under this model, so that when the true points are specified later some computation can be done faster. A mo...

12 | Indexing uncertain data
- Agarwal, Cheng, et al.
- 2009
Citation Context ...tainty, and to only discretize, as desired, the probability distribution of the final solution. The database community has focused on similar problems for usually one-dimensional data such as indexing [2], ranking [11], and creating histograms [10]. 1.1 Problem Statement. Let $\mu_p : \mathbb{R}^d \to \mathbb{R}^+$ describe the probability distribution of a point p where the integral $\int_{q \in \mathbb{R}^d} \mu_p(q)\,dq = 1$. Let $\mu_P : \mathbb{R}^d \times \mathbb{R}^d \times$ ...

12 | Tight error bounds of geometric problems on convex objects with imprecise coordinates
- Nagai, Tokura
- 2001
Citation Context ...been used to study the robustness of problems such as the Delaunay triangulation [6, 18]. This model has been extended to allow different uncertainty regions around each point for object intersection [21] and shape-fitting problems [24]. These approaches give worst case bounds on error, for instance upper and lower bounds on the radius of the minimum enclosing ball. But when uncertainty is given as a ...

10 | Algorithms for ε-approximations of terrains
- Phillips
- 2008
Citation Context ...nt set in $\mathbb{R}^d$ and the family of ranges $\mathcal{R}_d$ is determined by inclusion in axis-aligned boxes, then an ε-sample for $(P, \mathcal{R}_d)$ of size $O((d/\varepsilon)\log^{2d}(1/\varepsilon))$ can be constructed in $O((n/\varepsilon^3)\log^{6d}(1/\varepsilon))$ time [22]. For a range space $(P, \mathcal{A})$ the dual range space is defined as $(\mathcal{A}, P^*)$ where $P^*$ is all subsets $A_p \subseteq \mathcal{A}$ defined for an element $p \in P$ such that $A_p = \{A \in \mathcal{A} \mid p \in A\}$. If $(P, \mathcal{A})$ has VC-dimension ν, then (A,...

9 | Structure determination of symmetric homo-oligomers by a complete search of symmetry configuration space, using NMR restraints and van der Waals packing. Proteins: Struct
- Potluri, Yan, et al.
- 2006
Citation Context ...summarized as probability distributions to store them more compactly. Sensor networks [15] stream in large data sets collected by cheap and thus inaccurate sensors. In protein structure determination [35] every atom’s position is imprecise due to inaccuracies in reconstruction techniques and the inherent flexibility in the protein. In summary, there are many large data sets with modeled errors and dyn...

8 | Dynamic coresets
- Chan
- 2008
Citation Context ... can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [9, 42]. Computing many extent related problems such as diameter and smallest enclosing ball on the α-kernel approximates the function on the original set [3, 2, 9, 10]. 2 Randomized Algorithm for ε-Quantizations. We start with a general algorithm (Algorithm 2.1) which will be made specific in several places in the paper. We only assume that we can draw a random poin...

8 | Uncertainty envelopes
- Ostrovsky-Berman, Joskowicz
Citation Context ...ill many results are available. Nagai and Tokura [32] compute the union and intersection of all possible convex hulls to obtain bounds on any possible solution, as do Ostrovsky-Berman and Joskowicz [33] in a setting allowing some dependence between points. Van Kreveld and Löffler [40] study the problem of computing the smallest and largest possible values of several geometric extent measures, such a...

5 | Deterministic algorithms for sampling count data
- Akcan, Astashyn, et al.
Citation Context ... (k/ε)) time [11], since the VC-dimension is k and each data point requires O(k) storage. However, we do not investigate the empirical performance of these deterministic algorithms in this paper. See [6] for an empirical study of alternatives to [11]. 2.2 (ε, δ, α)-Kernels. The above construction works for a fixed family of summarizing shapes. In this section, we show how to build a single data struct...

5 | Evaluating probabilistic queries over imprecise data
- Cheng, Kalashnikov, et al.
- 2003
Citation Context ...athered by laser range finders and other sources. In data mining [1, 5] original data (such as published medical data) are often perturbed by a known model to preserve anonymity. In spatial databases [20, 37, 13] large data sets may be summarized as probability distributions to store them more compactly. Sensor networks [15] stream in large data sets collected by cheap and thus inaccurate sensors. In protein ...

5 | Basic measures for imprecise point sets
- Kruger
- 2008
Citation Context ...t possible values of several geometric extent measures, such as the diameter or the radius of the smallest enclosing ball, where the points are restricted to lie in given regions in the plane. Kruger [25] extends some of these results to higher dimensions. These models, in general, give worst case bounds on error, for instance upper and lower bounds on the radius of the minimum enclosing ball. When th...

5 | Geometric Discrepancy: An Illustrated Guide, volume 18
- Matoušek
- 1999
Citation Context ...that $A_p = \{A \in \mathcal{A} \mid p \in A\}$. If $(P, \mathcal{A})$ has VC-dimension ν, then $(\mathcal{A}, P^*)$ has VC-dimension $\le 2^{\nu+1}$. Thus, if the VC-dimension of $(\mathcal{A}, P^*)$ is constant, then the VC-dimension of $(P, \mathcal{A})$ is also constant [30]. Hence, the standard ε-sample theorems apply to dual range spaces as well. When we have a distribution $\mu : \mathbb{R}^d \to \mathbb{R}^+$, such that $\int_{x \in \mathbb{R}^d} \mu(x)\,dx = 1$, we can think of this as the set P of all points in $\mathbb{R}^d$...

3 | Geometric approximations via coresets
- Agarwal, Har-Peled, Varadarajan
- 2007
Citation Context ...of size $O(1/\alpha^{(d-1)/2})$ [4] can be calculated in time $O(n + 1/\alpha^{d-3/2})$ [8, 26]. Computing many extent related problems such as diameter and smallest enclosing ball on K approximates the problem on P [4, 3, 8]. 2 Randomized Algorithm for ε-Quantizations. We develop several algorithms with the following basic structure: (1) sample one point from each distribution to get a random point set; (2) construct the ...

3 | Remote Sensing and Image Interpretation
- Lillesand, Kiefer, et al.
- 2004
Citation Context ...al imprecision. However, often as a benefit from the large data sets, models are developed to describe the pattern of the data error. For instance, in the gathering of LIDAR data for GIS applications [17], each data point of a terrain can have error in its x- (longitude), y- (latitude) and z-coordinates (height). Greatly simplifying, we could model the uncertainty as a 3-variate normal distribution ce...

3 | Spatial Databases: A Tour. Pearsons
- Shekhar, Chawla
- 2001
Citation Context ...nty as a 3-variate normal distribution centered at its recorded value. Similarly, large data sets are gathered with uncertainty in robotic mapping [12], anonymized medical data [1], spatial databases [23], sensor networks [17], and many other areas. However, much raw data is not immediately given as a set of probability distributions, rather as a set of points. Approximate algorithms may treat this da...

3 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications
- Vapnik, Chervonenkis
- 1971

2 | Largest bounding box, smallest diameter, and related problems on imprecise points
- van Kreveld, Löffler
- 2007
Citation Context ...s of problems such as the Delaunay triangulation [6, 18]. This model has been extended to allow different uncertainty regions around each point for object intersection [21] and shape-fitting problems [24]. These approaches give worst case bounds on error, for instance upper and lower bounds on the radius of the minimum enclosing ball. But when uncertainty is given as a probability distribution, then t...

2 | Probability verifiers: Evaluating constrained nearest-neighbor queries over uncertain data
- Cheng, Chen, et al.
- 2008
Citation Context ...nal distributions, not an approximation of them. This model of uncertain data has been studied in the database community but for different types of problems (e.g. indexing [38, 24] and nearest neighbor [12]) and approximation guarantees. We focus on computing statistics on uncertain point sets, specifically shape fitting problems in a way that allows the uncertain data problem to be reduced to well-stud...

2 | Index for fast retrieval of uncertain spatial point data
- Kalashnikov, Ma, et al.
- 2008
Citation Context ...ees with respect to the original distributions, not an approximation of them. This model of uncertain data has been studied in the database community but for different types of problems (e.g. indexing [38, 24] and nearest neighbor [12]) and approximation guarantees. We focus on computing statistics on uncertain point sets, specifically shape fitting problems in a way that allows the uncertain data problem t...

2 | On linear-time deterministic algorithms for optimization problems in fixed dimensions
- Chazelle, Matoušek
- 1996

1 | Histograms and wavelets on probabilistic data
- Cormode, Garofalakis
- 2009
Citation Context ... the probability distribution of the final solution. The database community has focused on similar problems for usually one-dimensional data such as indexing [2], ranking [11], and creating histograms [10]. 1.1 Problem Statement. Let $\mu_p : \mathbb{R}^d \to \mathbb{R}^+$ describe the probability distribution of a point p where the integral $\int_{q \in \mathbb{R}^d} \mu_p(q)\,dq = 1$. Let $\mu_P : \mathbb{R}^d \times \mathbb{R}^d \times \cdots \times \mathbb{R}^d \to \mathbb{R}^+$ describe the distribution of a ...

1 | Indexing multi-dimensional uncertain data with arbitrary probability density functions
- Tao, Cheng, et al.
- 2005
Citation Context ...ees with respect to the original distributions, not an approximation of them. This model of uncertain data has been studied in the database community but for different types of problems (e.g. indexing [38, 24] and nearest neighbor [12]) and approximation guarantees. We focus on computing statistics on uncertain point sets, specifically shape fitting problems in a way that allows the uncertain data problem t...