Results 1 - 5 of 5
Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension
, 1994
Cited by 136 (16 self)
We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension. Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the first analysis of R-trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%. We believe that the fractal dimension will help rep...
Estimating the Selectivity of Spatial Queries Using the 'Correlation' Fractal Dimension
, 1995
Cited by 112 (15 self)
We examine the estimation of selectivities for range and spatial join queries in real spatial databases. As we have shown earlier [FK94a], real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with non-integer (fractal) dimension. In this paper we show that, among the infinite family of fractal dimensions, the so called "Correlation Dimension" D2 is the one that we need to predict the selectivity of spatial join. The main contribution is that, for all the real and synthetic point-sets we tried, the average number of neighbors for a given point of the point-set follows a power law, with D2 as the exponent. This immediately solves the selectivity estimation for spatial joins, as well as for "biased" range queries (i.e., queries whose centers prefer areas of high point density). We present the formulas to estimate the selectivity for the biased queries, including an integration constant (K `shape 0 ) for ea...
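The power law mentioned in this abstract can be illustrated with a small sketch. It is not the paper's estimator: it simply counts point pairs within a few radii and fits the slope of log(count) against log(radius), which approximates the correlation dimension. The function names `pair_count` and `correlation_dimension`, the radii, and the test point set are all assumptions made for illustration.

```python
import math

def pair_count(points, radius):
    """Count unordered point pairs whose Euclidean distance is <= radius."""
    n = len(points)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= radius
    )

def correlation_dimension(points, radii):
    """Least-squares slope of log(pair count) versus log(radius).

    Assumes every radius captures at least one pair; otherwise log(0) fails.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(pair_count(points, r)) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical data: 200 evenly spaced points on a diagonal line in the unit square.
# A line embedded in 2-D should give a slope close to 1 rather than 2.
line_points = [(i / 200, i / 200) for i in range(200)]
d2_line = correlation_dimension(line_points, [0.05, 0.1, 0.2])
```

The brute-force pair count is quadratic in the number of points, so this is only suitable for small samples; the slope also depends on choosing radii inside the range where the power law holds.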
A Population Analysis for Hierarchical Data Structures
- In Proceedings of the SIGMOD Conference
, 1987
Cited by 31 (19 self)
A new method termed population analysis is presented for approximating the distribution of node occupancies in hierarchical data structures which store a variable number of geometric data items per node. The basic idea is to describe a dynamic data structure as a set of populations which are permitted to transform into one another according to certain rules. The transformation rules are used to obtain a set of equations describing a population distribution which is stable under insertion of additional information into the structure. These equations can then be solved, either analytically or numerically, to obtain the population distribution. Hierarchical data structures are modeled by letting each population represent the nodes of a given occupancy. A detailed analysis of quadtree data structures for storing point data is presented, and the results are compared to experimental data. Two phenomena referred to as aging and phasing are defined and shown to account for the differences between the experimental results and those predicted by the model. The population technique is compared with statistical methods of analyzing similar data structures. CR Categories and Subject Descriptors: E.1 [Data]: Data Structures - trees; F.2.2 [Theory of Computation]: Analysis of nonnumerical algorithms and problems - geometrical problems and computations; H.3.3 [Information Storage and Retrieval]: Content Analysis and Indexing - indexing methods. Key words and phrases: file structures, bucketing methods, multidimensional attributes, hierarchical data structures, quadtrees.
A Probabilistic Analysis of Trie-Based Sorting of Large Collections of Line Segments in Spatial Databases
, 2000
Cited by 6 (6 self)
The size of five trie-based methods of sorting large collections of line segments in a spatial database is investigated analytically using a random lines image model and geometric probability techniques. The methods are based on sorting the line segments with respect to the space that they occupy. Since the space is two-dimensional, the trie is formed by interleaving the bits corresponding to the binary representation of the x and y coordinates of the underlying space and then testing two bits at each iteration. The result of this formulation yields a class of representations that are referred to as quadtrie variants, although they have been traditionally referred to as quadtree variants. The analysis differs from prior work in that it uses a detailed explicit model of the image instead of relying on modeling the branching process represented by the tree and leaving the underlying image unspecified. The analysis provides analytic expressions and bounds on the expected size of these quadtree variants. This enables the prediction of storage required by the representations and of the associated performance of algorithms that rely on them. The results are useful in two ways: 1. They reveal the properties of the various representations and permit their comparison using analytic, non-experimental, criteria. Some of the results confirm previous analyses (e.g., that the storage requirement of the MX quadtree is proportional to the total lengths of the line segments). An important new result is that for a PMR and Bucket PMR quadtree with sufficiently high values of the splitting threshold (i.e., ≥ 4) the number of nodes is proportional to the number of line segments and is independent of the maximum depth of the tree. This provides a theoretical justification for ...
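The bit-interleaving step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the names `interleave_bits` and `quadrant_path` are hypothetical, and placing the x bit ahead of the y bit in each pair is an assumed convention.

```python
def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into one key.

    Assumed convention: in each 2-bit pair, the x bit is the high bit
    and the y bit is the low bit; pairs run from most to least significant.
    """
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return key

def quadrant_path(x: int, y: int, bits: int) -> list[int]:
    """Quadrant index (0..3) chosen at each trie level, root first.

    Each level tests two bits of the interleaved key, matching the
    'testing two bits at each iteration' description.
    """
    key = interleave_bits(x, y, bits)
    return [(key >> (2 * level)) & 3 for level in range(bits - 1, -1, -1)]

# Example on an 8 x 8 grid (3 bits per coordinate): x = 5 (101), y = 3 (011).
key = interleave_bits(5, 3, 3)   # bit pairs 10, 01, 11
path = quadrant_path(5, 3, 3)
```

Sorting points or segment endpoints by such a key groups together items that fall in the same quadrant at every level, which is why the resulting structure behaves like a quadtree built on a trie.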
Node Distribution in a PR Quadtree
- in Proceedings 1st International Symposium on Large Spatial Databases
, 1989
Cited by 3 (0 self)
A method, termed approximate splitting, is proposed to model the node distribution that results when the PR quadtree is used to store point data drawn from a uniform distribution. This method can account for the aging and phasing phenomena which are common in most hierarchical data structures. Approximate splitting is also shown to be capable of being adapted to model the node distribution of the PR quadtree with points drawn from a known non-uniform distribution.

