## Analysis of the clustering properties of the Hilbert space-filling curve (2001)

Venue: IEEE Transactions on Knowledge and Data Engineering

Citations: 153 (11 self)

### BibTeX

@ARTICLE{Moon01analysisof,

author = {Bongki Moon and H. V. Jagadish and Christos Faloutsos and Joel H. Saltz},

title = {Analysis of the clustering properties of the Hilbert space-filling curve},

journal = {IEEE Transactions on Knowledge and Data Engineering},

year = {2001},

volume = {13},

pages = {124--141}

}

### Abstract

Several schemes for the linear mapping of a multidimensional space have been proposed for various applications, such as access methods for spatio-temporal databases and image compression. In these applications, one of the most desired properties of such linear mappings is clustering, which means that the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1], [14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., polygons and polyhedra). Both the asymptotic solution for the general case and the exact solution for a special case generalize previous work [14]. They agree with the empirical results that the number of clusters depends on the hypersurface area of the query region and not on its hypervolume. We also show that the Hilbert curve achieves better clustering than the z curve. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the required disk access behaviors and, hence, the total access time.
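The cluster count discussed in the abstract can be explored empirically on a small grid. The following is a minimal sketch, not the authors' derivation: it takes a cluster to mean a maximal run of consecutive curve positions inside the query region (an assumption consistent with the "groups of consecutive blocks" wording in the citation contexts below), and it averages that count over every square query window for both the Hilbert curve and the z curve. The function names (`hilbert_index`, `z_index`, `count_clusters`, `average_clusters`) are illustrative, and the Hilbert mapping uses the common iterative rotate-and-reflect conversion for the 2-dimensional case only.

```python
from itertools import product


def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2^order x 2^order grid to its Hilbert-curve position."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant is traversed in the base orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def z_index(order, x, y):
    """Map cell (x, y) to its z-curve position by interleaving coordinate bits."""
    d = 0
    for i in range(order):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d


def count_clusters(indices):
    """Count maximal runs of consecutive integers among the given positions."""
    ordered = sorted(set(indices))
    return sum(1 for i, v in enumerate(ordered) if i == 0 or v != ordered[i - 1] + 1)


def average_clusters(index_fn, order, q):
    """Average the cluster count over every q x q query window in the grid."""
    n = 1 << order
    total = 0
    windows = 0
    for x0, y0 in product(range(n - q + 1), repeat=2):
        cells = [index_fn(order, x0 + dx, y0 + dy)
                 for dx, dy in product(range(q), repeat=2)]
        total += count_clusters(cells)
        windows += 1
    return total / windows


if __name__ == "__main__":
    # Compare the two curves on 2 x 2 queries over a 16 x 16 grid.
    for name, fn in (("Hilbert", hilbert_index), ("z", z_index)):
        print(name, round(average_clusters(fn, 4, 2), 3))
```

On a finite grid the averages differ somewhat from asymptotic values such as the 2.625 quoted below for the z curve on 2 × 2 queries, because windows touching the grid boundary are weighted differently; the sketch is meant only to make the quantity being analyzed concrete.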

### Citations

478 | An introduction to disk drive modeling
- Ruemmler, Wilkes
- 1994
Citation Context ...ecified in the range query, it is reasonable to assume that the set of blocks fetched can be rearranged into a number of groups of consecutive blocks by a database server or disk controller mechanism [25]. Since it is more efficient to fetch a set of consecutive disk blocks rather than a randomly scattered set in order to reduce additional seek time, it is desirable that objects close together in a mu...

203 | Linear clustering of objects with multiple attributes
- Jagadish
- 1990
Citation Context ...ng, which means the locality between objects in the multidimensional space being preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1, 14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., ...

179 | Spatial Query Processing in an Object-Oriented Database System
- Orenstein
- 1986
Citation Context ...thod (e.g., B+-tree), which may yield good performance for multidimensional queries. An interesting application of the ordering arises in a multidimensional indexing technique proposed by Orenstein [19]. The idea is to develop a single numeric index on a one-dimensional space for each point in a multidimensional space, such that for any given object, the range of indices, from the smallest index to ...

153 | Fractals for secondary key retrieval
- Faloutsos, Roseman
- 1989
Citation Context ...was proposed [19]. Its improvement was suggested by Faloutsos [8], using Gray coding on the interleaved bits. A third method, based on the Hilbert curve [13], was proposed for secondary key retrieval [11]. In the mathematical context, these three mapping functions are based on different space-filling curves: the z curve, the Gray-coded curve and the Hilbert curve, respectively. Figure 1 illustrates th...

104 | Über die stetige Abbildung einer Linie auf ein Flächenstück
- Hilbert
- 1891
Citation Context ...m the coordinates, which is called z-ordering, was proposed [19]. Its improvement was suggested by Faloutsos [8], using Gray coding on the interleaved bits. A third method, based on the Hilbert curve [13], was proposed for secondary key retrieval [11]. In the mathematical context, these three mapping functions are based on different space-filling curves: the z curve, the Gray-coded curve and the Hilbe...

96 | Sur une courbe, qui remplit toute une aire plane
- Peano
Citation Context ...tions of this paper and suggest future work. 2 Historical Survey and Related Work G. Peano, in 1890, discovered the existence of a continuous curve which passes through every point of a closed square [21]. According to Jordan's precise notion (in 1887) of continuous curves, Peano's curve is a continuous mapping of the closed unit interval I = [0, 1] into the closed unit square S = [0, 1]^2. Curves of...

85 | Introduction to Topology and Modern Analysis
- Simmons
- 1963
Citation Context ...Peano's curve is a continuous mapping of the closed unit interval I = [0, 1] into the closed unit square S = [0, 1]^2. Curves of this type have come to be called Peano curves or space-filling curves [28]. Formally, Definition 2.1 If a mapping f : I → E^n (n ≥ 2) is continuous, and f(I), the image of I under f, has positive Jordan content (area for n = 2 and volume for n = 3), then f(I) is called a space-...

69 | Partial match retrieval algorithms
- Rivest
- 1976
Citation Context ...linear mapping that preserves locality: 1. In traditional databases, a multi-attribute data space must be mapped into a one-dimensional disk space to allow efficient handling of partial-match queries [22]; in numerical analysis, large multidimensional arrays [6] have to be stored on disk, which is a linear structure. 2. In image compression, a family of methods use a linear mapping to transform an ima...

63 | Space-filling curves: Their generation and their application to bandwidth reduction
- Bially
- 1969
Citation Context ...ample, for the traveling salesman problem, the cities are linearly ordered and visited accordingly [2]. 5. Locality-preserving mappings are used for bandwidth reduction of digitally sampled signals [4] and for graphics display generation [20]. 6. In scientific parallel processing, locality-preserving linearization techniques are widely used for dynamic unstructured mesh partitioning [17]. Sophistic...

60 | Geometry I and
- Berger
- 1987
Citation Context ...e called homeomorphic if there exists a continuous bijective mapping, f : X → Y, with a continuous inverse f^(-1) [12]. [Table 1: Definition of Symbols] For d = 2, the set V is, by definition, a Jordan curve [3], which is essentially a simple closed curve in R^2. The set of surfaces of a polyhedron divides the d-dimensional space R^d into two connected components, which may be called the interior and the ex...

56 | Spatial search with polyhedra
- Jagadish
- 1990
Citation Context ...hat, in a d-dimensional space (d ≥ 3), accessing the minimum bounding hyper-rectangle of a given query region may incur additional non-consecutive disk accesses, and hence supports the argument made in [15] that the minimum bounding rectangle may not be a good approximation of a non-rectangular object. 5.3 Comparison with the Gray-coded and z curves It may be argued that it is not convincing to make a d...

37 | Graphs, surfaces and homology
- Giblin
- 1981
Citation Context ...t curve into (or from) the polyhedron. [Footnote 1: Two subsets X and Y of Euclidean space are called homeomorphic if there exists a continuous bijective mapping, f : X → Y, with a continuous inverse f^(-1) [12].] Thus, we expect that the number of clusters is approximately proportional to the perimeter or hypersurface area of the d-dimensional polyhedron (d ≥ 2). With this observation, the task is reduced to...

35 | Multiattribute Hashing Using Gray Codes
- Faloutsos
- 1986
Citation Context ... mapping functions have been proposed in the literature. One based on interleaving bits from the coordinates, which is called z-ordering, was proposed [19]. Its improvement was suggested by Faloutsos [8], using Gray coding on the interleaved bits. A third method, based on the Hilbert curve [13], was proposed for secondary key retrieval [11]. In the mathematical context, these three mapping functions ...

34 | A Comparative Analysis of Some Two-Dimensional Orderings
- Abel, Mark
- 1990
Citation Context ...ng, which means the locality between objects in the multidimensional space being preserved in the linear space. It is widely believed that the Hilbert space-filling curve achieves the best clustering [1, 14]. In this paper, we analyze the clustering property of the Hilbert space-filling curve by deriving closed-form formulas for the number of clusters in a given query region of an arbitrary shape (e.g., ...

27 | Analysis of the Hilbert curve for representing two-dimensional space
- Jagadish
- 1996
Citation Context ... same as the result given in [14]) and in general approaches one third of the perimeter of the query rectangle plus two thirds of the side length of the rectangle in the unfavored direction. Jagadish [16] derived closed-form, exact expressions of the average number of clusters for the Hilbert curve in a 2-dimensional grid, but only for 2 × 2 and 3 × 3 square regions. This is a special case o...

25 | Design features of a frontal code for solving sparse unsymmetric linear systems out-of-core
- Duff
- 1984
Citation Context ...atabases, a multi-attribute data space must be mapped into a one-dimensional disk space to allow efficient handling of partial-match queries [22]; in numerical analysis, large multidimensional arrays [6] have to be stored on disk, which is a linear structure. 2. In image compression, a family of methods use a linear mapping to transform an image into a bit string; subsequently, any standard compressi...

24 | Geometry II
- Berger
- 1977
Citation Context ... which is perpendicular to one of the d coordinate axes, where V is a subset of R^d and homeomorphic (footnote 1) to a (d-1)-dimensional sphere S^(d-1). For d = 2 the set V is, by definition, a Jordan curve [3], which is essentially a simple closed curve in R^2. The set of surfaces of a polyhedron divides the d-dimensional space R^d into two connected components, which may be called the interior and the ex...

22 | The Space Efficiency of Quadtrees
- Dyer
- 1982
Citation Context ...s the order of the Hilbert curve approximation grows into infinity. Several closely related analyses for the average number of 2-dimensional quadtree nodes have been presented in the literature. Dyer [7] presented an analysis for the best, worst and average case of a square of size 2^n × 2^n, giving an approximate formula for the average case. Shaffer [27] gave a closed formula for the exact num...

14 | Convergence with Hilbert's space filling curve
- Butz
- 1969
Citation Context ...nal Euclidean space). The generation of a 3-dimensional Hilbert curve was described in [14, 26]. A generalization of the Hilbert curve, in an analytic form, for higher dimensional spaces was given in [5]. In this paper, a d-dimensional Euclidean space with finite granularity is assumed. Thus, we use the k-th order approximation of a d-dimensional Hilbert space-filling curve (k ≥ 1 and d ≥ 2), which maps ...

14 | Analytical results on the quadtree decomposition of arbitrary rectangles
- Faloutsos
- 1992
Citation Context ...ve a formula for the average number of blocks for such squares (averaged over all possible positions). Some of these formulae were generalized for arbitrary 2-dimensional and d-dimensional rectangles [9, 10]. 3 Asymptotic Analysis In this section, we give an asymptotic formula for the clustering property of the Hilbert space-filling curve for general polyhedra in a d-dimensional space. The symbols used i...

14 | Mapping multidimensional space to one dimension for computer output display
- Patrick, Anderson, et al.
- 1968
Citation Context ...m, the cities are linearly ordered and visited accordingly [2]. 5. Locality-preserving mappings are used for bandwidth reduction of digitally sampled signals [4] and for graphics display generation [20]. 6. In scientific parallel processing, locality-preserving linearization techniques are widely used for dynamic unstructured mesh partitioning [17]. Sophisticated mapping functions have been proposed...

14 | Attribute based file organization in a paged memory environment
- Rothnie, Lozano
- 1974
Citation Context ...index to the largest, includes few points not in the object itself. Consider a linear traversal or a typical range query for a database where record signatures are mapped with multi-attribute hashing [24] to buckets stored on disk. The linear traversal specifies the order in which the objects are fetched from disk as well as the number of blocks fetched. The number of nonconsecutive disk accesses will...

13 | A formula for computing the number of quadtree node fragments created by a shift
- Shaffer
- 1988
Citation Context ...e been presented in the literature. Dyer [7] presented an analysis for the best, worst and average case of a square of size 2^n × 2^n, giving an approximate formula for the average case. Shaffer [27] gave a closed formula for the exact number of blocks that such a square requires when anchored at a given position (x, y); he also gave a formula for the average number of blocks for such squares (av...

11 | Analysis of the n-Dimensional Quadtree Decomposition for Arbitrary Hyperrectangles
- Faloutsos, Jagadish, et al.
- 1997
Citation Context ...formula for the average number of blocks for such squares (averaged over all possible positions). Some of these formulae were generalized for arbitrary 2-dimensional and d-dimensional rectangles [9], [10]. 3 ASYMPTOTIC ANALYSIS In this section, we give an asymptotic formula for the clustering property of the Hilbert space-filling curve for general polyhedra in a d-dimensional space. The symbols used i...

10 | Analysis of the n-dimensional quadtree decomposition for arbitrary hyper-rectangles
- Faloutsos, Jagadish, Manolopoulos
- 1994
Citation Context ...ve a formula for the average number of blocks for such squares (averaged over all possible positions). Some of these formulae were generalized for arbitrary 2-dimensional and d-dimensional rectangles [9, 10]. 3 Asymptotic Analysis In this section, we give an asymptotic formula for the clustering property of the Hilbert space-filling curve for general polyhedra in a d-dimensional space. The symbols used i...

10 | Partitioning unstructured computational graphs for nonuniform and adaptive environments
- Kaddoura, Ou, et al.
- 1995
Citation Context ...led signals [4] and for graphics display generation [20]. 6. In scientific parallel processing, locality-preserving linearization techniques are widely used for dynamic unstructured mesh partitioning [17]. Sophisticated mapping functions have been proposed in the literature. One based on interleaving bits from the coordinates, which is called z-ordering, was proposed [19]. Its improvement was suggeste...

7 | Compression of Two-Dimensional Images
- Lempel, Ziv
- 1985
Citation Context ...k, which is a linear structure. 2. In image compression, a family of methods use a linear mapping to transform an image into a bit string; subsequently, any standard compression method can be applied [18]. A good clustering of pixels will result in a fewer number of long runs of similar pixel values, thereby improving the compression ratio. 3. In geographic information systems (GIS), run-encoded forms...

7 | A three-dimensional Hilbert-Curve
- Sagan
- 1993
Citation Context ...t interval. Figure 3 describes how this process is to be carried out for the first three steps. It has been shown that the Hilbert curve is a continuous, surjective and nowhere differentiable mapping [26]. However, Hilbert gave the space-filling curve, in a geometric form only, for mapping I into S (i.e., 2-dimensional Euclidean space). The generation of a 3-dimensional Hilbert curve was described in ...

5 | An O(n log n) travelling salesman heuristic based on spacefilling curves
- Bartholdi, Platzman
- 1982
Citation Context ...ge as sets of runs [1]. 4. Heuristics in computational geometry problems use a linear mapping. For example, for the traveling salesman problem, the cities are linearly ordered and visited accordingly [2]. 5. Locality-preserving mappings are used for bandwidth reduction of digitally sampled signals [4] and for graphics display generation [20]. 6. In scientific parallel processing, locality-preservin...

4 | Analysis of the clustering property of Peano curves
- Rong, Faloutsos
- 1991
Citation Context ...rt curve (2), the Hilbert curve was the best in minimizing the number of clusters. The numbers within the parentheses are the average number of clusters for 2 × 2 range queries. Rong and Faloutsos [23] derived a closed-form expression of the average number of clusters for the z curve, which gives 2.625 for 2 × 2 range queries (exactly the same as the result given in [14]) and in general approa...

4 | "An Introduction to Disk Drive Modeling,"
- Ruemmler, Wilkes
- 1994
Citation Context [Page footer: ...tkde@computer.org, and reference IEEECS Log Number 104359. 1041-4347/01/$10.00 © 2001 IEEE] ...rearranged into a number of groups of consecutive blocks by a database server or disk controller mechanism [25]. Since it is more efficient to fetch a set of consecutive disk blocks rather than a randomly scattered set in order to reduce additional seek time, it is desirable that objects close together in a mu...

3 | Partitioning unstructured computational graphs for nonuniform and adaptive environments
- Kaddoura, Ou, Ranka
- 1995
Citation Context ...led signals [4] and for graphics display generation [20]. 6. In scientific parallel processing, locality-preserving linearization techniques are widely used for dynamic unstructured mesh partitioning [17]. Sophisticated mapping functions have been proposed in the literature. One based on interleaving bits from the coordinates, which is called z-ordering, was proposed [19]. Its improvement was suggeste...

3 | An O(n log n) travelling salesman heuristic based on spacefilling curves
- Bartholdi, Platzman
- 1982
Citation Context ...ge as sets of runs [1]. 4. Heuristics in computational geometry problems use a linear mapping. For example, for the traveling salesman problem, the cities are linearly ordered and visited accordingly [2]. [Page header: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 13, NO. 1, JANUARY/FEBRUARY 2001] [Fig. 1. Illustration of space-filling curves.] 5. Locality-preserving mappings are used for bandwidth redu...

1 | "Convergence with Hilbert's Space Filling Curve,"
- Butz
- 1969
Citation Context ...al Euclidean space). The generation of a 3-dimensional Hilbert curve was described in [14], [26]. A generalization of the Hilbert curve, in an analytic form, for higher-dimensional spaces was given in [5]. In this paper, a d-dimensional Euclidean space with finite granularity is assumed. Thus, we use the kth order approximation of a d-dimensional Hilbert space-filling curve (k ≥ 1 and d ≥ 2), which maps an...

1 | "The Space Efficiency of Quadtrees," Computer Graphics and
- Dyer
- 1982
Citation Context ...s the order of the Hilbert curve approximation grows into infinity. Several closely related analyses for the average number of 2-dimensional quadtree nodes have been presented in the literature. Dyer [7] presented an analysis for the best, worst, and average case of a square of size 2^n × 2^n, giving an approximate formula for the average case. Shaffer [27] gave a closed formula for the exact number o...

1 | "Multiattribute Hashing Using Gray Codes,"
- Faloutsos
- 1986
Citation Context ... mapping functions have been proposed in the literature. One based on interleaving bits from the coordinates, which is called z-ordering, was proposed [19]. Its improvement was suggested by Faloutsos [8], using Gray coding on the interleaved bits. A third method, based on the Hilbert curve [13], was proposed for secondary key retrieval [11]. In mathematical context, these three mapping functions are b...

1 | "Fractals for Secondary Key Retrieval,"
- Faloutsos, Roseman
- 1989
Citation Context ... was proposed [19]. Its improvement was suggested by Faloutsos [8], using Gray coding on the interleaved bits. A third method, based on the Hilbert curve [13], was proposed for secondary key retrieval [11]. In mathematical context, these three mapping functions are based on different space-filling curves: the z curve, the Gray-coded curve, and the Hilbert curve, respectively. Fig. 1 illustrates the lin...

1 | "Spatial Search with Polyhedra,"
- Jagadish
- 1990
Citation Context ...that, in a d-dimensional space (d ≥ 3), accessing the minimum bounding hyperrectangle of a given query region may incur additional nonconsecutive disk accesses and, hence, supports the argument made in [15] that the minimum bounding rectangle may not be a good approximation of a nonrectangular object. 5.3 Comparison with the Gray-Coded and Z Curves It may be argued that it is not convincing to make a de...

1 | "Compression of Two-Dimensional Images,"
- Lempel, Ziv
- 1984
Citation Context ...k, which is a linear structure. 2. In image compression, a family of methods use a linear mapping to transform an image into a bit string; subsequently, any standard compression method can be applied [18]. A good clustering of pixels will result in a fewer number of long runs of similar pixel values, thereby improving the compression ratio. 3. In geographic information systems (GIS), run-encoded forms o...

1 | "Partial Match Retrieval Algorithms,"
- Rivest
- 1976
Citation Context ... linear mapping that preserves locality: 1. In traditional databases, a multiattribute data space must be mapped into a one-dimensional disk space to allow efficient handling of partial-match queries [22]; in numerical analysis, large multidimensional arrays [6] have to be stored on disk, which is a linear structure. 2. In image compression, a family of methods use a linear mapping to transform an ima...

1 | "Analysis of the Clustering Property of Peano Curves,"
- Rong, Faloutsos
- 1991
Citation Context ...Hilbert curve (2), the Hilbert curve was the best in minimizing the number of clusters. The numbers within the parentheses are the average number of clusters for 2 × 2 range queries. Rong and Faloutsos [23] derived a closed-form expression of the average number of clusters for the z curve, which gives 2.625 for 2 × 2 range queries (exactly the same as the result given in [14]) and, in general, approaches ...

1 | "A Three-Dimensional Hilbert Curve," Int'l
- Sagan
- 1993
Citation Context ...at interval. Fig. 3 describes how this process is to be carried out for the first three steps. It has been shown that the Hilbert curve is a continuous, surjective, and nowhere differentiable mapping [26]. However, Hilbert gave the space-filling curve, in a geometric form only, for mapping I into S (i.e., 2-dimensional Euclidean space). The generation of a 3-dimensional Hilbert curve was described in [1...